IPv6 SLAAC Crime Attribution

The need for individual privacy and the need for law enforcement to investigate crime effectively are sometimes portrayed as irreconcilably in conflict. Both needs are legitimate, and ignoring the challenges presented by the areas where they conflict will not make the problem go away.

My recently published Internet Draft presents a conceptual model that allows both sets of requirements to be met at the same time. The purpose of the publication is to show that, with some creative thinking, it is possible to identify win-win solutions that achieve both privacy and law enforcement goals. This post summarises the main ideas presented in that draft.

Current regulatory regimes typically oblige ISPs to keep records that facilitate identification of subscribers if this is necessary for a criminal investigation; in the case of IPv6, this means recording the prefix(es) that have been assigned to each customer. IPv6 addresses are assigned to organisations in blocks much larger than those used for IPv4, with common IPv6 prefix sizes being /48, /56 and /64.

From the perspective of crime attribution, therefore, when a specific IP address is suspected of being associated with criminal activity, records will most likely be available from an ISP to identify the organisation to which the prefix has been assigned. The question then arises of how an organisation approached by law enforcement authorities, particularly a large organisation, would be able to ascertain which host/endpoint within its network was using a particular IP address at a particular time.

This is not a new problem; many of the difficulties of crime attribution are already present in the IPv4 Internet.
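The ISP-level step can be sketched as a prefix lookup: given a suspect address, find the customer whose delegated prefix contains it. The sketch below assumes a hypothetical in-memory `assignment` table (real ISP record systems will differ), parses addresses with the standard `inet_pton()`, and prefers the longest matching prefix, so a /56 delegated out of a /48 wins over the enclosing /48.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical ISP assignment record: one delegated prefix per customer. */
typedef struct {
    const char *prefix;       /* textual IPv6 prefix, e.g. "2001:db8:1200::" */
    unsigned int prefix_len;  /* typically 48, 56 or 64 */
    const char *customer;     /* subscriber or organisation identifier */
} assignment;

/* Return 1 if addr lies inside prefix/len: compare whole bytes, then leftover bits. */
static int in_prefix(const unsigned char addr[16],
                     const unsigned char prefix[16], unsigned int len)
{
    assert(len <= 128);
    unsigned int full = len / 8, rem = len % 8;
    if (memcmp(addr, prefix, full) != 0)
        return 0;
    if (rem == 0)
        return 1;
    unsigned char mask = (unsigned char)(0xff << (8 - rem));
    return (addr[full] & mask) == (prefix[full] & mask);
}

/* Find the customer whose delegated prefix contains addr_text, preferring the
 * longest (most specific) match. Returns NULL if no record matches. */
const char *lookup_customer(const assignment *table, size_t n, const char *addr_text)
{
    unsigned char addr[16], pfx[16];
    const char *best = NULL;
    int best_len = -1;

    if (inet_pton(AF_INET6, addr_text, addr) != 1)
        return NULL;                     /* not a valid IPv6 address */
    for (size_t i = 0; i < n; i++) {
        if (inet_pton(AF_INET6, table[i].prefix, pfx) != 1)
            continue;                    /* skip malformed records */
        if (in_prefix(addr, pfx, table[i].prefix_len) &&
            (int)table[i].prefix_len > best_len) {
            best = table[i].customer;
            best_len = (int)table[i].prefix_len;
        }
    }
    return best;
}
```

Even a /64 match only narrows the search to one subnet of one customer; it says nothing about which host on that subnet held the address.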

IPv6 Stateless Address Autoconfiguration (SLAAC) describes the process a host uses to decide how to autoconfigure its interfaces in IPv6. This includes generating a link-local address, generating global addresses via stateless address autoconfiguration, and then using duplicate address detection to verify the uniqueness of the addresses on the link. SLAAC requires no manual configuration of hosts, minimal (if any) configuration of routers, and no additional servers.
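To illustrate the link-local step, the sketch below builds an address from `fe80::/64` and an interface identifier derived from a 48-bit MAC address using the traditional modified EUI-64 rules of RFC4291 (insert `0xFFFE` between the two halves and invert the universal/local bit). This is the older link-layer-derived scheme rather than the RFC7217 method discussed below, and the function names are illustrative only.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Form fe80::/64 plus a modified EUI-64 interface identifier from a 48-bit MAC. */
void link_local_from_mac(const unsigned char mac[6], unsigned char addr[16])
{
    memset(addr, 0, 16);
    addr[0] = 0xfe;
    addr[1] = 0x80;                               /* fe80::/64; bytes 2..7 stay zero */
    addr[8] = (unsigned char)(mac[0] ^ 0x02);     /* invert the universal/local bit */
    addr[9] = mac[1];
    addr[10] = mac[2];
    addr[11] = 0xff;                              /* insert 0xFFFE between the halves */
    addr[12] = 0xfe;
    addr[13] = mac[3];
    addr[14] = mac[4];
    addr[15] = mac[5];
}

/* Format the link-local address for mac as text using the standard inet_ntop(). */
const char *link_local_string(const unsigned char mac[6], char *buf, socklen_t len)
{
    unsigned char addr[16];
    link_local_from_mac(mac, addr);
    return inet_ntop(AF_INET6, addr, buf, len);
}
```

For example, the MAC address `00:11:22:33:44:55` yields `fe80::211:22ff:fe33:4455`: the MAC is visible in every address the interface forms this way, which is exactly the tracking problem discussed below.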

Originally, various standards specified that the interface identifier should be generated from the link-layer address of the interface (for example RFC2467, RFC2470, RFC2491, RFC2492, RFC2497, RFC2590, RFC4338, RFC4391, RFC5072, RFC5121). RFC7217 (A Method for Generating Semantically Opaque Interface Identifiers with IPv6 Stateless Address Autoconfiguration (SLAAC)) describes the currently recommended method, whereby an IPv6 address configured using the method is stable within each subnet, but the corresponding interface identifier changes when the host moves from one network to another.

In general terms, the approach is to pass the following values to a pseudorandom function, typically a cryptographic hash function such as SHA-1 or SHA-256:

  • The network prefix
  • The network interface id
  • The network id (subnet, SSID or similar) – optional parameter
  • A duplicate address detection counter – incremented in case of a duplicate address being generated
  • A secret key (at least 128 bits long)

The interface identifier is then formed by taking as many bits of the function's output as are required, starting from the least significant bit. The result is an opaque bit string that can be used as the interface identifier.
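The sketch below shows the shape of the RFC7217 computation: the inputs listed above are fed, in order, into one function, and the low 64 bits of the output become the interface identifier. To stay self-contained it substitutes 64-bit FNV-1a for the cryptographic function; FNV-1a is not a secure keyed function and would have to be replaced with something like SHA-256 in practice. The name `stable_iid` and the byte encoding of each input are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PLACEHOLDER PRF: 64-bit FNV-1a. NOT cryptographic; RFC7217 expects a keyed
 * function such as SHA-256. Used only so this sketch needs no crypto library. */
static uint64_t fnv1a64_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV 64-bit prime */
    }
    return h;
}

/* Derive a 64-bit interface identifier from the RFC7217-style inputs:
 * the /64 prefix, interface name, optional network id, DAD counter and
 * secret key. Writes the identifier to iid[0..7]. */
void stable_iid(const unsigned char prefix[8],
                const char *net_iface,
                const char *network_id,            /* optional: may be NULL */
                uint8_t dad_counter,
                const unsigned char *secret_key, size_t key_len,
                unsigned char iid[8])
{
    assert(key_len >= 16);                    /* secret key of at least 128 bits */
    uint64_t h = 0xcbf29ce484222325ULL;       /* FNV-1a offset basis */
    h = fnv1a64_update(h, prefix, 8);
    h = fnv1a64_update(h, net_iface, strlen(net_iface));
    if (network_id != NULL)
        h = fnv1a64_update(h, network_id, strlen(network_id));
    h = fnv1a64_update(h, &dad_counter, 1);
    h = fnv1a64_update(h, secret_key, key_len);

    /* Take the least significant 64 bits (with a 64-bit stand-in, all of them). */
    for (int i = 7; i >= 0; i--) {
        iid[i] = (unsigned char)(h & 0xff);
        h >>= 8;
    }
}
```

Calling it twice with the same inputs gives the same identifier, while changing the prefix (moving to another network) or incrementing the DAD counter after a collision gives a different one, which is the stable-per-subnet behaviour described above.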

On the other hand, RFC4941 (Privacy Extensions for Stateless Address Autoconfiguration in IPv6) describes a system by which a host generates interface identifiers that change over time, even when the interface contains an embedded IEEE identifier (EUI-64). The addresses formed this way are referred to as temporary addresses. The motivation for this technique is that a globally unique, unchanging interface identifier allows the activity of a specific interface to be tracked even if the network prefix changes: using a fixed identifier in multiple contexts allows seemingly unrelated activity to be correlated. By contrast, in IPv4 a host that moves to a different network typically receives an entirely different address.

To prevent the generation of predictable values, the algorithm must contain a cryptographic component. It assumes that each interface maintains an associated randomised interface identifier, and the current value of that identifier is used when temporary addresses are generated.
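RFC4941 itself derives successive identifiers from an MD5-based history value. The simplified sketch below instead draws 64 random bits from `/dev/urandom`, closer to the approach of its successor RFC8981, and attaches RFC4941's default lifetimes (TEMP_PREFERRED_LIFETIME of one day, TEMP_VALID_LIFETIME of one week). It skips duplicate address detection, lifetime desynchronisation and the check against reserved identifiers; `temp_address` and `make_temp_address` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* RFC4941 default lifetimes; implementations may let users override them. */
#define TEMP_PREFERRED_LIFETIME (24u * 60 * 60)      /* 1 day  */
#define TEMP_VALID_LIFETIME     (7u * 24 * 60 * 60)  /* 1 week */

typedef struct {
    unsigned char address[16];
    unsigned int generation_time;  /* seconds since the Unix epoch */
    unsigned int preferred_time;   /* seconds after generation_time */
    unsigned int valid_time;       /* seconds after generation_time */
} temp_address;

/* Form a temporary address: the /64 prefix followed by 64 random bits read
 * from /dev/urandom. Returns 0 on success, -1 if no randomness is available. */
int make_temp_address(const unsigned char prefix[8], temp_address *out)
{
    FILE *rng = fopen("/dev/urandom", "rb");
    if (rng == NULL)
        return -1;
    size_t got = fread(out->address + 8, 1, 8, rng);
    fclose(rng);
    if (got != 8)
        return -1;

    memcpy(out->address, prefix, 8);
    out->generation_time = (unsigned int)time(NULL);
    out->preferred_time = TEMP_PREFERRED_LIFETIME;
    out->valid_time = TEMP_VALID_LIFETIME;
    return 0;
}
```

The generation, preferred and valid times produced here are the same quantities that the log_entry structure described later records.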

From the crime attribution perspective, both the recommended stable and the temporary address generation algorithms select addresses pseudo-randomly from the space of available addresses. When SLAAC is used, hosts autoconfigure the IP addresses of their interfaces themselves, so there is no organisational record of which IP addresses particular hosts selected at particular points in time.

My Internet Draft presents a record-retention model whereby it is possible for an organisation, if required to do so as part of a criminal investigation, to answer the question “Who was using IP address A at a particular point in time?” without being able to answer any more broadly scoped questions, such as “What were all of the IP addresses used by a particular person?”

The model described assumes that the endpoint/interface for which the IPv6 address is being generated has a meaningful, unique identifying characteristic. Whether that is the layer-2 address of the interface or some other organisational characteristic is unimportant for the purposes of the model.

The host generates an IPv6 address using any of the techniques described above, most likely the one described in RFC4941. After completing the duplicate address detection phase of SLAAC, but before beginning to use the IP address for communication, the host creates a structure of the following form:

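#define LOG_ENTRY_TAG_VALUE "__LOG_ENTRY_TAG__"   /* 17 characters, stored without a terminating NUL */

typedef struct {
   char LOG_ENTRY_TAG[17];                         /* always holds LOG_ENTRY_TAG_VALUE */
   unsigned char ip_address[16];                   /* the 16-byte IPv6 address */
   unsigned int identifying_characteristic_length; /* byte length of the next field */
   unsigned char *identifying_characteristic;      /* variable length, organisationally interpreted */
   unsigned int client_generation_time;            /* seconds since the Unix epoch */
   unsigned int client_preferred_time;             /* seconds after client_generation_time */
   unsigned int client_valid_time;                 /* seconds after client_generation_time */
} log_entry;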

The fields are all mandatory, and populated as follows:

  • LOG_ENTRY_TAG has the fixed, constant value “__LOG_ENTRY_TAG__”
  • ip_address contains the 16 byte IPv6 address.
  • identifying_characteristic_length contains the byte length of the identifying_characteristic field.
  • identifying_characteristic is a variable length byte string, organisationally interpreted, to represent the identifying characteristic of the host generating the IPv6 address.
  • client_generation_time contains the time, in seconds since the Unix epoch, as recorded by the client creating the IPv6 address, at which the address was generated.
  • client_preferred_time contains the period, in seconds, starting at client_generation_time, for which the client will use this IPv6 address as its preferred address.
  • client_valid_time contains the period, in seconds, starting at client_generation_time, for which the client will consider this IPv6 address to be valid.
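Because identifying_characteristic is a variable-length field, the structure has to be flattened into bytes before it can be encrypted. The draft's exact wire format is not reproduced in this post, so the sketch below assumes a hypothetical layout: the 17-byte tag (no terminating NUL), the 16-byte address, a 4-byte length, the characteristic itself, then the three times, with integers in network byte order. Keeping the tag first means a trial decryption only needs to inspect the first 17 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG "__LOG_ENTRY_TAG__"
#define TAG_LEN 17

/* Write v big-endian at p and return the position after it. */
static unsigned char *put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Serialise one log entry using the hypothetical layout
 *   tag(17) | ip_address(16) | id_len(4) | id(id_len) | gen(4) | pref(4) | valid(4)
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t serialise_log_entry(const unsigned char ip_address[16],
                           const unsigned char *id, uint32_t id_len,
                           uint32_t gen, uint32_t pref, uint32_t valid,
                           unsigned char *buf, size_t buf_len)
{
    size_t need = TAG_LEN + 16 + 4 + (size_t)id_len + 12;
    if (buf_len < need)
        return 0;

    unsigned char *p = buf;
    memcpy(p, TAG, TAG_LEN);   p += TAG_LEN;
    memcpy(p, ip_address, 16); p += 16;
    p = put_u32(p, id_len);
    memcpy(p, id, id_len);     p += id_len;
    p = put_u32(p, gen);
    p = put_u32(p, pref);
    p = put_u32(p, valid);
    assert((size_t)(p - buf) == need);
    return need;
}
```

Before AES-128-CBC encryption the buffer would also need padding to a multiple of 16 bytes (for example PKCS#7), which this sketch leaves to the encryption step.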

When the structure has been populated, the host encrypts it using AES-128 in CBC mode, with the selected IPv6 address as the encryption key (an IPv6 address is 128 bits long, exactly the size of an AES-128 key). The host then sends the encrypted record to a specified multicast address and port, but uses the unspecified IPv6 address (i.e. “::”) as the source address, so that the packet header does not reveal the address that serves as the key. When the logging server, listening on the specified multicast address, receives a record, it creates a new log entry consisting of:

  • The time the record was received, ideally synchronised to a global time standard (e.g. via NTP), with one-second granularity.
  • The encrypted record received as a binary blob.
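On the server side, each entry is simply a receipt time and an opaque blob. The sketch below stores them in a hypothetical length-prefixed file format (8-byte time, 4-byte length, then the blob, integers big-endian) and reads them back for the later search. It leaves out the multicast socket handling, where the server would join the group (for example with the `IPV6_JOIN_GROUP` socket option) and call `recvfrom()`, and it assumes the host clock is already NTP-synchronised.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Append one entry: receive time (8 bytes), blob length (4 bytes), blob.
 * Returns 0 on success, -1 on a write error. */
int append_log(FILE *f, int64_t received, const unsigned char *blob, uint32_t len)
{
    unsigned char hdr[12];
    for (int i = 0; i < 8; i++)
        hdr[i] = (unsigned char)((uint64_t)received >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        hdr[8 + i] = (unsigned char)(len >> (24 - 8 * i));
    if (fwrite(hdr, 1, sizeof hdr, f) != sizeof hdr)
        return -1;
    return fwrite(blob, 1, len, f) == len ? 0 : -1;
}

/* Read the next entry into blob (capacity cap). Returns the blob length,
 * or -1 at end of file, on a short read, or if the blob does not fit. */
long read_log(FILE *f, int64_t *received, unsigned char *blob, uint32_t cap)
{
    unsigned char hdr[12];
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr)
        return -1;
    uint64_t t = 0;
    for (int i = 0; i < 8; i++)
        t = (t << 8) | hdr[i];
    uint32_t len = 0;
    for (int i = 8; i < 12; i++)
        len = (len << 8) | hdr[i];
    if (len > cap || fread(blob, 1, len, f) != len)
        return -1;
    *received = (int64_t)t;
    return (long)len;
}
```

A receiver would call something like `append_log(f, (int64_t)time(NULL), datagram, n)` for each datagram; the server never needs to understand the blob's contents.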

If and when it becomes necessary to query the recorded entries, the following (representative) process can be followed:

  1. Taking the IP address for which attribution information is required, iterate through all recorded log entries, using the IP address as the decryption key to attempt to decrypt each record.
  2. Examine the decrypted data and check whether the first 17 bytes contain the value “__LOG_ENTRY_TAG__”.
    • If so:
      1. This indicates that the log entry has been successfully decrypted.
      2. The IP address contained in the log entry can be verified against the IP address that was used as a key to confirm that the log entry contains the correct value.
      3. The identifying characteristic can then be read from the log entry, along with the time at which the host generated the IP address.
      4. The time in the record can be correlated with the time in the log entry recorded by the server so that any time differential can be compensated for.
    • If not:
      1. This indicates that the log entry has not been successfully decrypted and that the current log entry pertains to a different IP address.
      2. Move on to the next log entry and try again.
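The search loop can be sketched as follows. A full AES implementation would not fit here, so decryption is passed in as a function pointer (`decrypt_fn`, a hypothetical hook where AES-128-CBC keyed by the address would go), and `toy_xor_cipher` is an insecure XOR stand-in included only so the sketch runs. It also assumes the serialised record starts with the 17-byte tag followed by the 16-byte address, and it does not address the CBC initialisation vector, which this post does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG "__LOG_ENTRY_TAG__"
#define TAG_LEN 17

/* Hypothetical hook: decrypt len bytes of in into out, keyed by a 16-byte
 * IPv6 address. A real deployment would plug in AES-128-CBC here. */
typedef void (*decrypt_fn)(const unsigned char key[16],
                           const unsigned char *in, size_t len,
                           unsigned char *out);

/* DEMONSTRATION ONLY, NOT AES: XOR with the repeating key. Provides no
 * security; being its own inverse, it also serves as the "encryption". */
void toy_xor_cipher(const unsigned char key[16], const unsigned char *in,
                    size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ key[i % 16];
}

typedef struct {
    const unsigned char *blob;  /* encrypted record as stored by the server */
    size_t len;
} stored_entry;

/* Return the index of the first entry that decrypts under addr to a record
 * with the expected tag AND the same embedded address, or -1 if none does.
 * scratch must be at least as large as the largest blob. */
long find_entry(const stored_entry *entries, size_t n, const unsigned char addr[16],
                decrypt_fn decrypt, unsigned char *scratch)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].len < TAG_LEN + 16)
            continue;                               /* too short to be a record */
        decrypt(addr, entries[i].blob, entries[i].len, scratch);
        if (memcmp(scratch, TAG, TAG_LEN) != 0)
            continue;                               /* wrong key: another address */
        if (memcmp(scratch + TAG_LEN, addr, 16) != 0)
            continue;                               /* tag matched only by chance */
        return (long)i;
    }
    return -1;
}
```

The second comparison, against the embedded address, guards against a wrong key producing the tag bytes purely by chance.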

It would be computationally feasible to apply this process to a large number of log entries but, if necessary, the number of entries to try can be reduced by selecting only those whose server-recorded time falls within a relevant range.
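Two time checks are involved: a cheap prefilter on the server's receipt time before any decryption, and, once a record decrypts, a check that the address was valid at the time of interest (step 4 above). A minimal sketch, assuming a record is sent promptly after its address is generated so that the server receipt time can stand in for the generation time on the server's clock; `max_valid_lifetime` and `skew_allowance` are hypothetical parameters an investigator would choose.

```c
#include <assert.h>
#include <stdint.h>

/* Prefilter on server time: a record for an address in use at time t must
 * have been received no later than t, and no earlier than t minus the longest
 * valid lifetime in use, each widened by an allowance for clock error. */
int in_search_window(int64_t server_received, int64_t t,
                     int64_t max_valid_lifetime, int64_t skew_allowance)
{
    return server_received <= t + skew_allowance &&
           server_received >= t - max_valid_lifetime - skew_allowance;
}

/* After decryption: anchor the valid period at the server's receipt time, so
 * the client's own clock error drops out. The server-minus-client difference
 * is reported through *offset so it can be noted alongside the result. */
int valid_at(int64_t server_received, uint32_t client_generation_time,
             uint32_t client_valid_time, int64_t t, int64_t *offset)
{
    *offset = server_received - (int64_t)client_generation_time;
    return t >= server_received &&
           t < server_received + (int64_t)client_valid_time;
}
```

With RFC4941's default one-week valid lifetime, for example, only records received in roughly the week before the time of interest need to be tried.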

To decrypt a specific log entry without knowing the target IP address, a brute-force approach is required. Presuming a known 64-bit address prefix, this means searching a space of 2^64 possible addresses for each individual log entry.
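To put 2^64 in perspective, assume a hypothetical attacker able to test 10^9 candidate addresses per second against a single log entry (an illustrative rate, not a measured one). An exhaustive search then takes:

```latex
\frac{2^{64}}{10^{9}\ \text{s}^{-1}}
  \approx 1.845 \times 10^{10}\ \text{s}
  \approx \frac{1.845 \times 10^{10}\ \text{s}}{3.156 \times 10^{7}\ \text{s/year}}
  \approx 585\ \text{years}
```

On average the key would be found after searching half the space, still roughly 290 years, and the effort must be repeated for every entry.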

The privacy of the records comes from the pseudo-random nature of the IPv6 address generation mechanism, the very feature that is desirable from a privacy perspective.

The model presented here balances the need for individual privacy at the network layer against the need to record data that would be required in a criminal investigation. The balance point proposed is that, using this technique, it is possible to identify who was using a specific IP address at a specific point in time, without being able to extract any further information, such as all of the people who were using a particular IP address or all of the IP addresses that were used by a particular endpoint.
