How to Obfuscate Integer IDs

May 10, 2009

Abstract

Many web sites and applications expose information about themselves by using plain integers as IDs. URIs often identify entities like pages, content elements, and even users in the form http://example.com/content.php?page=42. For example:

[Image: Facebook member URI]

Using simple integers may expose certain information about a system, such as how many items a site has in its catalog or how quickly the site’s membership is growing. For various reasons, you might not want to reveal this information.

Integer IDs also make it trivial to probe a system with guessed IDs. While this isn’t necessarily a security risk (or even universally undesirable), you may wish to prevent this kind of probing of your system. Good obfuscation makes it effectively impossible to retrieve arbitrary URIs by simply guessing numeric IDs. Obfuscation is not intended to be a security feature, but it can add an element of detection and protection against casual probing.

This article will present the IdObfuscator class, which provides the static methods ::encode() and ::decode() to convert numeric IDs between integers and encoded strings. The encoded strings are tamper-evident: they can’t be altered without the change being detected when they are decoded.

The IdObfuscator Class

While you could just store hashes in your database alongside integer IDs, this is unnecessary and inefficient in terms of both storage and indexing. Generally it is preferable to store IDs as unsigned integers and have algorithms to encode and decode them. This is what the IdObfuscator class does.

IdObfuscator was born of a project with requirements to hide the size of the site’s membership and its growth rate, the kinds of things startups often prefer not to disclose. The specific requirements were to:

  • hide information about the application by obfuscating all integer IDs
  • not have to store and index the encoded form
  • be able to detect if the encoded form is altered

Given these requirements and their intent, the solution should:

  • use real hashing, not just reversible scrambling
  • not rely on the algorithm being hidden (safe for visible-source languages, like PHP)
  • encode and decode the IDs quickly
  • not create encoded forms in any recognizable pattern
  • always return the same value for a given ID and salt
  • if possible, use functionality native to the language (not dependent on external libraries), and preferably, common to many languages

There are quite a lot of “solutions” to this problem floating around, but most rely on simple scrambling and other easily reversible techniques. Few are based on one-way hashing. None I found met all of the above requirements, so I rolled my own. I experimented with a few different algorithms and finally settled on the one demonstrated here for its relative speed and simplicity.

IdObfuscator satisfied all of the above requirements. It has proved so useful that I incorporated its functionality into the DAO base class of the Spawn PHP Framework, so all its DAOs can use it via the ->getIdx() and ::getObjectByIdx() methods.

The Encoding Algorithm

IdObfuscator expects a constant called CRYPT_SALT (see footnote 1) to be defined before its methods are called.
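
As a rough sketch of what that setup might look like: the snippet below generates a candidate salt once and then defines the constant. The placeholder string and the idea of generating the salt with random_bytes() (PHP 7 or later) are my assumptions, not part of the original class; the point is only that the salt is created once and then stays fixed.

```php
<?php
// One-off helper: run once and paste the output into your configuration.
// Do NOT regenerate the salt on every request, or decoding will break.
$generated = bin2hex(random_bytes(32)); // 64 hexadecimal characters

// In the site's configuration. The value below is a placeholder, not a real salt.
if (!defined('CRYPT_SALT')) {
    define('CRYPT_SALT', 'paste-your-generated-salt-here');
}
```

Any sufficiently long, unguessable string would do just as well as a generated one.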

IdObfuscator’s encoding algorithm is based on “burying” the value of $id within a random number. The value of $id is buried by subtracting it from (or adding it to) a random number so that the random number differs from its expected value by the value of $id. Thus, the value of $id can be calculated if you know what the random number should be; i.e., $id = the difference between the random number’s actual value and its expected value.

You would be correct if you guessed that, for this to work, the random number can’t be truly random; it must be effectively random, yet still predictable (by the code, not the client). Hashes like MD5 and SHA work well for this, so I used PHP’s sha1() function and, in all cases, appended a salt string to the value being hashed. Obfuscated IDs cannot be decoded without knowing the salt with which they were encoded.
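
To illustrate the "random but predictable" idea, here is a minimal sketch. The function name predictableNumber() and the salt strings are hypothetical, and it assumes a 64-bit PHP build so that a 32-bit unsigned value fits in an integer.

```php
<?php
// Hypothetical helper: derive a repeatable 32-bit number from a value and a salt.
function predictableNumber(string $value, string $salt): int {
    // The first 8 hex characters of the SHA-1 digest represent 32 bits.
    return (int) hexdec(substr(sha1($value . $salt), 0, 8));
}

$a = predictableNumber('42', 'example-salt');
$b = predictableNumber('42', 'example-salt'); // same inputs, same number
$c = predictableNumber('42', 'another-salt'); // a different salt almost certainly yields a different number
```

Without the salt, a client has no practical way to work out what the number "should" be.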

Here’s the algorithm in a nutshell:

  1. Create a random number ($segment1) based on a hash of $id.
  2. Create a second random number ($segment2) based on a hash of $segment1.
  3. Alter $segment2 by adding or subtracting the value of $id.
  4. Make a third hash ($segment3) from $segment1 and the altered $segment2. This hash makes it possible to detect any alteration of the encoded ID.
  5. Concatenate the three segments into a string, and voilà – you have your obfuscated ID.

To decode it, just calculate the difference between the value of $segment2 and its expected value. The key is that the values of all segments are predictable only if you know the salt.

Here’s the algorithm in more detail:

$segment1

First, create a hash from $id (see footnote 2). PHP’s sha1() function returns a 40-character hexadecimal string, but for this segment we’re only going to use the first 16 characters of it.

$segment2

Next, create a second hash segment based on $segment1. $segment2 is the first eight characters of sha1($segment1.CRYPT_SALT). The purpose of this segment is to provide the random number into which $id will be buried.

Convert $segment2 from base 16 to base 10 ($dec is the decimal value of $segment2). The value of $dec will be a number between 0 and 4,294,967,295 (2^32-1) that is basically random, but predictable based on $segment1 if you know the value of CRYPT_SALT.

Bury $id in $dec by altering $dec by the value of $id. Since $dec is usually greater than $id, the alteration is usually $dec-$id; if the reverse is true, it’s $dec+$id. Thus, the value of $id can be learned by calculating the absolute difference between the expected value of $segment2 and its new value (after $id was subtracted or added).
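
The first two segments and the burying step can be sketched on their own as follows. This is an illustration rather than the class itself: $salt is a placeholder standing in for CRYPT_SALT, hexdec() and dechex() stand in for the base_convert() calls, and a 64-bit PHP build is assumed.

```php
<?php
$salt = 'example-salt'; // assumption: placeholder for CRYPT_SALT
$id   = 42;

// Segment 1: first 16 hex characters of the salted hash of the ID.
$segment1 = substr(sha1($id . $salt), 0, 16);

// Expected segment 2: first 8 hex characters of the salted hash of segment 1.
$expected = substr(sha1($segment1 . $salt), 0, 8);
$dec      = (int) hexdec($expected); // 0 .. 4294967295

// Bury the ID: subtract when $dec is larger, otherwise add.
$altered  = ($dec > $id) ? $dec - $id : $dec + $id;
$segment2 = str_pad(dechex($altered), 8, '0', STR_PAD_LEFT);

// Recover the ID: recompute the expected value from segment 1 and take
// the absolute difference from the value actually stored in segment 2.
$again     = substr(sha1($segment1 . $salt), 0, 8);
$recovered = abs((int) hexdec($segment2) - (int) hexdec($again));
```

Because the absolute difference is used, the decoder does not need to know whether the ID was added or subtracted.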

The range of integers that can be safely buried in $segment2 is half of its maximum value, or 2,147,483,647 (2^31-1). Above that limit, there’s a risk (small at first, but increasing toward certainty as $id gets bigger) of overflowing 32-bit integers, which PHP will silently and happily do, and which will wreck the decoding process.
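
A quick check of the boundary, again as a sketch: bury() is a hypothetical helper that mirrors the add-or-subtract step, and a 64-bit PHP build is assumed so the out-of-range case can be shown without the integer itself overflowing.

```php
<?php
$maxSegment = 0xFFFFFFFF;  // largest value that fits in 8 hex digits
$maxId      = 2147483647;  // 2^31 - 1, the documented limit

// Hypothetical helper mirroring the bury step.
function bury(int $dec, int $id): int {
    return ($dec > $id) ? $dec - $id : $dec + $id;
}

// Worst case within the limit: $dec equals $id, so the two are added.
$worstInRange = bury($maxId, $maxId);  // 4294967294
$hexInRange   = dechex($worstInRange); // 'fffffffe', still 8 digits

// Beyond the limit, the sum can exceed 0xFFFFFFFF.
$tooBig    = bury(2000000000, 3000000000); // 5000000000
$hexTooBig = dechex($tooBig);              // 9 hex digits: no longer fits
```

Addition only happens when $dec is no larger than $id, so within the limit the sum is at most twice the largest ID, which still fits in eight hex digits.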

$segment3

Lastly, we compute an eight-character hash of the combined $segment1 and $segment2. This acts as a checksum, so if any character in any segment is changed, decoding will fail and ::decode() will return 0.
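
Here is a small sketch of how the checksum catches tampering. The helper names saltedHash() and checksumMatches() and the sample segment values are hypothetical; saltedHash() is meant to behave like the class’s private getHash() with the salt passed in explicitly.

```php
<?php
$salt = 'example-salt'; // assumption: placeholder for CRYPT_SALT

// Hypothetical stand-in for getHash(): salted SHA-1, truncated to $len characters.
function saltedHash(string $str, int $len, string $salt): string {
    return substr(sha1($str . $salt), 0, $len);
}

// Hypothetical check: does the third segment match the first two?
function checksumMatches(string $hex, string $salt): bool {
    $body     = substr($hex, 0, 24); // segment 1 + segment 2
    $checksum = substr($hex, 24, 8); // segment 3
    return $checksum === saltedHash($body, 8, $salt);
}

$segment1 = saltedHash('42', 16, $salt);
$segment2 = '0000002a'; // placeholder; any 8 hex digits will do for this demo
$valid    = $segment1 . $segment2 . saltedHash($segment1 . $segment2, 8, $salt);

// Change one character of segment 2 ('0' becomes '1' at offset 16).
$tampered = substr_replace($valid, '1', 16, 1);

$okValid    = checksumMatches($valid, $salt);
$okTampered = checksumMatches($tampered, $salt);
```

With an eight-character checksum, an altered string has roughly a 1 in 4 billion chance of passing by accident.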

Squishing it Down

Together the three segments form a 32-digit hex string with $id buried in characters 16-23 (counting from zero):

ef02550d40b359f5e4f14e52b2089761

While this string would be perfectly usable, it’s longer than necessary because base 16 isn’t the most efficient use of characters. I decided to shorten it a little by converting it to base 64. PHP’s base_convert() can’t handle 32-digit hex numbers, nor can it do base-64 conversion (it’s limited to base 36). So I decided to pack() the hex string into a 16-byte binary string, then base64_encode() that, leaving a string that is 22 characters long once the trailing = padding is removed, about 30% shorter:

7wJVDUCzWfXk8U5SsgiXYQ

I figured I could live with that.
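
The conversion can be tried in isolation with the example string above. This sketch uses rtrim() to drop the padding, which is my shorthand for this demo rather than what the class does.

```php
<?php
$hex = 'ef02550d40b359f5e4f14e52b2089761'; // the example hex string

$bin = pack('H*', $hex);    // 16 raw bytes
$b64 = base64_encode($bin); // 24 characters, ending in '=='
$oid = rtrim($b64, '=');    // 22 characters

// And back again: PHP's base64_decode() tolerates the missing padding.
$back = unpack('H*', base64_decode($oid))[1];
```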

Character Fixing

Since the encoded IDs will be passed around in links, I decided to replace the / and + characters traditionally used in base-64 encoding, because those characters have other uses in URIs. I chose to replace them with : and $, but those are pretty arbitrary. Some other modified base64 variants use - and _. (I also chopped off any trailing =, which PHP handles without complaint.)
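
A short sketch of the substitution in both directions. The helper names toUriSafe() and fromUriSafe() are hypothetical, and the two-byte input is chosen only because it happens to produce both + and / in its base-64 form.

```php
<?php
// Hypothetical helpers using the substitutions described above.
function toUriSafe(string $b64): string {
    // '+' becomes '$', '/' becomes ':', and '=' padding is dropped.
    return str_replace(array('+', '/', '='), array('$', ':', ''), $b64);
}

function fromUriSafe(string $safe): string {
    return str_replace(array('$', ':'), array('+', '/'), $safe);
}

$raw  = "\xfb\xff";
$b64  = base64_encode($raw); // '+/8='
$safe = toUriSafe($b64);     // '$:8'
$back = base64_decode(fromUriSafe($safe));
```

Swapping in - and _ instead would only mean changing the two replacement arrays.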

Source Code

<?php
class IdObfuscator {

	public static function encode($id) {
		if (!is_numeric($id) or $id < 1) {return FALSE;}
		$id = (int)$id;
		if ($id > 2147483647) {return FALSE;} // 2^31-1 is the largest ID that can be buried safely
		$segment1 = self::getHash($id,16);
		$segment2 = self::getHash($segment1,8);
		$dec      = (int)base_convert($segment2,16,10);
		$dec      = ($dec>$id)?$dec-$id:$dec+$id;
		$segment2 = base_convert($dec,10,16);
		$segment2 = str_pad($segment2,8,'0',STR_PAD_LEFT);
		$segment3 = self::getHash($segment1.$segment2,8);
		$hex      = $segment1.$segment2.$segment3;
		$bin      = pack('H*',$hex);
		$oid      = base64_encode($bin);
		$oid      = str_replace(array('+','/','='),array('$',':',''),$oid);
		return $oid;
	}

	public static function decode($oid) {
		if (!preg_match('/^[A-Z0-9\:\$]{21,23}$/i',$oid)) {return 0;}
		$oid      = str_replace(array('$',':'),array('+','/'),$oid);
		$bin      = base64_decode($oid);
		$hex      = unpack('H*',$bin); $hex = $hex[1];
		if (!preg_match('/^[0-9a-f]{32}$/',$hex)) {return 0;}
		$segment1 = substr($hex,0,16);
		$segment2 = substr($hex,16,8);
		$segment3 = substr($hex,24,8);
		$exp2     = self::getHash($segment1,8);
		$exp3     = self::getHash($segment1.$segment2,8);
		if ($segment3 !== $exp3) {return 0;} // strict: != compares numeric-looking strings like "0e123456" as numbers
		$v1       = (int)base_convert($segment2,16,10);
		$v2       = (int)base_convert($exp2,16,10);
		$id       = abs($v1-$v2);
		return $id;
	}

	private static function getHash($str,$len) {
		return substr(sha1($str.CRYPT_SALT),0,$len);
	}
}
?>

Note: The (int) casts are there to fix a bug with big integers in some PHP builds.

Limitations

  • Input values are limited to the range of 1 through 2,147,483,647 (2^31-1)
  • You shouldn’t change CRYPT_SALT after the system begins producing live production data if any values encoded with CRYPT_SALT are stored anywhere, including cookies. (This applies not only to IdObfuscator, but to anything that uses CRYPT_SALT.) Even if encoded values aren’t stored anywhere, it’s strongly recommended that you not change CRYPT_SALT, because doing so would change all URIs that contain encoded IDs and wipe out any search-engine ranking your pages may have.
  • This algorithm is intended for use on the web. It would lose some of its value in a system where the salt was delivered with the application, such as an installable desktop application.

Footnotes

1 CRYPT_SALT can be anything you like, but it should be (a) complex enough not to be guessable (like a password, which it effectively is), and (b) unique to each site on which it is used. It also can’t safely be changed once a site starts producing real live production data, since encoded IDs may persist in various places, like cookies and URIs. The Spawn PHP Framework defines CRYPT_SALT in host-config.php.

2 All hashes in IdObfuscator are created by the private ::getHash() method, which concatenates the subject with CRYPT_SALT.
