How to Validate Objects in PHP
May 12, 2009
(Note: This article is incomplete – I’m just using it to work out some kinks with the new site tools)
One of the most tedious aspects of large projects is handling validation on all the objects they use.
Validation of user input is important to all software, but it has some special security implications for web-based software. Unlike desktop applications, web apps open your hardware, software and data to any user, anywhere in the world. And a few of those users have the inclination and time to spend poking at your site in attempts to break (or break into) it.
On top of that, with loosely-typed languages like PHP and Perl, you can’t assume that any variable contains values of any particular type. All user-submitted variables—including those that contain URI info like PHP_SELF—are especially unsafe and should be validated and sanitized before use.
No need to belabor this point—validation is something we all know we should do. But the complexity of doing it right and the omnipresent pressure to get the product out are powerful disincentives, and the reality is often that if it can’t be done quickly, it doesn’t get done. This is especially true in applications with lots of objects.
The wrong way to handle validation is to do it in lots of different ways and scatter validation code all over your application. That way madness lies. There’s a much greater chance we’ll actually do validation correctly if it’s easy.
Ideally, validation should:
- use a consistent approach to validation throughout the application
- be able to apply many different types of validation rules to many different types of objects
- be able to make DAO objects savable
- assume data is invalid unless proven otherwise; i.e., think whitelist rather than blacklist
In other words, a good validator would take your object, tell you what’s wrong with it, and if you like, fix it for you. And it would do all this with minimal work from you.
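To make the whitelist idea concrete, here is a minimal sketch. The function name and the username rule are illustrative assumptions, not part of the validation package described in this article:

```php
<?php
// Whitelist check: accept only values that match a known-good pattern.
// isValidUsername() and its rule (3-20 letters, digits or underscores)
// are hypothetical, chosen only for illustration.
function isValidUsername($value) {
    // Reject non-strings outright; loosely-typed input could be an array.
    if (!is_string($value)) {
        return false;
    }
    // \z anchors at the very end, so a trailing newline is rejected too.
    return preg_match('/^[A-Za-z0-9_]{3,20}\z/', $value) === 1;
}

var_dump(isValidUsername('alice_01'));     // bool(true)
var_dump(isValidUsername('<script>'));     // bool(false)
var_dump(isValidUsername(array('alice'))); // bool(false)
```

A blacklist would instead try to enumerate every bad input, which is much harder to get right.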
Requirements
In the spirit of writing the client code first, let’s start with what we would like as a user of the validation system. Something easy would be good, like this:
$v = new DAOValidator($class_name);
Under the hood, I would want that constructor to do a lot of the grunt work for me, like look at the class’s default values. And since I’m lazy, I would want it to do more. If the object is a DAO, I would want it to look at the database and set some validation rules based on the associated table.
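One plausible way for the constructor to read a class's default values is PHP's built-in get_class_vars(). This is only a sketch: ExampleUser and getClassDefaults() are hypothetical names, and the real constructor may work differently.

```php
<?php
// Hypothetical DAO-like class used only for this example.
class ExampleUser {
    public $name = '';
    public $age = 0;
    public $newsletter = false;
}

// Collect the default values declared on a class.
// get_class_vars() only returns properties visible from the calling scope,
// so called from outside the class it reports public properties only.
function getClassDefaults($class_name) {
    if (!class_exists($class_name)) {
        return array();
    }
    return get_class_vars($class_name);
}

print_r(getClassDefaults('ExampleUser'));
```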
All that cleverness is good, but I still would need to be able to define additional, more complex validation rules than the constructor could infer from the class and the database. I would want something like:
$v->addRule(... more on this below ...);
Sometimes we might want the validator to fix invalid data, so we’ll have to tell it what values to use for fields if they don’t validate:
$v->setDefault($field_name, $default_value);
That’s probably enough to get started on the validator. Before going any further, let’s think about how to define the validation rules.
How to Define Validation Rules
For those cases where the default validation isn’t enough, you need to be able to give the validator rules to follow. Each rule will consist of:
- the field name it applies to
- the type of validation to perform
- the values to validate against (optional, depending on type)
So, I would like to be able to define a rule by writing something like:
$v->addRule( $field_name, $rule_type [,$params] );
Rule Types
Our validation framework will include the following rule types:
| Category | Rule Name | Verify that … |
|---|---|---|
| Numbers | INTEGER | the value is an integer (could be positive or negative; floats fail this test if they have a decimal point) |
| | FLOAT | the value is a floating-point number (positive or negative; integers pass this test too) |
| | MIN_VALUE | the value is greater than or equal to the given parameter |
| | MAX_VALUE | the value is less than or equal to the given parameter |
| Dates and Times | DATE | the value can be parsed as a date |
| | TIME | the value can be parsed as a time |
| | DATE_TIME | the value can be parsed as a date and time |
| | MIN_VALUE | in this context, the value is a time or date no earlier than the given parameter |
| | MAX_VALUE | in this context, the value is a time or date no later than the given parameter |
| Strings | MIN_LENGTH* | the string is at least as long as specified |
| | MAX_LENGTH* | the string is no longer than specified |
| | MATCH_ALL | the value must match all parameters; parameters are an array of regular expressions |
| | MATCH_ANY | the value must match any one of the parameters; the parameters are an array containing strings and/or regular expressions |
| | MATCH_NONE | the value must not match any of the parameters; the parameters are an array containing strings and/or regular expressions |
| Special Cases | | the value is a properly formatted email address |
| | URI | the value is a properly formatted URI |
* In most cases, minimum and maximum lengths can also be specified in regexes, so these are really only useful if you’re not also specifying a regex rule.
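As a rough illustration of how a few of these rule types could be checked, here is a sketch built on PHP's filter_var() and preg_match(). The checkRule() function and the use of plain strings as rule names are assumptions; the package may well use constants and a different structure.

```php
<?php
// Returns true when $value satisfies the rule, false otherwise.
// Unknown rule types fail, in keeping with "invalid unless proven otherwise".
function checkRule($value, $rule_type, $params = null) {
    switch ($rule_type) {
        case 'INTEGER':
            return filter_var($value, FILTER_VALIDATE_INT) !== false;
        case 'FLOAT':
            return filter_var($value, FILTER_VALIDATE_FLOAT) !== false;
        case 'MIN_VALUE':
            return is_numeric($value) && $value >= $params;
        case 'MAX_VALUE':
            return is_numeric($value) && $value <= $params;
        case 'MIN_LENGTH': // strlen() counts bytes, not characters
            return is_string($value) && strlen($value) >= $params;
        case 'MAX_LENGTH':
            return is_string($value) && strlen($value) <= $params;
        case 'MATCH_ALL':
            if (!is_scalar($value)) {
                return false;
            }
            foreach ((array)$params as $regex) {
                if (preg_match($regex, (string)$value) !== 1) {
                    return false;
                }
            }
            return true;
        case 'URI':
            return filter_var($value, FILTER_VALIDATE_URL) !== false;
        default:
            return false;
    }
}

var_dump(checkRule('42', 'INTEGER'));  // bool(true)
var_dump(checkRule('4.2', 'INTEGER')); // bool(false)
```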
Getting the Validation Errors — the ValidationErrorSet Object
Fetching the errors will be as simple as:
$errors = $v->validate($my_object);
I would like the ->validate() method to return any errors it finds. We will have it return the errors in a ValidationErrorSet object, which will encapsulate a list of errors plus some useful methods to work with them.
Then, I would like to be able to use $errors in my code like this:
<?php
if ($errors->hasErrors($field_name)) {
    echo '<ul>';
    foreach ($errors->getErrorMessages($field_name) as $message) {
        echo "<li>$message</li>";
    }
    echo '</ul>';
}
?>
It would be good if ->hasErrors() and ->getErrorMessages() could act on the whole object if no individual field is specified, because as the client, there will probably be cases where I don’t want to look at each field individually.
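A minimal sketch of an error set with that optional-field behavior might look like the following. The internal array layout and the addError() helper are assumptions made for illustration:

```php
<?php
// A minimal sketch of ValidationErrorSet; addError() is a hypothetical
// helper that a validator would call as it finds problems.
class ValidationErrorSet {
    // field name => list of error messages
    private $errors = array();

    public function addError($field_name, $message) {
        $this->errors[$field_name][] = $message;
    }

    // With no argument, reports on the whole object.
    public function hasErrors($field_name = null) {
        if ($field_name === null) {
            return count($this->errors) > 0;
        }
        return !empty($this->errors[$field_name]);
    }

    // With no argument, returns the messages for every field in one flat list.
    public function getErrorMessages($field_name = null) {
        if ($field_name !== null) {
            return isset($this->errors[$field_name]) ? $this->errors[$field_name] : array();
        }
        $all = array();
        foreach ($this->errors as $messages) {
            $all = array_merge($all, $messages);
        }
        return $all;
    }
}
```

Defaulting $field_name to null keeps the client code short in both the per-field and the whole-object case.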
Starting the Implementation
We have enough requirements to start roughing out the classes we’ll need:
class Validator {
    private $rules = array();
    private $default_values = array();

    public function __construct($class_name) {
    }

    // $params is optional because some rule types take no parameters
    public function addRule($field_name, $validation_type, $params = null) {
        $this->rules[] = new ValidationRule($field_name, $validation_type, $params);
    }

    public function setDefault($field_name, $default_value) {
    }

    public function validate($object) {
        $errors = new ValidationErrorSet();
        // do the validation here; put any errors in $errors
        return $errors;
    }
}

class ValidationRule {
    public function __construct($field_name, $validation_type, $params = null) {
    }
}

class ValidationErrorSet {
}

class ValidationError {
}
Where do the error messages come from?
There are several reasonable choices for where to store the text of error messages:
| storage | pros | cons |
|---|---|---|
| database table | elegant; arguably the most “correct” solution | complexity of retrieving data; database overhead* |
| in a config file | quick; easy; messages are always accessible | all possible error messages for the whole system would be loaded on every page load |
| in multiple text files | very fast; only load messages as needed; cacheable | less elegant than using a database; ambiguity about whether error messages are code or data |
* Disk I/O (but not other overhead) can be eliminated by using MySQL’s MEMORY storage engine.
One way to do it is to pull the messages from a table:
CREATE TABLE error_messages (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
class VARCHAR(50),
field VARCHAR(50),
error_type VARCHAR(20),
message VARCHAR(200)
);
But they don’t change frequently, so you could avoid the database overhead and just read them from the filesystem:
return file_get_contents(BASE_CODE_ROOT.'/error-messages/ActiveUser-password-MIN_LENGTH.txt');
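A slightly fuller sketch of the file-based approach is shown here. The loadErrorMessage() function, its directory argument, and the fallback text are assumptions; only the ClassName-field-RULE_TYPE.txt naming comes from the example above.

```php
<?php
// Build the message path from class, field and rule type, following the
// ClassName-field-RULE_TYPE.txt naming convention.
function loadErrorMessage($dir, $class_name, $field_name, $rule_type) {
    // Whitelist the name parts so they cannot escape $dir (e.g. via "../").
    foreach (array($class_name, $field_name, $rule_type) as $part) {
        if (!preg_match('/^[A-Za-z0-9_]+\z/', $part)) {
            return 'Invalid value.';
        }
    }
    $path = $dir . '/' . $class_name . '-' . $field_name . '-' . $rule_type . '.txt';
    // Fall back to a generic message when no specific file exists.
    if (!is_readable($path)) {
        return 'Invalid value.';
    }
    return trim(file_get_contents($path));
}
```

Because each message lives in its own small file, only the messages actually needed on a page are read.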
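The ->validate() method returns a ValidationErrorSet object, which records the rule types each field failed:
field_name => array(type1 [, type n, …])
It can be used in a template like this:
<?php if ($errors->hasError('email_address')) { ?>
<ul>
<?php foreach ($errors->getErrorMessages('email_address') as $message) { ?>
<li><?php echo $message; ?></li>
<?php } ?>
</ul>
<?php } ?>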
Auto-fixing Validation Errors
$v->makeValid($object);
This method simply sets any fields that fail to validate to their default values.
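The core of that behavior can be sketched in a few lines. Here makeValidSketch() is a hypothetical stand-alone function, and the list of failed fields is passed in directly rather than computed by a validation pass:

```php
<?php
// Reset each failing field to its default value.
function makeValidSketch($object, array $defaults, array $failed_fields) {
    foreach ($failed_fields as $field_name) {
        // Only touch fields that have a registered default.
        if (array_key_exists($field_name, $defaults)) {
            $object->$field_name = $defaults[$field_name];
        }
    }
    return $object;
}

$user = new stdClass();
$user->age = 'not a number';
$user->name = 'Alice';
makeValidSketch($user, array('age' => 0), array('age'));
var_dump($user->age);  // int(0)
var_dump($user->name); // string(5) "Alice"
```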
But Wait, There’s More
As a bonus, the ->validate() method also accepts associative arrays. Since there’s no database table or class to consult for default validation rules in that case, you have to specify all of the validation rules yourself.
Synopsis
You may download the Validation package here.
Validator Object
$v = new Validator(string $class_name);
$v->addRule(string $field_name, $rule_type [, $params]);
$v->setDefault(string $field_name, mixed $value);
ValidationErrorSet Object
$error_set = $v->validate(object $my_object);
$error_set->hasError(string $field_name); // returns TRUE or FALSE
$error_set->getErrorMessages(string $field_name); // returns array of error messages
This constructor does a few clever things. First, it knows that if this is a DAO object (that is, it’s an object that represents a database table), then it can infer some basic validation rules for the object by inspecting the database table where objects of this class are stored.
Second, it can learn from the object itself what its default values should be. In the event a field is found to be invalid, you might want the option of restoring it to a default value.
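For the first point, inferring rules from the table, one possible approach is to map each column's type to default rules. The sketch below assumes MySQL-style type strings such as those reported by DESCRIBE; rulesFromColumnType() and the array layout of a rule are hypothetical.

```php
<?php
// Map a MySQL-style column type to a list of array(rule_type, param) pairs.
function rulesFromColumnType($column_type) {
    $rules = array();
    if (preg_match('/^(?:var)?char\((\d+)\)/i', $column_type, $m)) {
        // The declared width becomes the maximum string length.
        $rules[] = array('MAX_LENGTH', (int)$m[1]);
    } elseif (preg_match('/^(?:tiny|small|medium|big)?int\b/i', $column_type)) {
        $rules[] = array('INTEGER', null);
        if (stripos($column_type, 'unsigned') !== false) {
            $rules[] = array('MIN_VALUE', 0);
        }
    } elseif (preg_match('/^(?:float|double|decimal)\b/i', $column_type)) {
        $rules[] = array('FLOAT', null);
    } elseif (preg_match('/^datetime\b/i', $column_type)) {
        $rules[] = array('DATE_TIME', null);
    } elseif (preg_match('/^date\b/i', $column_type)) {
        $rules[] = array('DATE', null);
    }
    // Unrecognized types get no inferred rules.
    return $rules;
}

print_r(rulesFromColumnType('int(10) unsigned'));
```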
You will also want to be able to add new validation and override default validation rules with your own. So we would want some methods like:
$v->addRule($field_name, $rule_type, $params);
$v->setDefault($field_name, $default_value);
That brings us to…
How to Obfuscate Integer IDs
May 10, 2009
Abstract
Many web sites and applications expose information about themselves by using plain integers as IDs. URIs often identify entities like pages, content elements, and even users in the form http://example.com/content.php?page=42.

Using simple integers may expose certain information about a system, such as how many items a site has in its catalog or how quickly the site’s membership is growing. For various reasons, you might not want to reveal this information.
Integer IDs also make it trivial to probe a system with guessed IDs. While this isn’t necessarily a security risk (or even universally undesirable), you may wish to prevent this kind of probing of your system. Good obfuscation makes it effectively impossible to retrieve arbitrary URIs by simply guessing numeric IDs. Obfuscation is not intended to be a security feature, but it can add an element of detection and protection against casual probing.
This article will present the IdObfuscator class, which provides the static methods ::encode() and ::decode() to convert numeric IDs between integers and encoded strings. The encoded strings are tamper-evident: they can’t be altered without detection.
The IdObfuscator Class
While you could just store hashes in your database alongside integer IDs, this is unnecessary and inefficient in terms of both storage and indexing. Generally it is preferable to store IDs as unsigned integers and have algorithms to encode and decode them. This is what the IdObfuscator class does.
IdObfuscator was born of a project with requirements to hide the size of the site’s membership and its growth rate, the kinds of things startups often prefer not to disclose. The specific requirements were to:
- hide information about the application by obfuscating all integer IDs
- not have to store and index the encoded form
- be able to detect if the encoded form is altered
Given these requirements and their intent, the solution should:
- use real hashing, not just reversible scrambling
- not rely on the algorithm being hidden (safe for visible-source languages, like PHP)
- encode and decode the IDs quickly
- not create encoded forms in any recognizable pattern
- always return the same value for a given ID and salt
- if possible, use functionality native to the language (not dependent on external libraries), and preferably, common to many languages
There are quite a lot of “solutions” to this problem floating around, but most rely on simple scrambling and other easily-reversible techniques. Few are based on one-way hashing. None I found met all of the above requirements, so I rolled my own. I experimented with a few different algorithms and finally settled on the one demonstrated here for its relative speed and simplicity.
IdObfuscator satisfied all of the above requirements. It has proved so useful that I incorporated its functionality into the DAO base class of the Spawn PHP Framework, so all its DAOs can use it via the ->getIdx() and ::getObjectByIdx() methods.
The Encoding Algorithm
IdObfuscator expects a constant called CRYPT_SALT [1] to be defined before its methods are called.
IdObfuscator’s encoding algorithm is based on “burying” the value of $id within a random number. The value of $id is buried by subtracting it from (or adding it to) a random number so that the random number differs from its expected value by the value of $id. Thus, the value of $id can be calculated if you know what the random number should be; i.e., $id = the difference between the random number’s actual value and its expected value.
You would be correct if you guessed that, for this to work, the random number can’t be truly random; it must be effectively random, yet still predictable (by the code, not the client). Hashes like MD5 and SHA work well for this, so I used PHP’s sha1() function, and in all cases, appended a salt string to the value being hashed. Obfuscated IDs cannot be decoded without knowing the salt with which they were encoded.
Here’s the algorithm in a nutshell:
- Create a random number ($segment1) based on a hash of $id.
- Create a second random number ($segment2) based on a hash of $segment1.
- Alter $segment2 by adding or subtracting the value of $id.
- Make a third hash ($segment3) from $segment1 and the altered $segment2. This hash makes it possible to detect any alteration of the encoded ID.
- Concatenate the three segments into a string, and voilà, you have your obfuscated ID.
To decode it, just calculate the difference between the value of $segment2 and its expected value. The key is that the values of all segments are predictable only if you know the salt.
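The burying step is easier to see with toy numbers. In this sketch, $expected stands in for the hash-derived value, and bury() and recover() are hypothetical helper names:

```php
<?php
// Hide $id inside $expected by subtracting when possible, otherwise adding.
function bury($expected, $id) {
    return ($expected > $id) ? $expected - $id : $expected + $id;
}

// Whichever direction was used, the ID is the distance between the two values.
function recover($expected, $buried) {
    return abs($buried - $expected);
}

echo bury(1000, 42), "\n";   // 958
echo recover(1000, 958), "\n"; // 42
echo bury(10, 42), "\n";     // 52
echo recover(10, 52), "\n";  // 42
```

Anyone who cannot compute the expected value (which requires the salt) sees only the buried number.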
Here’s the algorithm in more detail:
$segment1
First, create a hash from $id. [2] PHP’s sha1() function returns a 40-character hexadecimal string, but for this segment we’re only going to use the first 16 characters of it.
$segment2
Next, create a second hash segment based on $segment1. $segment2 is the first eight characters of sha1($segment1.CRYPT_SALT). The purpose of this segment is to provide the random number into which $id will be buried.
Convert $segment2 from base 16 to base 10 ($dec is the decimal value of $segment2). The value of $dec will be a number between 0 and 4,294,967,295 (2^32-1) that is basically random, but predictable based on $segment1 if you know the value of CRYPT_SALT.
Bury $id in $dec by altering $dec by the value of $id. Since $dec is usually greater than $id, the alteration is usually $dec-$id; if the reverse is true, it’s $dec+$id. Thus, the value of $id can be learned by calculating the absolute difference between the expected value of $segment2 and its new value (after $id was subtracted or added).
The range of integers that can be safely buried in $segment2 is half of its maximum value, or 2,147,483,647 (2^31-1). Above that limit, there’s a risk (small at first, but increasing toward certainty as $id gets bigger) of overflowing 32-bit integers, which PHP will silently and happily do, and which will wreck the decoding process.
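The first two segments can be traced step by step with the sketch below. It assumes a 64-bit PHP build, uses a made-up salt in place of CRYPT_SALT, and uses hexdec()/dechex() for brevity where the class itself uses base_convert():

```php
<?php
$salt = 'example-salt-not-for-production'; // hypothetical stand-in for CRYPT_SALT
$id = 42;

// $segment1: first 16 hex characters of the salted hash of $id.
$segment1 = substr(sha1($id . $salt), 0, 16);

// $segment2: first 8 hex characters of the salted hash of $segment1.
$segment2 = substr(sha1($segment1 . $salt), 0, 8);

// Bury $id in the decimal value of $segment2.
$dec = hexdec($segment2);
$buried = ($dec > $id) ? $dec - $id : $dec + $id;
$segment2_altered = str_pad(dechex($buried), 8, '0', STR_PAD_LEFT);

// Decoding side: recompute the expected $segment2 and take the difference.
$expected = hexdec(substr(sha1($segment1 . $salt), 0, 8));
$recovered = abs(hexdec($segment2_altered) - $expected);

echo $recovered, "\n"; // 42
```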
$segment3
Lastly, we compute an eight-character hash of the combined $segment1 and $segment2. This acts as a checksum, so if any character in any segment is changed, decoding will fail and ::decode() will return 0.
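The checksum test on its own can be sketched as follows. The function names and the explicit $salt parameter are assumptions made to keep the example self-contained:

```php
<?php
// First $len hex characters of the salted SHA-1 of $str.
function sketchHash($str, $len, $salt) {
    return substr(sha1($str . $salt), 0, $len);
}

// True when the last 8 characters are the checksum of the first 24.
function checksumMatches($hex, $salt) {
    if (!preg_match('/^[0-9a-f]{32}\z/', $hex)) {
        return false;
    }
    $expected = sketchHash(substr($hex, 0, 24), 8, $salt);
    // Strict comparison: with ==, PHP compares numeric-looking strings
    // such as "0e123456" and "0e654321" as numbers and calls them equal.
    return substr($hex, 24, 8) === $expected;
}

$salt = 'example-salt-not-for-production'; // hypothetical salt
$body = str_repeat('ab', 12);              // stand-in for $segment1.$segment2
$hex = $body . sketchHash($body, 8, $salt);

var_dump(checksumMatches($hex, $salt));                 // bool(true)
var_dump(checksumMatches('b' . substr($hex, 1), $salt)); // bool(false)
```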
Squishing it Down
Together the three segments form a 32-digit hex string with $id buried in characters 16-23:
ef02550d40b359f5e4f14e52b2089761
While this string would be perfectly usable, it’s longer than necessary because base 16 isn’t the most efficient use of characters. I decided to shorten it a little by converting it to base 64. PHP’s base_convert() can’t handle 32-digit hex numbers, nor can it do base-64 conversion (it’s limited to base 36). So I decided to pack() the hex string into a 16-byte binary string, then base64_encode() that, leaving a string that is (almost always) 22 characters long, about 30% shorter:
7wJVDUCzWfXk8U5SsgiXYQ
I figured I could live with that.
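The conversion can be reproduced with the example string above, using only pack() and base64_encode():

```php
<?php
$hex = 'ef02550d40b359f5e4f14e52b2089761';

// 'H*' packs the whole hex string, high nibble first, into raw bytes.
$bin = pack('H*', $hex);
$b64 = base64_encode($bin);

echo strlen($bin), "\n";     // 16
echo $b64, "\n";             // 7wJVDUCzWfXk8U5SsgiXYQ==
echo rtrim($b64, '='), "\n"; // 7wJVDUCzWfXk8U5SsgiXYQ
```

Sixteen bytes always encode to 24 base-64 characters including two padding characters, so dropping the padding leaves 22.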
Character Fixing
Since the encoded IDs will be passed around in links, I decided to replace the / and + characters traditionally used in base-64 encoding, because those characters have other uses in URIs. I chose to replace them with : and $, but those are pretty arbitrary. Some other modified base64 variants use - and _. (I also chopped off any trailing =, which PHP handles without complaint.)
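The character swap and its reversal can be sketched like this. toUriSafe() and fromUriSafe() are hypothetical helper names, and the input bytes were picked so that the base-64 form contains both + and /:

```php
<?php
// Replace + and / and drop the = padding.
function toUriSafe($b64) {
    return str_replace(array('+', '/', '='), array('$', ':', ''), $b64);
}

// Restore the standard base-64 alphabet (padding is left off).
function fromUriSafe($oid) {
    return str_replace(array('$', ':'), array('+', '/'), $oid);
}

$bin = "\xfb\xff\xbe\x00";
$b64 = base64_encode($bin);
$safe = toUriSafe($b64);

echo $b64, "\n";  // +/++AA==
echo $safe, "\n"; // $:$$AA

// base64_decode() in its default non-strict mode accepts the missing padding.
var_dump(base64_decode(fromUriSafe($safe)) === $bin); // bool(true)
```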
Source Code
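<?php
class IdObfuscator {

    // Encode a positive integer ID as an obfuscated string; FALSE on bad input.
    public static function encode($id) {
        if (!is_numeric($id) or $id < 1) {return FALSE;}
        $id = (int)$id;
        // IDs above 2^31-1 cannot be buried safely (see Limitations).
        if ($id > 2147483647) {return FALSE;}

        // Two hash-derived segments: a 16-character anchor and the 8-character
        // pseudo-random number in which $id is buried.
        $segment1 = self::getHash($id,16);
        $segment2 = self::getHash($segment1,8);

        // Bury $id: subtract when possible, otherwise add.
        $dec = (int)base_convert($segment2,16,10);
        $dec = ($dec>$id)?$dec-$id:$dec+$id;
        $segment2 = base_convert($dec,10,16);
        $segment2 = str_pad($segment2,8,'0',STR_PAD_LEFT);

        // Checksum over the first two segments.
        $segment3 = self::getHash($segment1.$segment2,8);

        // Pack to binary, base64-encode, and swap URI-unfriendly characters.
        $hex = $segment1.$segment2.$segment3;
        $bin = pack('H*',$hex);
        $oid = base64_encode($bin);
        $oid = str_replace(array('+','/','='),array('$',':',''),$oid);
        return $oid;
    }

    // Decode an obfuscated string back to its integer ID; 0 on any failure.
    public static function decode($oid) {
        if (!preg_match('/^[A-Z0-9\:\$]{21,23}$/i',$oid)) {return 0;}
        $oid = str_replace(array('$',':'),array('+','/'),$oid);
        $bin = base64_decode($oid);
        $hex = unpack('H*',$bin); $hex = $hex[1];
        if (!preg_match('/^[0-9a-f]{32}$/',$hex)) {return 0;}

        $segment1 = substr($hex,0,16);
        $segment2 = substr($hex,16,8);
        $segment3 = substr($hex,24,8);

        // Recompute the expected segments; reject the ID if the checksum differs.
        // Strict comparison keeps PHP from comparing hex strings like "0e123456" as numbers.
        $exp2 = self::getHash($segment1,8);
        $exp3 = self::getHash($segment1.$segment2,8);
        if ($segment3 !== $exp3) {return 0;}

        // The ID is the distance between the stored and expected values.
        $v1 = (int)base_convert($segment2,16,10);
        $v2 = (int)base_convert($exp2,16,10);
        $id = abs($v1-$v2);
        return $id;
    }

    // First $len hex characters of the salted SHA-1 of $str.
    private static function getHash($str,$len) {
        return substr(sha1($str.CRYPT_SALT),0,$len);
    }
}
?>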
Note: The (int) casts are there to fix a bug with big integers in some PHP builds.
Limitations
- Input values are limited to the range of 1 through 2,147,483,647 (2^31-1)
- You shouldn’t change CRYPT_SALT after the system begins producing live production data if any values encoded with CRYPT_SALT are stored anywhere, including cookies. (This applies not only to IdObfuscator, but to anything that uses CRYPT_SALT.) Even if encoded values aren’t stored anywhere, it’s strongly recommended that you not change CRYPT_SALT, because doing so would change all URIs that contain encoded IDs and wipe out any search-engine ranking your pages may have.
- This algorithm is intended for use on the web. It would lose some of its value in a system where the salt was delivered with the application, such as an installable desktop application.
Footnotes
[1] CRYPT_SALT can be anything you like, but it should be (a) complex enough not to be guessable (like a password, which it effectively is), and (b) unique to each site on which it is used. It also can’t safely be changed once a site starts producing real live production data, since encoded IDs may persist in various places, like cookies and URIs. The Spawn PHP Framework defines CRYPT_SALT in host-config.php.
[2] All hashes in IdObfuscator are created by the private ::getHash() method, which concatenates the subject with CRYPT_SALT.
