How to Validate Objects in PHP

May 12, 2009

(Note: This article is incomplete – I’m just using it to work out some kinks with the new site tools)

One of the most tedious aspects of large projects is handling validation on all the objects they use.

Validation of user input is important to all software, but it has some special security implications for web-based software. Unlike desktop applications, web apps open your hardware, software and data to any user, anywhere in the world. And a few of those users have the inclination and time to spend poking at your site in attempts to break (or break into) it.

On top of that, with loosely typed languages like PHP and Perl, you can’t assume that a variable holds a value of any particular type. All user-supplied values, including ones that carry URI information such as $_SERVER['PHP_SELF'], are especially unsafe and should be validated and sanitized before use.

No need to belabor this point—validation is something we all know we should do. But the complexity of doing it right and the omnipresent pressure to get the product out are powerful disincentives, and the reality is often that if it can’t be done quickly, it doesn’t get done. This is especially true in applications with lots of objects.

The wrong way to handle validation is to do it in lots of different ways and scatter validation code all over your application. That way madness lies. There’s a much greater chance we’ll actually do validation correctly if it’s easy.

Ideally, validation should:

  1. use a consistent approach throughout the application
  2. be able to apply many different types of validation rules to many different types of objects
  3. be able to fix invalid DAO objects so that they are safe to save
  4. assume data is invalid unless proven otherwise; i.e., think whitelist rather than blacklist

In other words, a good validator would take your object, tell you what’s wrong with it, and if you like, fix it for you. And it would do all this with minimal work from you.

Requirements

In the spirit of writing the client code first, let’s start with what we would like as a user of the validation system. Something easy would be good, like this:

    $v = new Validator($class_name);

Under the hood, I would want that constructor to do a lot of the grunt work for me, like look at the class’s default values. And since I’m lazy, I would want it to do more. If the object is a DAO, I would want it to look at the database and set some validation rules based on the associated table.
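As a rough sketch of the first part, PHP’s reflection API can report a class’s declared default property values without creating an instance. The ActiveUser class and its properties below are hypothetical, and inferDefaults() is an illustrative helper name, not part of the validator itself.

```php
<?php
// Hypothetical DAO-style class, used only for illustration.
class ActiveUser {
    public $email_address = '';
    public $login_count = 0;
    protected $status = 'pending';
}

// Illustrative helper: collect a class's declared default property values.
function inferDefaults($class_name) {
    $reflection = new ReflectionClass($class_name);
    // getDefaultProperties() covers public, protected and private properties.
    return $reflection->getDefaultProperties();
}

$defaults = inferDefaults('ActiveUser');
```

Reflection is used here instead of get_class_vars() because the latter only sees the properties visible from the calling scope, which would miss protected and private fields.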

All that cleverness is good, but I still would need to be able to define additional, more complex validation rules than the constructor could infer from the class and the database. I would want something like:

    $v->addRule(... more on this below ...);

Sometimes we might want the validator to fix invalid data, so we’ll have to tell it what values to use for fields if they don’t validate:

    $v->setDefault($field_name, $default_value);

That’s probably enough to get started on the validator. Before going any further, let’s think about how to define the validation rules.

How to Define Validation Rules

For those cases where the default validation isn’t enough, you need to be able to give the validator rules to follow. Each rule will consist of:

  • the field name it applies to
  • the type of validation to perform
  • the values to validate against (optional, depending on type)

So, I would like to be able to define a rule by writing something like:

    $v->addRule( $field_name, $rule_type [,$params] );

Rule Types

Our validation framework will include the following rule types:

| Category | Rule Name | Verify that … |
|---|---|---|
| Numbers | INTEGER | the value is an integer (could be positive or negative; floats fail this test if they have a decimal point) |
| Numbers | FLOAT | the value is a floating-point number (positive or negative; integers pass this test too) |
| Numbers | MIN_VALUE | the value is greater than or equal to the given parameter |
| Numbers | MAX_VALUE | the value is less than or equal to the given parameter |
| Dates and Times | DATE | the value can be parsed as a date |
| Dates and Times | TIME | the value can be parsed as a time |
| Dates and Times | DATE_TIME | the value can be parsed as a date and time |
| Dates and Times | MIN_VALUE | in this context, the value is a time or date no earlier than the given parameter |
| Dates and Times | MAX_VALUE | in this context, the value is a time or date no later than the given parameter |
| Strings | MIN_LENGTH* | the string is at least as long as specified |
| Strings | MAX_LENGTH* | the string is no longer than specified |
| Strings | MATCH_ALL | the value must match all parameters; parameters are an array of regular expressions |
| Strings | MATCH_ANY | the value must match any one of the parameters; the parameters are an array containing strings and/or regular expressions |
| Strings | MATCH_NONE | the value must not match any of the parameters; the parameters are an array containing strings and/or regular expressions |
| Special Cases | EMAIL | the value is a properly formatted email address |
| Special Cases | URI | the value is a properly formatted URI |

* In most cases, minimum and maximum lengths can also be specified in regexes, so these are really only useful if you’re not also specifying a regex rule.
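To make a few of these rule types concrete, here is a minimal sketch of a function that checks one value against one rule. It covers only part of the list, represents rule types as plain strings, and uses the name checkRule(); all three choices are assumptions made for illustration, not the package’s actual design.

```php
<?php
// Sketch: check one value against one rule type; returns true when it passes.
// Rule types are plain strings here, which is an assumption.
function checkRule($value, $rule_type, $params = null) {
    switch ($rule_type) {
        case 'INTEGER':
            // Integer-like strings pass; values with a decimal point fail.
            return filter_var($value, FILTER_VALIDATE_INT) !== false;
        case 'FLOAT':
            return filter_var($value, FILTER_VALIDATE_FLOAT) !== false;
        case 'MIN_VALUE':
            return is_numeric($value) && $value >= $params;
        case 'MAX_VALUE':
            return is_numeric($value) && $value <= $params;
        case 'MIN_LENGTH':
            // strlen() counts bytes, not characters.
            return is_string($value) && strlen($value) >= $params;
        case 'MAX_LENGTH':
            return is_string($value) && strlen($value) <= $params;
        case 'MATCH_ALL':
            foreach ($params as $regex) {
                if (!preg_match($regex, $value)) {
                    return false;
                }
            }
            return true;
        case 'EMAIL':
            return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
        default:
            // Whitelist mindset: an unknown rule type never passes.
            return false;
    }
}
```

Returning false for unknown rule types follows the "invalid unless proven otherwise" principle from the requirements: a typo in a rule name then rejects data instead of silently letting it through.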

Getting the Validation Errors — the ValidationErrorSet Object

Fetching the errors will be as simple as:

    $errors = $v->validate($my_object);

I would like the ->validate() method to return any errors it finds. We will have it return the errors in a ValidationErrorSet object, which will encapsulate a list of errors plus some useful methods to work with them.

Then, I would like to be able to use $errors in my code like this:

    <?php
        if ($errors->hasErrors($field_name)) {
            echo '<ul>';
            foreach ($errors->getErrorMessages($field_name) as $message) {
                // escape the message in case it echoes back user input
                echo '<li>' . htmlspecialchars($message) . '</li>';
            }
            echo '</ul>';
        }
    ?>

It would be good if ->hasErrors() and ->getErrorMessages() could act on the whole object if no individual field is specified, because as the client, there will probably be cases where I don’t want to look at each field individually.
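One way to get that behavior is to give both methods an optional field name. The sketch below assumes the error set stores messages keyed by field name and is filled through an addError() method; both the storage layout and addError() are hypothetical.

```php
<?php
// Sketch of an error set whose query methods take an optional field name.
class ValidationErrorSet {
    // Internal layout (an assumption): field name => list of messages.
    private $messages = array();

    // Hypothetical method the validator would call for each failed rule.
    public function addError($field_name, $message) {
        $this->messages[$field_name][] = $message;
    }

    // With no argument, reports whether any field has errors.
    public function hasErrors($field_name = null) {
        if ($field_name === null) {
            return count($this->messages) > 0;
        }
        return !empty($this->messages[$field_name]);
    }

    // With no argument, returns the messages for every field as one flat list.
    public function getErrorMessages($field_name = null) {
        if ($field_name !== null) {
            return isset($this->messages[$field_name])
                ? $this->messages[$field_name]
                : array();
        }
        $all = array();
        foreach ($this->messages as $field_messages) {
            $all = array_merge($all, $field_messages);
        }
        return $all;
    }
}
```

Client code can then call $errors->hasErrors() once to decide whether to redisplay a form, and $errors->getErrorMessages($field_name) next to each input.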

Starting the Implementation

We have enough requirements to start roughing out the classes we’ll need:

class Validator {

    private $rules = array();
    private $default_values = array();

    public function __construct($class_name) {
        // TODO: infer default values and basic rules from the class
    }

    // $params is optional because some rule types take no parameters
    public function addRule($field_name, $validation_type, $params = null) {
        $this->rules[] = new ValidationRule($field_name, $validation_type, $params);
    }

    public function setDefault($field_name, $default_value) {
        $this->default_values[$field_name] = $default_value;
    }

    public function validate($object) {
        $error_set = new ValidationErrorSet();
        // TODO: apply each rule to $object and record failures in $error_set
        return $error_set;
    }
}
class ValidationRule {
    public function __construct($field_name, $validation_type, $params = null) {
    }
}
class ValidationErrorSet {
}
class ValidationError {
}

Where do the error messages come from?

There are several reasonable choices for where to store the text of error messages:

| storage | pros | cons |
|---|---|---|
| database table | elegant; arguably the most “correct” solution | complexity of retrieving data; database overhead* |
| in a config file | quick; easy; messages are always accessible | all possible error messages for the whole system would be loaded on every page load |
| in multiple text files | very fast; only load messages as needed; cacheable | less elegant than using a database; ambiguity about whether error messages are code or data |

* Disk I/O (but not other overhead) can be eliminated by using MySQL’s MEMORY storage engine.

One way to do it is to pull the messages from a table:

CREATE TABLE error_messages (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    class       VARCHAR(50),
    field       VARCHAR(50),
    error_type  VARCHAR(20),
    message     VARCHAR(200)
);

But the messages don’t change frequently, so you could avoid the database overhead and simply read them from the filesystem:

return file_get_contents(BASE_CODE_ROOT.'/error-messages/ActiveUser-password-MIN_LENGTH.txt');
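The file-based lookup can be wrapped in a small helper. This sketch assumes the files follow a Class-field-RULE_TYPE.txt naming pattern, generalized from the single example path; the function name loadErrorMessage() and the $base_dir parameter, which stands in for BASE_CODE_ROOT.'/error-messages', are hypothetical.

```php
<?php
// Sketch: load one error message from a per-rule text file.
// Returns null when the name parts are unsafe or the file is missing.
function loadErrorMessage($base_dir, $class_name, $field_name, $rule_type) {
    // Whitelist the name parts so they cannot be used to escape $base_dir.
    foreach (array($class_name, $field_name, $rule_type) as $part) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $part)) {
            return null;
        }
    }
    $path = $base_dir . '/' . $class_name . '-' . $field_name
          . '-' . $rule_type . '.txt';
    if (!is_readable($path)) {
        return null;
    }
    // Trim the trailing newline most editors add to text files.
    return trim(file_get_contents($path));
}
```

Returning null for a missing file lets the caller fall back to a generic message instead of showing a PHP warning to the user.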

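The ->validate() method returns a ValidationErrorSet object, which records the failed rule types for each field:

    field_name => array(type1 [, type n, …])

In a template, the errors for a single field can be listed like this:

    <?php if ($errors->hasErrors('email_address')) { ?>
        <ul>
        <?php foreach ($errors->getErrorMessages('email_address') as $message) { ?>
            <li><?php echo htmlspecialchars($message); ?></li>
        <?php } ?>
        </ul>
    <?php } ?>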

Auto-fixing Validation Errors

$v->makeValid($object);

This method simply sets any fields that fail to validate to their default values.
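The core of that behavior might look like the following sketch. It is written as a standalone function so it can run on its own; the parameter names are hypothetical, with $invalid_fields standing in for the field names reported by validation and $defaults for the values registered through setDefault(). It also assumes the fields are public properties.

```php
<?php
// Sketch: reset every field that failed validation to its configured default.
function makeValid($object, array $invalid_fields, array $defaults) {
    foreach ($invalid_fields as $field_name) {
        // Fields without a configured default are left untouched.
        if (array_key_exists($field_name, $defaults)) {
            $object->$field_name = $defaults[$field_name];
        }
    }
    return $object;
}
```

Leaving fields without a default untouched means the object may still be invalid afterwards, so it is worth running validation again before saving.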

But Wait, There’s More

As a bonus, the ->validate() method will also accept associative arrays, but since there’s no database table or class to look to for default validation rules, you have to specify all of the validation rules yourself.
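Supporting both inputs mostly comes down to how a field’s value is read. A minimal sketch, assuming public properties on objects and using the hypothetical helper name getFieldValue():

```php
<?php
// Sketch: read a field from either an object or an associative array.
// Returns null when the field is missing or the input is neither.
function getFieldValue($subject, $field_name) {
    if (is_array($subject)) {
        return array_key_exists($field_name, $subject)
            ? $subject[$field_name]
            : null;
    }
    if (is_object($subject)) {
        return isset($subject->$field_name) ? $subject->$field_name : null;
    }
    return null;
}
```

With a helper like this, the rule-checking code does not need to know which kind of input it was given.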

Synopsis

You may download the Validation package here.

Validator Object

$v = new Validator(string $class_name);
$v->addRule(string $field_name, $rule_type [, $params]);
$v->setDefault(string $field_name, mixed $value);

ValidationErrorSet Object

$error_set = $v->validate(object $my_object);
$error_set->hasErrors(string $field_name); // returns TRUE or FALSE
$error_set->getErrorMessages(string $field_name); // returns array of error messages



The Validator constructor does a few clever things. First, it knows that if the class is a DAO (that is, an object that represents a database table), then it can infer some basic validation rules by inspecting the database table where objects of this class are stored.
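As an illustration of that kind of inference, the sketch below turns column type strings, in the style MySQL reports them, into rule tuples. The function name inferRulesFromColumns(), the tuple layout, and the particular type-to-rule mapping are all assumptions; fetching the column types from the database is left out.

```php
<?php
// Sketch: derive basic rules from column type strings such as "varchar(50)".
// Input maps field name => column type; output is a list of
// array(field name, rule type, parameter) tuples.
function inferRulesFromColumns(array $columns) {
    $rules = array();
    foreach ($columns as $field_name => $column_type) {
        if (preg_match('/^(?:var)?char\((\d+)\)/i', $column_type, $m)) {
            $rules[] = array($field_name, 'MAX_LENGTH', (int) $m[1]);
        } elseif (preg_match('/^(?:tiny|small|medium|big)?int\b/i', $column_type)) {
            $rules[] = array($field_name, 'INTEGER', null);
            // Unsigned integer columns cannot hold negative values.
            if (stripos($column_type, 'unsigned') !== false) {
                $rules[] = array($field_name, 'MIN_VALUE', 0);
            }
        } elseif (preg_match('/^(?:float|double|decimal)\b/i', $column_type)) {
            $rules[] = array($field_name, 'FLOAT', null);
        } elseif (preg_match('/^datetime\b/i', $column_type)) {
            $rules[] = array($field_name, 'DATE_TIME', null);
        }
        // Unrecognized column types get no inferred rule.
    }
    return $rules;
}
```

Each tuple could then be passed straight to addRule(), leaving the hand-written rules to cover whatever the schema cannot express.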

Second, it can learn from the object itself what its default values should be. In the event a field is found to be invalid, you might want the option of restoring it to a default value.

You will also want to be able to add new validation rules and override the default ones with your own. So we would want some methods like:

    $v->addRule($field_name, $rule_type, $params);
    $v->setDefault($field_name, $default_value);

That brings us to…
