Build a Linux Webserver for Drupal in 8 Quick Steps

March 19, 2010

Recently I’ve had a cluster of projects that required setting up a large number of LAMP environments for Drupal on Rackspace Cloud Servers. Since these servers come with nothing, you have to install everything but the OS from scratch. This article describes how to do that in almost no time.

The examples below assume you’re running Ubuntu 9.10 (or 10.04) and are logged in as root.

0. Update

sudo apt-get update

sudo apt-get upgrade

Set the local time:

mv /etc/localtime /etc/localtime-old

ln -sf /usr/share/zoneinfo/America/Los_Angeles /etc/localtime

1. Install Apache 2

First, make sure your server knows its name by setting the correct server name in these two files:

/etc/hosts

/etc/hostname

Then install Apache:

sudo apt-get install apache2 apache2.2-common apache2-mpm-prefork apache2-utils libexpat1 ssl-cert

If you’re going to need to install an SSL certificate, install openssl:

sudo apt-get install openssl

Then, make sure Apache knows the server’s name:

vi /etc/apache2/apache2.conf

and add this line:

ServerName <fully-qualified domain name>

2. Install the MySQL Server and Client

sudo apt-get install mysql-server mysql-client

During the installation process, you’ll be prompted to create a password for the root user.

3. Install PHP5 with MySQL, GD, ImageMagick, and Memcache support

sudo apt-get install libapache2-mod-php5 php5 php5-cli php5-common php5-curl php5-dev php5-gd php5-imagick php5-mcrypt php5-memcache php5-mhash php5-mysql php5-pspell php5-snmp php5-sqlite php5-xmlrpc php5-xsl php5-fpm php-apc

sudo apt-get install imagemagick

sudo apt-get install memcached
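
Once the packages are installed, a short script can confirm that PHP actually loaded the extensions. This is only a sketch: the extension names in $wanted are my assumptions about what the packages above provide, and missingExtensions() is a hypothetical helper, not part of PHP or Drupal.

```php
<?php
// Extension names are assumptions based on the packages installed above.
$wanted = array('gd', 'curl', 'mysqli', 'imagick', 'memcache');

// Return the names in $names that PHP has not loaded.
function missingExtensions(array $names) {
	return array_values(array_filter($names, function ($name) {
		return !extension_loaded($name);
	}));
}

$missing = missingExtensions($wanted);
echo $missing ? 'Missing: ' . implode(', ', $missing) : 'All extensions loaded', "\n";
```

Run it both from the command line and through Apache, since the two can read different php.ini files.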

4. Install and Configure an FTP Daemon

apt-get install vsftpd

Next, open vsftpd.conf (vi /etc/vsftpd.conf) and set the following configuration options:

  • anonymous_enable=NO
  • local_enable=YES
  • write_enable=YES
  • local_umask=022
  • chroot_local_user=NO

Restart the FTP daemon: /etc/init.d/vsftpd restart. (In some cases I’ve noticed that changes in vsftpd.conf are not recognized until the server is rebooted.)

Finally, create a new user to use for FTPing:

adduser username

(Follow the prompts to enter a password and other information.)

5. Install sendmail, alpine and zip/unzip

sudo apt-get install sendmail

sudo apt-get install alpine

While you’re at it, install zip/unzip, which you’ll probably need at some point:

sudo apt-get install zip unzip

6. Enable mod_rewrite for Clean (SEO-friendly) URLs

First, enable Apache’s mod_rewrite module. To do this, create a link to rewrite.load in /etc/apache2/mods-enabled (running a2enmod rewrite does the same thing):

ln -s -T /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load

Lastly, open /etc/apache2/sites-enabled/000-default and look for

<Directory /var/www/>
	...
</Directory>

Make these changes:

  • Change Options Indexes FollowSymLinks MultiViews
    to Options Indexes FollowSymLinks (remove ‘MultiViews’)
  • Change AllowOverride None
    to AllowOverride All
  • Add the following lines:
    RewriteEngine on
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

If you want to password-protect the site using HTTP Basic Auth, then add these directives:

AuthUserFile /etc/htpass/.htpasswd
AuthGroupFile /dev/null
AuthName "Enter Password"
AuthType Basic
require user <username>

Then, create the .htpasswd file:

mkdir /etc/htpass
cd /etc/htpass
htpasswd -c .htpasswd <username>

Follow the prompts to set the password for this user.

7. Restart Apache

/etc/init.d/apache2 restart

Confirm that Apache is loading the rewrite module:

apache2ctl -M

The module rewrite_module should be included in the list.

8. Install Drupal with the Zen theme and the CCK, FileField and Views modules

8.1 Create a database for Drupal to use:

mysql -u username -ppassword -e "create database database_name"

8.2 Install Drupal and set up directories:

cd /var/www
wget http://ftp.drupal.org/files/projects/drupal-6.19.tar.gz
gunzip drupal-6.19.tar.gz
tar -xvf drupal-6.19.tar
rm drupal-6.19.tar
mv drupal-6.19/* ./
mv drupal-6.19/.htaccess ./
rm -R drupal-6.19
cp /var/www/sites/default/default.settings.php /var/www/sites/default/settings.php
chmod -R 0777 /var/www/sites/default
cd /var/www/sites/all
mkdir modules
mkdir themes

Get the Zen base theme:

cd /var/www/sites/all/themes
wget http://ftp.drupal.org/files/projects/zen-6.x-1.1.tar.gz
gunzip zen-6.x-1.1.tar.gz
tar -xvf zen-6.x-1.1.tar
rm zen-6.x-1.1.tar

Install modules:

These will vary according to your requirements. These are the ones I include by default.

cd /var/www/sites/all/modules

wget http://ftp.drupal.org/files/projects/cck-6.x-2.6.tar.gz
gunzip cck-6.x-2.6.tar.gz
tar -xvf cck-6.x-2.6.tar
rm cck-6.x-2.6.tar

wget http://ftp.drupal.org/files/projects/filefield-6.x-3.3.tar.gz
gunzip filefield-6.x-3.3.tar.gz
tar -xvf filefield-6.x-3.3.tar
rm filefield-6.x-3.3.tar

wget http://ftp.drupal.org/files/projects/views-6.x-2.10.tar.gz
gunzip views-6.x-2.10.tar.gz
tar -xvf views-6.x-2.10.tar
rm views-6.x-2.10.tar

wget http://ftp.drupal.org/files/projects/advanced_help-6.x-1.2.tar.gz
gunzip advanced_help-6.x-1.2.tar.gz
tar -xvf advanced_help-6.x-1.2.tar
rm advanced_help-6.x-1.2.tar

wget http://ftp.drupal.org/files/projects/xmlsitemap-6.x-2.0-beta1.tar.gz
gunzip xmlsitemap-6.x-2.0-beta1.tar.gz
tar -xvf xmlsitemap-6.x-2.0-beta1.tar
rm xmlsitemap-6.x-2.0-beta1.tar

That’s All!

Open your site in a browser and follow Drupal’s straightforward setup instructions.

The Firefox ‘Work Offline’ Bug

January 13, 2010

Here’s a maddening problem I observed while using Firefox 3.5 for PHP development. (My laptop dev environment consists of Windows 7, PHP 5.3, MySQL 5 and Apache 2.2.)

The Problem

Without warning, Firefox suddenly stopped showing the changes I was making to my PHP code. Refreshes (including hard refreshes) simply resulted in the last cached version of the page being redisplayed, regardless of what changes I had made to my source code.

The Cause

I discovered this problem during a tropical storm, while the electricity (and therefore wi-fi) was going on and off. (Thank goodness for extra laptop batteries!) What was happening was that Firefox responded to interrupted internet access by silently switching to ‘Offline Mode’. When it does this, Firefox insists on re-serving the last cached version of a page without attempting to reload it, even if the page comes from localhost.

The fact that it does this silently is the reason I characterize this behavior as a bug.

The Solution

Click File and uncheck the Work Offline option, and you should be able to quit pulling your hair out and get back to work. Unless you’re here in Puerto Viejo, Costa Rica, in which case you should just go for a walk down the beach instead.

Update

My friend and genius LAMP developer Michael Henretty found in a few minutes what eluded me all day: this thread on mozilla.com.

The trick is to go to about:config, find browser.offline-apps.notify and set it to false.

Thanks, Mike.

How to Use the User Interface to Discover All the Use Cases

December 29, 2009

If you build software for a living, it’s a matter of time before you find yourself discovering too late in a project that some important piece of functionality wasn’t accounted for. Often this is the result of assumptions — whoever was responsible for writing the functional spec assumed that some piece of functionality must be included, since omitting it would be illogical. Discovery of a missed piece of functionality usually sounds something like:

“Of course moderators have to approve user-uploaded photos! How could they not?”

Omissions like this are particularly common in the administrative half of applications. After all, the product guys invest lots of time designing the public-facing user interface, but use cases for administrative users are often overlooked or assumed to somehow just automagically be there. Those tasked with detailed spec-writing are often surprised that, in most systems, there are at least as many use cases for administrators as for public users.

This article discusses a simple technique for discovering those easy-to-miss use cases as early in the project as possible.

Accounting for all Basic UI Functions

You’re probably familiar with the set of basic database functions collectively known as CRUD:

  • Create a new record
  • Recall an existing record
  • Update an existing record
  • Delete an existing record

These four actions constitute the basic functions upon which almost all data storage and retrieval actions are based.

Corresponding to these four primary database functions are eight primary actions that your UI may provide:

UI Actions   CRUD Functions
add          create
view         recall
list         recall
search       recall
edit         update
recycle      update (assumes setting a flag)
restore      update (assumes setting a flag)
delete       delete

Since our goal is to avoid surprises late in the project, we’re going to try to account for every action by every actor (user) upon every object in the system.

By way of example, let’s assume your system is something like flickr, so it will have use cases to handle its Photograph objects. We can account for those use cases as in this example:

Use Cases – Photograph Object

UI Action  Actor            Use Case
add        new user         Registered user uploads a new photograph.
view       any user         User clicks on thumbnail to view a full-size enlargement of a photograph.
view       registered user  User clicks on thumbnail to view a full-size enlargement of a private photograph whose access is restricted to users on the owner’s friends list.
list       any user         User views a list of thumbnails for photographs uploaded in the past seven days.
list       registered user  Registered user views a list of thumbnails for photographs they have uploaded.
list       registered user  Registered user views a list of thumbnails for photographs they have recycled.
list       moderator        Moderator views a list of thumbnails for photographs awaiting moderation.
search     any user         User searches for photographs by keywords and/or tags.
search     any user         User searches for photographs by minimum or maximum dimensions.
edit       registered user  Registered user edits metadata (title, description, tags, etc.) for one of their photographs.
edit       registered user  Registered user selects tags and applies them to a group of their photographs.
edit       registered user  Registered user changes the visibility of a photograph from ‘visible’ to ‘invisible’.
edit       registered user  Registered user changes the visibility of a photograph from ‘invisible’ to ‘visible’.
edit       registered user  Registered user uploads a new photograph to replace an existing one. (Note: This action puts the photograph back in the queue for moderator approval.)
edit       moderator        Moderator sets the status of a photograph to ‘approved’.
edit       moderator        Moderator sets the status of a photograph to ‘rejected’.
recycle    registered user  Registered user recycles one photograph.
recycle    registered user  Registered user recycles photographs selected from a group.
recycle    registered user  Registered user recycles all of the photographs in a selected album.
restore    registered user  User restores a recycled photograph.
delete     registered user  Registered user permanently deletes a recycled photograph.

This is an example using only the system’s Photograph object. The idea is to go through this exercise with, yes, every object in the system.
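
One way to make sure no combination is skipped is to generate the checklist mechanically and then write (or consciously rule out) a use case for each row. The sketch below is only an illustration: the actor list is taken from the Photograph example, and the object list is hypothetical.

```php
<?php
// The eight UI actions from the table above.
$actions = array('add', 'view', 'list', 'search', 'edit', 'recycle', 'restore', 'delete');
// Actors taken from the Photograph example; a real system will have its own list.
$actors  = array('any user', 'registered user', 'moderator');
// Hypothetical object list; add every object in the system.
$objects = array('Photograph');

$checklist = array();
foreach ($objects as $object) {
	foreach ($actions as $action) {
		foreach ($actors as $actor) {
			$checklist[] = "$object | $action | $actor";
		}
	}
}

echo count($checklist), " combinations to review\n";  // 24 for one object
echo $checklist[0], "\n";                              // Photograph | add | any user
```

Many rows will turn out to need no use case at all, but each one gets looked at.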

Whoa! That’s going to be a lot of use cases!

Yes, it probably is, and it’s invariably more than you’d expect after just a cursory review of a system. But that’s the whole point: A cursory review misses a lot, so this is a fairly easy, methodical way to step through all of the actions, actors and objects in a system.

Are you crazy? Do you know how long that will take?

Yes, actually I know exactly how long it takes. I also know that it’s worth every minute. Here’s why: By the time any project is successfully completed, you have, one way or another, done this exercise. If you haven’t, the project didn’t complete successfully, and vice versa.

If this sounds tedious, it is. It can take days (or longer) to plow through all of the use cases uncovered by this technique. Note that I say uncovered and not created; this approach is meant to reveal and capture all those little cases nobody thought to include. One way or the other, your system won’t be complete until you account for all of these use cases, so the sooner you discover them, the better.

Since you’re going to have to account for all of these use cases in order to succeed, there are some compelling reasons to do it early in the project:

  • Identifying all of the likely use cases early helps you decide how to phase the project, which features to defer, and which pieces of functionality can probably be omitted altogether.
  • Identifying as many use cases as possible early in the project gives you the best data on how long it’s likely to take, how much it’s likely to cost, and in fact whether it’s worth doing.
  • Identifying use cases early enables testers to begin drafting test plans almost immediately.
  • Identifying use cases early enables tech writers to start writing user manuals almost immediately.
  • Identifying use cases early enables the whole team to succeed by design rather than by chance, which is good, because chance has a lousy track record.

It’s worth noting that there won’t necessarily be use cases for every action by every actor for every object. For example, there’s often no requirement for registered public users to delete their own account (that is, the User object that represents their own account). It’s common to decide that not all functions are required for every object. Remember, the goal is to account for every possibility, not necessarily to implement them all.

How to Obfuscate Integer IDs

May 10, 2009

Abstract

Many web sites and applications expose information about themselves by using plain integers as IDs. URIs often identify entities like pages, content elements, and even users in the form http://example.com/content.php?page=42. For example:

[Image: Facebook member URI]

Using simple integers may expose certain information about a system, such as how many items a site has in its catalog or how quickly the site’s membership is growing. For various reasons, you might not want to reveal this information.

Integer IDs also make it trivial to probe a system with guessed IDs. While this isn’t necessarily a security risk (or even universally undesirable), you may wish to prevent this kind of probing of your system. Good obfuscation makes it effectively impossible to retrieve arbitrary URIs by simply guessing numeric IDs. Obfuscation is not intended to be a security feature, but it can add an element of detection and protection against casual probing.

This article will present the IdObfuscator class, which provides the static methods ::encode() and ::decode() to convert numeric IDs between integers and encrypted strings. The encrypted strings are tamper-resistant, so they can’t be altered without detection.

The IdObfuscator Class

While you could just store hashes in your database alongside integer IDs, this is unnecessary and inefficient in terms of both storage and indexing. Generally it is preferable to store IDs as unsigned integers and have algorithms to encode and decode them. This is what the IdObfuscator class does.

IdObfuscator was born of a project with requirements to hide the size of the site’s membership and its growth rate, the kinds of things startups often prefer not to disclose. The specific requirements were to:

  • hide information about the application by obfuscating all integer IDs
  • not have to store and index the encoded form
  • be able to detect if the encoded form is altered

Given these requirements and their intent, the solution should:

  • use real hashing, not just reversible scrambling
  • not rely on the algorithm being hidden (safe for visible-source languages, like PHP)
  • encode and decode the IDs quickly
  • not create encoded forms in any recognizable pattern
  • always return the same value for a given ID and salt
  • if possible, use functionality native to the language (not dependent on external libraries), and preferably, common to many languages

There are quite a lot of “solutions” to this problem floating around, but most rely on simple scrambling and other easily-reversible techniques. Few are based on one-way hashing. None I found met all of the above requirements, so I rolled my own. I experimented with a few different algorithms and finally settled on the one demonstrated here for its relative speed and simplicity.

IdObfuscator satisfied all of the above requirements. It has proved so useful that I incorporated its functionality into the DAO base class of the Spawn PHP Framework, so all its DAOs can use it via the ->getIdx() and ::getObjectByIdx() methods.

The Encoding Algorithm

IdObfuscator expects a constant called CRYPT_SALT (see footnote 1) to be defined before its methods are called.

IdObfuscator’s encoding algorithm is based on “burying” the value of $id within a random number. The value of $id is buried by subtracting it from (or adding it to) a random number so that the random number differs from its expected value by the value of $id. Thus, the value of $id can be calculated if you know what the random number should be; i.e., $id = the difference between the random number’s actual value and its expected value.

You would be correct if you guessed that, for this to work, the random number can’t be truly random; it must be effectively random, yet still predictable (by the code, not the client). Hashes like MD5 and SHA work well for this, so I used PHP’s sha1() function, and in all cases, appended a salt string to the value being hashed. Obfuscated IDs cannot be decoded without knowing the salt with which they were encrypted.

Here’s the algorithm in a nutshell:

  1. Create a random number ($segment1) based on a hash of $id.
  2. Create a second random number ($segment2) based on a hash of $segment1.
  3. Alter $segment2 by adding or subtracting the value of $id.
  4. Make a third hash ($segment3) from $segment1 and the altered $segment2. This hash makes it possible to detect any alteration of the encoded ID.
  5. Concatenate the three segments into a string, and voilà – you have your obfuscated ID.

To decode it, just calculate the difference between the value of $segment2 and its expected value. The key is that the values of all segments are predictable only if you know the encryption salt.

Here’s the algorithm in more detail:

$segment1

First, create a hash from $id (see footnote 2). PHP’s sha1() function returns a 40-character hexadecimal string, but for this segment we’re only going to use the first 16 characters of it.

$segment2

Next, create a second hash segment based on $segment1. $segment2 is the first eight characters of sha1($segment1.CRYPT_SALT). The purpose of this segment is to provide the random number into which $id will be buried.

Convert $segment2 from base 16 to base 10 ($dec is the decimal value of $segment2). The value of $dec will be a number between 0 and 4,294,967,295 (2^32-1) that is basically random, but predictable based on $segment1 if you know the value of CRYPT_SALT.

Bury $id in $dec by altering $dec by the value of $id. Since $dec is usually greater than $id, the alteration is usually $dec-$id; if the reverse is true, it’s $dec+$id. Thus, the value of $id can be learned by calculating the absolute difference between the expected value of $segment2 and its new value (after $id was subtracted or added).

The range of integers that can be safely buried in $segment2 is half of its maximum value, or 2,147,483,647 (2^31-1). Above that limit, there’s a risk (small at first, but increasing toward certainty as $id gets bigger) of overflowing 32-bit integers, which PHP will silently and happily do, and which will wreck the decoding process.
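
The burying step can be tried in isolation. This is a minimal sketch assuming 64-bit PHP; $salt is a made-up stand-in for CRYPT_SALT, and the variable names $expected, $buried and $recovered are mine, not part of the class.

```php
<?php
// Hypothetical salt for illustration only; a real site defines its own CRYPT_SALT.
$salt = 'example-salt';
$id   = 42;

// Derive a predictable "random" number from the ID.
$segment1 = substr(sha1($id . $salt), 0, 16);
$expected = hexdec(substr(sha1($segment1 . $salt), 0, 8));

// Bury $id by moving the number away from its expected value.
$buried = ($expected > $id) ? $expected - $id : $expected + $id;

// Decoding: recompute the expected value from $segment1 and take the difference.
$recovered = abs($buried - $expected);

printf("expected=%d buried=%d recovered=%d\n", $expected, $buried, $recovered);
```

Whichever direction the number was moved, the absolute difference is always $id, so the decoder never needs to know whether encoding added or subtracted.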

$segment3

Lastly, we compute an eight-character hash of the combined $segment1 and $segment2. This acts as a checksum, so if any character in any segment is changed, decoding will fail and ::decode() will return 0.
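
Here is a rough sketch of how the checksum catches tampering. EXAMPLE_SALT, exampleHash() and checksumMatches() are hypothetical names used only for this illustration, and the $segment2 value is an arbitrary stand-in.

```php
<?php
define('EXAMPLE_SALT', 'example-salt');  // hypothetical stand-in for CRYPT_SALT

// Salted, truncated SHA-1, in the same spirit as the class's private hash helper.
function exampleHash($str, $len) {
	return substr(sha1($str . EXAMPLE_SALT), 0, $len);
}

// True only if $segment3 is the checksum of the first two segments.
function checksumMatches($segment1, $segment2, $segment3) {
	return exampleHash($segment1 . $segment2, 8) === $segment3;
}

$segment1 = exampleHash(42, 16);
$segment2 = 'e4f14e52';                               // arbitrary stand-in value
$segment3 = exampleHash($segment1 . $segment2, 8);

var_dump(checksumMatches($segment1, $segment2, $segment3));   // untouched: true
var_dump(checksumMatches($segment1, 'e4f14e53', $segment3));  // one digit changed: false, barring a hash collision
```

Without the salt, someone who changes a digit has no practical way to compute a matching third segment.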

Squishing it Down

Together the three segments form a 32-digit hex string with $id buried in characters 16-23:

ef02550d40b359f5e4f14e52b2089761

While this string would be perfectly usable, it’s longer than necessary because base 16 isn’t the most efficient use of characters. I decided to shorten it a little by converting it to base 64. PHP’s base_convert() can’t handle 32-digit hex numbers, nor can it do base-64 conversion (it’s limited to base 36). So I decided to pack() the hex string into a 16-byte binary string, then base64_encode() that, leaving a string that is 22 characters long once the two trailing = padding characters are removed, about 30% shorter:

7wJVDUCzWfXk8U5SsgiXYQ

I figured I could live with that.
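
The packing step can be checked against the example string above. This is just a sketch; the variable names are arbitrary.

```php
<?php
$hex   = 'ef02550d40b359f5e4f14e52b2089761';  // the 32-digit example above
$bin   = pack('H*', $hex);                    // 16 raw bytes
$b64   = base64_encode($bin);                 // 24 characters, ending in "=="
$short = rtrim($b64, '=');                    // 22 characters

echo strlen($bin), ' ', $b64, ' ', $short, "\n";
// prints: 16 7wJVDUCzWfXk8U5SsgiXYQ== 7wJVDUCzWfXk8U5SsgiXYQ
```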

Character Fixing

Since the encoded IDs will be passed around in links, I decided to replace the / and + characters traditionally used in base-64 encoding, because those characters have other uses in URIs. I chose to replace them with : and $, but those are pretty arbitrary. Some other modified base64 variants use - and _. (I also chopped off any trailing =, which PHP handles without complaint.)
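
To see the substitution at work, here is a small round trip on two bytes chosen because they encode to both + and /. The helper names toUriSafe() and fromUriSafe() are hypothetical; the class does the same replacements inline.

```php
<?php
// Swap the URI-unfriendly base-64 characters and drop the = padding.
function toUriSafe($b64) {
	return str_replace(array('+', '/', '='), array('$', ':', ''), $b64);
}

// Reverse the swap; base64_decode() copes with the missing padding.
function fromUriSafe($oid) {
	return str_replace(array('$', ':'), array('+', '/'), $oid);
}

$bin  = "\xfb\xff";
$b64  = base64_encode($bin);                  // +/8=
$oid  = toUriSafe($b64);                      // $:8
$back = base64_decode(fromUriSafe($oid));     // the original two bytes

echo $b64, ' ', $oid, ' ', bin2hex($back), "\n";
```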

Source Code

<?php
class IdObfuscator {

	// Encode an integer ID (1 through 2^31-1) as an obfuscated string.
	public static function encode($id) {
		if (!is_numeric($id) or $id < 1) {return FALSE;}
		$id = (int)$id;
		if ($id > 2147483647) {return FALSE;} // the limit is 2^31-1
		$segment1 = self::getHash($id,16);
		$segment2 = self::getHash($segment1,8);
		$dec      = (int)base_convert($segment2,16,10);
		// Bury $id: move $dec away from its expected value by $id.
		$dec      = ($dec>$id)?$dec-$id:$dec+$id;
		$segment2 = base_convert($dec,10,16);
		$segment2 = str_pad($segment2,8,'0',STR_PAD_LEFT);
		// Checksum over the first two segments, for tamper detection.
		$segment3 = self::getHash($segment1.$segment2,8);
		$hex      = $segment1.$segment2.$segment3;
		$bin      = pack('H*',$hex);
		$oid      = base64_encode($bin);
		// Swap URI-unfriendly characters and drop the = padding.
		$oid      = str_replace(array('+','/','='),array('$',':',''),$oid);
		return $oid;
	}

	// Decode an obfuscated string; returns 0 if it is malformed or altered.
	public static function decode($oid) {
		if (!preg_match('/^[A-Z0-9\:\$]{21,23}$/i',$oid)) {return 0;}
		$oid      = str_replace(array('$',':'),array('+','/'),$oid);
		$bin      = base64_decode($oid);
		$hex      = unpack('H*',$bin); $hex = $hex[1];
		if (!preg_match('/^[0-9a-f]{32}$/',$hex)) {return 0;}
		$segment1 = substr($hex,0,16);
		$segment2 = substr($hex,16,8);
		$segment3 = substr($hex,24,8);
		$exp2     = self::getHash($segment1,8);
		$exp3     = self::getHash($segment1.$segment2,8);
		// Strict comparison: != would compare numeric-looking strings
		// such as "0e123456" as numbers and treat them as equal.
		if ($segment3 !== $exp3) {return 0;}
		$v1       = (int)base_convert($segment2,16,10);
		$v2       = (int)base_convert($exp2,16,10);
		// $id is the distance between the actual and expected values.
		$id       = abs($v1-$v2);
		return $id;
	}

	// Salted SHA-1 of $str, truncated to $len hex characters.
	private static function getHash($str,$len) {
		return substr(sha1($str.CRYPT_SALT),0,$len);
	}
}
?>

Note: The (int) casts are there to fix a bug with big integers in some PHP builds.

Limitations

  • Input values are limited to the range of 1 through 2,147,483,647 (2^31-1)
  • You shouldn’t change CRYPT_SALT after the system begins producing live production data if any values encoded with CRYPT_SALT are stored anywhere, including cookies. (This applies not only to IdObfuscator, but to anything that uses CRYPT_SALT.) Even if encoded values aren’t stored anywhere, it’s strongly recommended that you not change CRYPT_SALT, because doing so would change all URIs that contain encoded IDs and wipe out any search-engine ranking your pages may have.
  • This algorithm is intended for use on the web. It would lose some of its value in a system where the salt was delivered with the application, such as an installable desktop application.

Footnotes

1 CRYPT_SALT can be anything you like, but it should be (a) complex enough not to be guessable (like a password, which it effectively is), and (b) unique to each site on which it is used. It also can’t safely be changed once a site starts producing real live production data, since encoded IDs may persist in various places, like cookies and URIs. The Spawn PHP Framework defines CRYPT_SALT in host-config.php.

2 All hashes in IdObfuscator are created by the private ::getHash() method, which concatenates the subject with CRYPT_SALT.

Write the Client Code First

May 31, 2008

“Always implement things when you actually need them, never when you just foresee that you need them.”

–Ron Jeffries

In his book Guerrilla Sales, Jay Conrad Levinson outlined an approach to selling that at first seemed backwards: Close the sale first, then pitch what you’re selling. The idea was that, rather than trying to coerce a client to buy your pre-determined offering, find out what the client wants first, then give it to them. Your service is then tailored to match precisely what the client requires—no more, and no less.

There’s an analogue in software development: Any time you have code that will provide a service and client code that will consume that service, write the client code first. In other words, don’t lock down lower-level code and then struggle to make higher-level code use it. Whenever possible, let the client code dictate the requirements for the code it uses.

Writing the client code first means:

  • Do nothing more than stub the return value of a function until you finish the client code that uses it.
  • Write code that uses a class before trying to finalize the class’s implementation.
  • Build the UI before writing controllers.
  • Write your controllers before writing data-access objects. (Better yet, auto-generate your base DAOs, and only add methods as the controllers require them.)

The rationale for building software this way is compelling:

  • Precision — Identify and build exactly what is required, when it is required, in the order required; no more and no sooner.
  • Speed — With priorities made clear and bloat squeezed out (or at least deferred), your delivery will be fast.
  • Rapid Feedback — Make your work visible to customers early so that you get their feedback right away.

If this seems like it should be a no-brainer, you’re right; it should be. And if agile development is second nature to you, you might wonder why this article was written at all. But most software still isn’t built this way, and as a result, most projects take longer than necessary and don’t satisfy requirements as precisely as they should.

Defining the Finish Line

Face it: Your customers have no imagination. You’re more likely to be struck by lightning while winning the lottery than find a customer who thinks about building software the same way you do. And face it: You’ll probably never understand their business like they do, but they will assume that you understand what they want. Unless you can both agree on how to define “done,” you’ll eventually have a disagreement to settle.

How do you bridge this communication gap? Well-written use cases aren’t bad, but they’re not enough. Wireframes are good too. But there’s an even better way: Write the client code first, starting with the UI. Assuming you’re building web-apps, expressing requirements as static pages is by far the best technique I’ve ever seen for creating a common understanding about what you’re building. It’s the clearest way to define the finish line: “When these screens perform functions X, Y and Z, we’re done with this phase.”

Prototyping like this takes “write the client code first” to its logical extreme, since the UI is the ultimate client code in your application. Build it first, get it right, then use it to drive requirements.

Get the Fastest Possible Feedback

Customers can’t really picture how they will use software until they see something that looks like the finished product. The sooner you give them that, the sooner they can correct any errors in your assumptions.

Prototyping static screens is especially suited to web-app development, where it’s fast and easy to “demonstrate” functionality with stubbed navigation and dummy data, without having to build the full MVC stack. Once the screens and their behavior are agreed upon, bring the UI to life by implementing functionality a piece at a time. Your progress is immediately and continually visible to the customer, so course corrections are quick and small.

This has a tremendous advantage over bottom-up development, in which functionality isn’t visible in the UI until very late in the development process, often after it’s too late to make changes quickly.

Top-down Minimalism and YAGNI

“We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris.”

–Larry Wall

The laziness that Larry Wall referred to is “The quality that makes you go to great effort to reduce overall energy expenditure.” Writing the client code first reduces unnecessary work to the absolute minimum.

If, like me, you spent much of your early career building software from the bottom up—data layer first, then business logic, and saving the UI until last—building from the top down might seem utterly backwards. After all, the bottom-up approach has intuitive appeal; it just seems to make sense, and it’s easy to understand. So what’s the problem?

The problem is that it’s incredibly wasteful. It takes longer than necessary, and it breeds overly complex solutions to simple problems.

Bottom-up development is wasteful because it assumes too much. It is based on a notion of completeness that says, “This service/class/layer won’t be complete unless it has every feature a client might need.”

This concept of “complete” is an illusion. You may assume that if there’s a requirement to add and delete items, there are also needs to edit, recycle and restore them. As often as not, assumptions like these prove false, and building functionality before there’s a concrete requirement for it is almost always a waste of time.

Given a little time, requirements will always:

  • disappear, meaning you won’t need to build it after all
  • change, meaning you’ll need to modify or rewrite what you’ve built
  • become clearer, meaning you’ll need to modify or rewrite what you’ve built

“Pre-emptive implementation” is also wasteful because it robs you of time for more important things, like building functionality that’s needed sooner, or refactoring, or releasing earlier. Or having a life, or taking a vacation. Building more than you need now means that you’re on the hook for testing and debugging more than necessary, and it’s often unclear what tests will validate a feature that isn’t actually used anywhere.

The Extreme Programming maxim You Aren’t Going to Need It (YAGNI; occasionally misspelled YANGI) advocates a minimalist approach in which you build only what is needed to satisfy immediate requirements. Requirements and priorities change so frequently that it is a waste of time to try to predict them too far in advance and build everything “just in case” it’s needed.

Since it represents users’ actions, the UI defines requirements for the controllers (or application layer). For example, if the UI needs to display a list of users currently online, then a controller must provide that list, and in turn the User class must provide a getOnlineUsers() method. Development of the UI doesn’t even need to wait for getOnlineUsers() to be fully implemented; in the short term, the method can just return an array of dummy User objects. But you wouldn’t automatically implement methods like getAllUsers() or getInactiveUsers() just for the sake of “completeness” until there was an actual, immediate requirement for them.
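
As a rough sketch of what that looks like in practice, here is client code written against a stubbed getOnlineUsers(). The User class, its name property and renderOnlineUsers() are all hypothetical, invented for this illustration.

```php
<?php
// Hypothetical User class; only the method the UI currently needs exists.
class User {
	public $name;

	public function __construct($name) {
		$this->name = $name;
	}

	// Stub: returns dummy users until the real data-access code is written.
	public static function getOnlineUsers() {
		return array(new User('alice'), new User('bob'));
	}
}

// Client code (the "UI"), written first against the stub.
function renderOnlineUsers() {
	$lines = array();
	foreach (User::getOnlineUsers() as $user) {
		$lines[] = '<li>' . htmlspecialchars($user->name) . '</li>';
	}
	return "<ul>\n" . implode("\n", $lines) . "\n</ul>";
}

echo renderOnlineUsers(), "\n";
```

When the real implementation arrives, only the body of getOnlineUsers() changes; the client code stays as it is.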

This approach has some big overlaps with practices like test-driven development, client-first development, and behavior-driven development.

Counterpoint

“In theory there is no difference between theory and practice. In practice there is.”

–Yogi Berra

I’ve yet to hear a credible objection to top-down development by anyone who has actually done it. Debates on the subject seem to revolve around how it theoretically might not work or exceptional cases where it doesn’t apply. These ignore the reality that in practice, top-down development has a phenomenally high success rate in the vast majority of cases.

Usually the arguments are some version of, “It will take more time to build it later than if we build it now.”

No, usually it won’t. Or, at least, not much longer. This notion seems to stem at least partly from misapplication of a principle of software QA in which bugs found early cost a lot less to fix than bugs found later. Rarely is this argument relevant in modern, modular architecture. (It might be worth noting that you’ll never have to fix bugs in code that is never written.)

Even if it did take a little longer—or even twice as long—the extra time would be more than offset by the time saved by not writing (and testing, and debugging) features that aren’t ever used.

Further Reading

Spawn PHP Framework

Learn all about installing and working with this easy-to-use MVC framework.

Coming Soon: Scaveng

Scaveng is a scriptable web scraper. Stay tuned for more.