Productive Linux


 Subscribe via Feedburner in a reader

Enter your email address:

Delivered by FeedBurner


Don't have an account?
Sign up to
Forgot your password?

How Far to Normalize? That is the Question.
5 December 2008 @ 23:17 GMT
by Paul

I've been agonising about normalising a database system, in fact, the database system this web site runs on.

When I first considered splitting out some table attributes into seperate tables, I didn't realise how completely un-normalized the database was, how much redundant data there are, how many integrity checks where being pushed up into the application logic layer and so on. Basically, it's a much bigger task than I had previously imagined.

To save the work, there was one group of field that I was considering not normalising. Consider a table with these fields:

node_idauthor_permissionsgroup_permissionsother_permissions... other attribtutes

Each of the permissions is a string of four characters. Each character can either be 'i', the permission is inherited, '-' the permission is disabled or one of 'rwdx', the permission is enabled.

So, each permission is really a boolean and should be represented as such. More than that, each permission is a trit, where it could take one of three values NULL, 0 or 1 meaning 'inherit', 'disable' or 'enable' respectively.

To normalise the above database table properly, I'd have to create a permissions table like this:

CREATE TABLE "access" (
usertype varchar(6) NOT NULL,
permission varchar(10) NOT NULL,
access boolean NOT NULL,
node_id bigint REFERENCES node(node_id) ON DELETE CASCADE,
PRIMARY KEY (usertype, permission, node_id)

In this table, usertype is one of 'author', 'group', 'guest' or 'other' and permission is one of 'read', 'write', 'execute' or 'delete'. The table is normalised because the access value is dependent only on the key (which consists of the other three columns). The permission value has a 'NOT NULL' constraint, because if the permission is inherited we don't set the permission field to 'NULL', we delete the row from the table - hence on LEFT JOIN queries 'NULL' is returned.

This all makes logical sense. However, I wasn't going to do it because, who wants a big table of boolean values? that are going to be joined anyway? especially when the application layer still expects a string 'i', '-' and 'rwxd'? So, I was going to create a fudge.

Thanks to Phil Factor, I won't now:

OTLT [One True Lookup Table] syndrome comes about usually when one is forced to create a number of simple lookup tables which contain a code and a name. One gets the sudden flash of inspiration that one could roll them all into one lookup table, with a unique identifier for each row, and a type field, thereby saving the chore of creating and maintaining a number of tables. EAV disease, like OTLT syndrome both tend to be the result of pressure from programmers to push their OO designs back into the relational database, like a bathyscaphe in an alien environment . There is loose talk of ‘persisting object data in a data store’, as though one was storing apples in a shed. It also comes about when a development team attempts an ‘Agile’ development before they've fully understood the data model.

Thanks you too, Fabian Pascal on the dangers of a not normalized database. And the problems of integrity if you don't normalise. Indeed, there have already been problems with integrity in the database system as a whole including the permissions system.

Normalization, here I come!

Tags: database

Comments disabled