Just Software Solutions

Database Tip: Use an Artificial Primary Key

Monday, 19 November 2007

If your data has a clear "master" field or combination of fields, which can uniquely identify each row (such as customer name in a table of customers or ISBN for a table of books), it is tempting to use that as the primary key of the table. However, my advice is: don't do that, use a separate, numeric, artificial primary key instead. Yes, it's an extra column on the table, requiring extra space, and you will have to generate it somehow, but that's not a big deal. Every database vendor provides some way of auto-generating unique key values (e.g. SEQUENCEs in Oracle, and AUTOINCREMENT fields in SQLite), so populating it is easy, and the complications it saves are more than worth the trade-off. You can still maintain the uniqueness of the master columns by applying a unique index to those columns.

Save Space on Foreign Keys

Firstly, if you have any tables that are associated with the master table, and therefore have foreign key columns that refer to the primary key of your master, then having a separate primary key can actually save space overall, as the data for the master columns doesn't have to be duplicated across all the linked tables. This is especially important if there is more than one "master column", such as customer_first_name and customer_last_name, or if the data for these columns is large.

Changing the master data

Secondly, if the "master columns" are actually the primary key of your table, changing the data in them is potentially problematic, especially if they are used as a foreign key in other tables. Many online services use a customer's email address as their "master column": each customer has one email address, and one email address refers to one customer. That's fine until a customer changes their email address. Obviously, you don't want to lose all data associated with a given customer just because they changed their email address, so you need to update the row rather than delete the old one and insert a new one. If the email_address column is the primary key of the table, and therefore used as the foreign key in other tables, then you've got to update the data not just in the master table, but in each dependent table too.

This is not impossible, but it's certainly more complex and time consuming. If you miss a table, the transaction may not complete due to foreign key constraint violations, or (worse) the transaction may complete, but some of the data may be orphaned. Also, in some database engines, the constraint violation will fire when you change either the master table or the dependent table, so you need to execute a special SQL statement to defer the constraint checking until COMMIT time. If you use an auto-generated primary key, then only the data in the master table needs changing.

Changing the master columns

Finally, if the primary key is auto-generated, then not only is it easy to change the data in the master columns, but you can actually change what the master columns are. For example, if you initially decide that customer_first_name and customer_last_name make an ideal primary key, then you're stuck if you then get another customer with the same name. OK, so you make customer_first_name, customer_last_name and customer_address the primary key. Oops — now you've got to duplicate the address information across all the dependent tables. Now you encounter two people with the same name at the same address (e.g. father and son), so you need to add a new designator to the key (e.g. Henry Jones Junior, Richard Wilkins III). Again, you need to update all the dependent tables. If the primary key is auto-generated, there's no problem — just update the unique constraint on the master table to include the appropriate columns, and all is well, with the minimum of fuss.

Simplify your code

It's not going to simplify your code much, but using an auto-generated numeric key means that this is all you need to store as an identifier inside your program to refer to a particular row: much easier than storing the data from a combination of columns. Also, it's much easier to write code to update the data on one table than on multiple tables.

Conclusion

Don't use real table data as the primary key for a table: instead, use a separate, numeric, auto-generated column as the primary key. This will simplify the connections between tables, and make your life easier if the structure of the database or the data in the key columns changes.

Related Posts

In previous posts on Database Design, I've talked about:

Posted by Anthony Williams
[/ database /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

1 Comment

Hello friends, If you are searching for how to get free club penguin membership codes online within minute. You can also get free club penguin codes totally free of cost.

by Club penguin membership codes unused at 11:14:31 on Saturday, 31 December 2016

Add your comment

Your name:

Email address:

Person or spambot?

Your comment:

Design and Content Copyright © 2005-2018 Just Software Solutions Ltd. All rights reserved. | Privacy Policy