Wednesday, 08 August 2007
A client of ours makes hardware that generates data files, and a few years ago I wrote a piece of software for them to help manage those data files. Initially it only had to deal with a few data files, so it indexed them on start up. Then someone tried to use it with several thousand data files, and start-up times got too slow, so I modified the indexing code to dump the current index to an XML file on shutdown, which it then loaded at startup. This has worked well, but now they're using it to handle hundreds of thousands of files, and the startup and shutdown times are again becoming significant due to the huge size of the XML file. Also, the data file access times are now getting larger due to the size of the in-memory index. We've now been hired again to address the issue, so this time I'm going to use a SQLite database for the index — no start up delay, no shutdown delay, and faster index lookup.
What lessons can be learned from this experience? Should I have gone for SQLite in the first instance? I don't think so. Using a simple in-memory map for the initial index was the simplest thing that could possibly work, and it has worked for a few years. The XML index file was a small change, and it kept the application working for longer. Now the application does need a database, but the implementation is certainly more complex than the initial in-memory map. By using the simple implementation first, the application was completed quicker — not only did this save my client money in the development, but it meant they could begin using it sooner. It also meant that now I come to add the database code, the requirements are better-known and there are already a whole suite of tests for how the index should behave. It has taken less than a day to write the database indexing code, whereas it could easily have taken several days at the beginning.
I think people are often too keen to jump straight to using a database, when they could often get by for now with something far simpler. That doesn't mean that requirements won't evolve, and that a database won't be required in the end, but this time can often be put off for years, thus saving time and money. In this instance I happened to use SQLite, which is free, but many people jump straight to Oracle, or SQL Server, which have expensive licenses and are often overkill for the problem at hand. Just think how much money you could save by putting off the purchase of that SQL Server license for a year or two.
Don't be scared into buying a full-featured enterprise level RDBMS at the beginning of your project; simple in-memory maps or data files will often suffice for a long time, and when they're no longer sufficient you'll know more about what you do need from your RDBMS. Maybe SQLite will do, or maybe it won't — in any case you've saved money.