It is usually better to use plain text files for data storage. This simple claim stirs up a lot of controversy, and there are some obvious cases where binary is best (think compressed data and picture/video/sound files), though even there you'll want to use open formats. See Chapter 5 of Eric S. Raymond's The Art of Unix Programming, PmWiki author Patrick Michaud's arguments and the associated discussion, or the DokuWiki page on the same topic. But for now let me add my two cents to the discussion.
The myth of database performance
Not long ago (ed: as of 10 November 2007) I had a lengthy argument with a fellow programmer who claimed that using flat-file storage would make a Web application such as Fcp.Commentator much slower than, say, a MySQL back-end. Yes, I've seen this opinion expressed before, and yes, I understand why he would think that, so I decided to explore the issue a little.
No, don't rush to make a benchmark. This is common sense stuff. Let's make an approximate list of the things MySQL has to do while processing a single page load on your garden-variety dynamic website:
- hit the system database once to check whether you're authorized to connect at all;
- hit the system database again to see if you have access rights to the database you asked for;
- hit the system database a third time (this is becoming a habit...) to see whether the tables/columns referred to in your query actually exist, and whether you are authorized to access them;
- hit the tables involved (i.e. one big file apiece) and possibly several indices (even more files...) to fetch the data;
- (optionally) create and use several temporary files to summarize and sort the data before finally returning it to your script.
Did I mention parsing your query and making a query plan? Which, incidentally, involves yet more hits to the filesystem to check record counts and index availability? And don't even get me started about expensive algorithms or network communication overhead.
Now, do you honestly think the above sequence of operations can be faster than slurping in a dozen (or even a few dozen) text files from disk? Note, I'm not talking about data mining millions of records; I expect a relational database to be optimized for such applications. Certainly much more so than the naive code I'd write myself. But for simple operations such as those performed by Fcp.Commentator, using an RDBMS would be like taking a jet plane to the nearest grocery. For buying a banana.
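To make "slurping in a dozen text files" concrete, here is a minimal sketch in Python. It is not how Fcp.Commentator actually stores its data; the one-file-per-comment layout, the .txt extension, and the load_comments name are all assumptions made up for illustration.

```python
import tempfile
from pathlib import Path


def load_comments(directory):
    """Read every *.txt file in `directory` into a dict.

    Hypothetical layout: one comment per file, keyed by file name
    without the extension. Files are visited in sorted name order.
    """
    comments = {}
    for path in sorted(Path(directory).glob("*.txt")):
        comments[path.stem] = path.read_text(encoding="utf-8")
    return comments


if __name__ == "__main__":
    # Build a throwaway directory holding a dozen small comment files,
    # then read them all back in one pass.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(12):
            name = f"comment-{i:02d}.txt"
            Path(tmp, name).write_text(f"Comment number {i}\n", encoding="utf-8")
        comments = load_comments(tmp)
        print(len(comments), "comments loaded")
```

That is the whole "query": one directory listing and a dozen sequential reads, with no connection, no authorization checks and no query plan.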
The myth of database performance by Felix Pleşoianu is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.