Blog Archive for / 2007 /

Elegance in Software Part 2

Tuesday, 11 December 2007

In my earlier blog post on Elegance in Software I gave a list of things that I feel contribute to elegant code, and asked for input from my readers. This post is the promised follow-up.

Several respondents mentioned the book Beautiful Code, which is a collection of essays by "leading computer scientists" describing code they feel is beautiful, and why. I've only read an excerpt myself, but it's got good reviews, and what I've read has enticed me to read more. There's also a blog related to the book, which is well worth a read.

Another common theme was the "I know it when I see it" factor. Though I alluded to this in the introduction of my previous post by saying that "elegance is in the eye of the beholder", a lot of people felt this was far more important than any "tick list": there's something special about truly elegant code that transcends the details, just like really good art is more than just a collection of well-executed brush strokes that make up a well-chosen composition. I agree here: elegant code just says "ooh, that's good" when you read it, it has a "Quality without a Name".

Thomas Guest pointed out that appearance plays a part (whilst also discussing the importance of efficiency), and I agree. This ties in with good naming and short functions: if the code is poorly laid out, it's hard to argue that it's elegant. Yes, you can get a "source code beautifier" to physically rearrange the code, but good appearance often goes beyond that: if(some_boolean == true) is just not elegant, no matter how well-spaced it is. This also impacts the language used: it's harder to write "pretty" code in Perl than in Ruby or Scheme.

I particular liked Chris Dollin's characterization: it is obvious what elegant code does when you read it, but it's not necessarily an obvious approach when you haven't seen it before. This ties in with another theme amongst respondents: simplicity. Though I mentioned "minimal code" and "easy to understand" in my original list, "simplicity" goes beyond that, and I think that Chris's obvious solution to a complex problem highlights this. If the code is sufficiently easy to understand that a solution to a complex problem appears obvious, then it's probably a good demonstration of simplicity. Such code is "clever with a purpose" (as Pat Maddox described it).

Jim Shore has an interesting article on good design, in which he argues that the eye-of-the-beholder-ness of "elegant" is too vague for his liking, and instead tries to argue for "Quality with a Name". He says:

"A good software design minimizes the time required to create, modify, and maintain the software while achieving acceptable run-time performance."

Whilst this is definitely true, this ties in with the "tick list" from my previous posting. Elegant code is more than that, and I think this is important: software development is a craft, and developers are craftsmen. By taking pride in our work, by striving to write code that is not just good, but elegant, we are improving the state of our craft. Just as mathematicians strive for beautiful or elegant proofs, and are not satisfied with a proof by exhaustion, we should not be satisfied with code that is merely good, but strive for code that is elegant.

It is true that what I find to be elegant may be different from what you find to be elegant, but I hope believe that good programmers would agree that two pieces of code were "good code" even if they differ in their opinion of which is more elegant, much as art critics would agree that both a painting by Monet and one by Van Gogh were both good paintings, whilst differing in their opinion of which is better.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Testing Your Website in Multiple Browsers

Monday, 03 December 2007

When designing websites, it is very important to check the results in multiple web browsers — something that looks fine in Internet Explorer may look disastrous in Firefox, and vice-versa. This problem is due to the different way in which each web browser interprets the HTML, XHTML and CSS standards, combined with any bugs that may be present. If you're designing a website, you have no control over which browser people will use to view it, so you need to ensure that your website displays acceptably in as many different browsers as possible.

The only way to know for sure how a website looks in a particular browser is to try it out. If you don't check it, how do you know you won't hit a bug or other display quirk? However, given the plethora of web browsers and operating systems out there, testing in all of them is just not practical, so you need to choose a subset. The question is: which subset?

Popular browsers

Thankfully, most people use one of a few "popular" browsers, but that's still quite a few. In my experience, on Windows, the most popular browsers are Firefox, Internet Explorer and Opera, on Linux most people use Firefox, Mozilla or Netscape and on MacOS most people use Safari or Camino. Obviously, the relative proportions of users using each browser will vary depending on your website, and target niche — a website focused on non-technical users is far more likely to find users with Internet Explorer on Windows than anything else, whereas a website focused on linux kernel development will probably find the popular browser is Firefox on linux.

Which version?

It's all very well having identified a few popular browsers to use for testing, but an equally crucial aspect is which version of the browser to test. Users of Firefox, Opera, Mozilla, and recent versions of Netscape might be expected to upgrade frequently, whereas users of Internet Explorer might be far less likely to upgrade, especially if they are non technical (in which case they'll stick with the version that came with their PC). Checking the logs of some the websites I maintain shows that the vast majority of Firefox users (90+%) are using some variant of Firefox 2.0 (though there are a smattering all the way back to Firefox 0.5), whereas Internet Explorer users are divided between IE7 and IE6, with the ratio varying with the site.

Don't forget a text-only browser

A text only browser such as Lynx is ideal for seeing how your site will look to a search engine spider. Not only that, but certain screen reader applications will also give the same view to their users. Consequently, it's always worth checking with a text-only browser to ensure that your site is still usable without all the pretty visuals.

Multiple Browsers on the same machine

Having chosen your browsers and versions, the simplest way to test your sites is to install all the browsers on the same machine. That way, you can just open the windows side by side, and compare the results. Of course, you can't do this if the browsers run on different platforms, but one option there is to use virtual machines to test on multiple platforms with a single physical machine. Testing multiple versions of Internet Explorer can also be difficult, but TredoSoft have a nice little package called Multiple IEs which enables you to install multiple versions of Internet Explorer on the same PC. Thanks to Multiple IEs, on my Windows XP machine I've got IE3, IE4.01, IE5.01, IE5.5, IE6 and IE7, as well as Firefox, Opera, Safari and Lynx!

Snapshot services

If you don't fancy installing lots of browsers yourself, or you don't have access to the desired target platform, you can always use one of the online snapshot services such as browsershots (free) or browsercam (paid). These provide you with the ability to take a snapshot of your website, as seen in a long list of browsers on a long list of platforms. Browsercam also provides remote access to the testing machines, so you can interact with your sites and check dynamic aspects, such as Javascript — something that's becoming increasingly more important as AJAX becomes more prevalent.

Posted by Anthony Williams
[/ testing /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Testing on Multiple Platforms with VMWare

Tuesday, 27 November 2007

Whilst testing on multiple platforms is important, it can be difficult to obtain access to machines running all the platforms that you wish to test on. This is where virtualization software such as VMWare comes in handy: you don't need to have a separate machine for each tested platform — you don't even need a separate partition. Instead, you set up a Virtual Machine running the target platform, which runs on top of your existing OS. This Virtual Machine is completely self-contained, running off a virtual hard disk contained in a file on your real disk, and with a virtual screen which can be shown in a window on your host desktop.

Virtual Networks

This can be incredibly useful: not only can you test on multiple platforms without repartitioning your hard disk, but you can have multiple virtual machines running simultaneously. If you're developing an application that needs to run on multiple platforms, this can be invaluable, as you can see what the application looks like on different operating systems simultaneously. It also allows you to test network communication — each virtual machine is entirely independent of the others, so you can run a server application on one and a client application on another without having to build a physical network.

Get Started with a Pre-built Virtual Machine

VMWare have a repository of pre-built virtual machines, that they call "appliances". This makes it very easy to get started, without all the hassle of installing the OS. Some appliances even come with pre-installed applications — if you want to try a Ruby on Rails app on linux, then the UbuntuWebServer appliance might be a good place to start.

Warning: Virtual Machines use Real Resources

It's worth noting that the resource use (CPU, memory, disk space) is real, even if the machines are virtual — if you run a CPU-intensive application on your virtual machine, your system will slow down; if you give 3 virtual machines 1Gb of memory each but you only have 2Gb installed, you're going to see a lot of swapping. Virtual machines are not full-time replacements for physical ones unless you have a server with a lot of resources. That said, if you do have a server with a lot of resources, running separate systems and applications in separate virtual machines can make a lot of sense: the individual systems are completely isolated from one-another, so if one application crashes or destroys its (virtual) disk, the others are unaffected. Some web hosting companies use this facility to provide each customer with root access to their own virtual machine, for example.

It's also worth noting that if you install a non-free operating system such as Microsoft Windows, you still need a valid license.

Alternatives

VMWare Server is currently a free download for Windows and Linux, but it's not the only product out there. VirtualBox is also free, and runs on Windows, Linux and MacOSX. One nice feature that VirtualBox has is "seamless Windows": when running Microsoft Windows as the guest operating system, you can suppress the desktop background, so that the application windows from the virtual machine appear on the host desktop.

Another alternative is QEMU which offers full-blown emulation as well as virtualization. This allows you to experiment with operating systems running on a different CPU, though the emulated hardware can be quite limited.

Posted by Anthony Williams
[/ testing /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Database Tip: Use an Artificial Primary Key

Monday, 19 November 2007

If your data has a clear "master" field or combination of fields, which can uniquely identify each row (such as customer name in a table of customers or ISBN for a table of books), it is tempting to use that as the primary key of the table. However, my advice is: don't do that, use a separate, numeric, artificial primary key instead. Yes, it's an extra column on the table, requiring extra space, and you will have to generate it somehow, but that's not a big deal. Every database vendor provides some way of auto-generating unique key values (e.g. SEQUENCEs in Oracle, and AUTOINCREMENT fields in SQLite), so populating it is easy, and the complications it saves are more than worth the trade-off. You can still maintain the uniqueness of the master columns by applying a unique index to those columns.

Save Space on Foreign Keys

Firstly, if you have any tables that are associated with the master table, and therefore have foreign key columns that refer to the primary key of your master, then having a separate primary key can actually save space overall, as the data for the master columns doesn't have to be duplicated across all the linked tables. This is especially important if there is more than one "master column", such as customer_first_name and customer_last_name, or if the data for these columns is large.

Changing the master data

Secondly, if the "master columns" are actually the primary key of your table, changing the data in them is potentially problematic, especially if they are used as a foreign key in other tables. Many online services use a customer's email address as their "master column": each customer has one email address, and one email address refers to one customer. That's fine until a customer changes their email address. Obviously, you don't want to lose all data associated with a given customer just because they changed their email address, so you need to update the row rather than delete the old one and insert a new one. If the email_address column is the primary key of the table, and therefore used as the foreign key in other tables, then you've got to update the data not just in the master table, but in each dependent table too.

This is not impossible, but it's certainly more complex and time consuming. If you miss a table, the transaction may not complete due to foreign key constraint violations, or (worse) the transaction may complete, but some of the data may be orphaned. Also, in some database engines, the constraint violation will fire when you change either the master table or the dependent table, so you need to execute a special SQL statement to defer the constraint checking until COMMIT time. If you use an auto-generated primary key, then only the data in the master table needs changing.

Changing the master columns

Finally, if the primary key is auto-generated, then not only is it easy to change the data in the master columns, but you can actually change what the master columns are. For example, if you initially decide that customer_first_name and customer_last_name make an ideal primary key, then you're stuck if you then get another customer with the same name. OK, so you make customer_first_name, customer_last_name and customer_address the primary key. Oops — now you've got to duplicate the address information across all the dependent tables. Now you encounter two people with the same name at the same address (e.g. father and son), so you need to add a new designator to the key (e.g. Henry Jones Junior, Richard Wilkins III). Again, you need to update all the dependent tables. If the primary key is auto-generated, there's no problem — just update the unique constraint on the master table to include the appropriate columns, and all is well, with the minimum of fuss.

Simplify your code

It's not going to simplify your code much, but using an auto-generated numeric key means that this is all you need to store as an identifier inside your program to refer to a particular row: much easier than storing the data from a combination of columns. Also, it's much easier to write code to update the data on one table than on multiple tables.

Conclusion

Don't use real table data as the primary key for a table: instead, use a separate, numeric, auto-generated column as the primary key. This will simplify the connections between tables, and make your life easier if the structure of the database or the data in the key columns changes.

Related Posts

In previous posts on Database Design, I've talked about:

Posted by Anthony Williams
[/ database /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Elegance in Software

Monday, 12 November 2007

What does it mean for software to be elegant? When I write code, elegance is something I aspire to, and in some senses goes hand-in-hand with beautiful code, but that doesn't really make it any clearer. Certainly, I think there is a strong element of "elegance is in the eye of the beholder", but I think there are also some characteristics of the software that are contributory factors — how a particular person may rate the code on any aspect may vary, as may the importance they place on any given aspect, but these aspects will almost certainly impact how elegant the code appears.

Factors affecting the elegance of software

Here's a short list of some of the factors that I think are important. Obviously, this is not an exhaustive list, and all comments are my opinion, and not definitive.

Does it work?
I'd be hard-pushed to call software "elegant" if it didn't work
Is it easy to understand?
Lots of the following factors can really be summarised by this one: if I can't understand the code, it's not elegant.
Is it efficient?
A bubble sort is just not elegant, because there's lots of much more efficient algorithms. If a cunning algorithmic trick can drastically reduce the runtime, using that trick contributes to making the code elegant, especially if it is still easy to understand.
Short functions
Long functions make the code hard to follow. If I can't see the whole function on one screen in my editor, it's too long. Ideally, a function should be really short, less than 5 lines.
Good naming
Short functions are all very well, but if functions are called foo, abc, or wrt_lng_dt, it can still be hard to understand the code. Of course, this applies to classes just as much as functions.
Clear division of responsibility
It is important that it is clear which function or class is responsible for any given aspect of the design. Not only that, but a class or function should not have too many responsibilities — by the Single Responsibility Principle a class or function should have just one responsibility.
High cohesion
Cohesion is a measure of how closely related the data items and functions in a class or module are to each other. This is tightly tied in to division of responsibility — if a function is responsible for calculating primes and managing network connections, then it has low cohesion, and a poor division of responsibility.
Low coupling
Classes and modules should not have have unnecessary dependencies between them. If a change to the internals of one class or function requires a change to apparently unrelated code elsewhere, there is too much coupling. This is also related to the division of responsibility, and excessive coupling can be a sign that too many classes, modules or functions share a single responsibility.
Appropriate use of OO and other techniques
It is not always appropriate to encapsulate something in a class — sometimes a simple function will suffice, and sometimes other techniques are more appropriate. This is also related to the division of responsibilities, but it goes beyond that — is this code structure the most appropriate for handling this particular responsibility? Language idioms come into play here: is it more appropriate to use STL-style std::sort on an iterator interface, or does it make more sense to provide a sort member function? Can the algorithm be expressed in a functional way, or is an imperative style more appropriate?
Minimal code
Code should be short and to-the-point. Overly-long code can be the consequence of doing things at too low a level, and doing byte-shuffling rather than using a high-level sort algorithm. It can also be the consequence of too many levels of indirection — if a function does nothing except call one other function, it's getting in the way. Sometimes this can be at odds with good naming — a well-named function with a clear responsibility just happens to be able to delegate to a generic function, for example — but there's obviously a trade-off. Minimal code is also related to duplication — if two blocks of code do the same thing, one of them should be eliminated.

One thing that is not present in the above list is comments in the code. In my view, the presence of comments in the code implies that the code is not sufficiently clear. Yes, well-written comments can make it easier to understand a given block of code, but they should in general be unnecessary: truly elegant code can be understood without comments. Of course, you might need to understand what it is that the code is trying to accomplish before it makes complete sense, particularly if the code is using advanced algorithms, and comments can help with that (e.g. by providing a reference to the algorithm), but my general view is that comments are a sign of less-than-perfect code.

Let me know what you think constitutes elegant code.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Review of Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill

Thursday, 01 November 2007

This book gives a broad overview of techniques for writing parallel programs. It is not an API reference, though it does have examples that use OpenMP, MPI and Java, and contains a brief overview of each in appendices. Instead, it covers the issues you have to think about whilst writing parallel programs, starting with identifying the exploitable concurrency in the application, and moving through techniques for structuring algorithms and data, and various synchronization techniques.

The authors do a thorough job of explaining the jargon surrounding parallel programming, such as what a NUMA machine is, what SPMD means, and what makes a program embarrassingly parallel. They also go into some of the more quantitative aspects, like calculating the efficiency of the parallel design, and the serial overhead.

Most of the content is structured in the form of Patterns (hence the title), which I found to be an unusual way of presenting the information. However, the writing is clear, and easily understood. The examples are well though out, and clearly demonstrate the points being made.

The three APIs used for the examples cover the major types of parallel programming environments — explicit threading (Java), message passing (MPI), and implicit threading from high-level constructs (OpenMP). Other threading environments generally fall into one of these categories, so it is usually straightforward to see how descriptions can be extended to other environments for parallel programming.

The authors are clearly coming from a high-performance computing background, with massively parallel computers, but HyperThreading and dual-core CPUs are becoming common on desktops, and many of the same issues apply when writing code to exploit the capabilities of these machines.

Highly Recommended. Everyone writing parallel or multi-threaded programs should read this book.

Buy this book

Patterns for Parallel Programming
Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill
Published by Addison-Wesley
ISBN 0-321-22811-1

Buy from Amazon.co.uk
Buy from Amazon.com

Posted by Anthony Williams
[/ reviews /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

10 Years of Programming with POSIX Threads

Monday, 29 October 2007

David Butenhof's Programming with POSIX Threads was published 10 years ago, in 1997. At the time, it was the definitive work on the POSIX thread API, and multi-threaded programming in general. Ten years is a long time in computing so how does it fare today?

New POSIX Standard

When the book was written, the latest version of the POSIX Standard was the 1996 edition (ISO/IEC 9945-1:1996). Since then, the standard has evolved. It is now maintained by a joint working group from The Open Group, the IEEE and ISO called The Austin Group. The new Standard is called the Single Unix Specification, Version 3 and the 2004 edition is available online.

The new standard has brought a few changes with it — many things that were part of extensions such as POSIX 1003.1j are now part of the main ISO Standard. This includes barriers and read-write locks, though barriers are still optional and the read-write locks have a slightly different interface. Programming with POSIX threads is therefore lacking a good description of the now-standard APIs — although Butenhof devotes a section in Chapter 7 to implementing read-write locks, this is now only of historical interest, as the semantics are different from those in the new standard.

Most things stay the same

Though there are inevitably some changes with the new standard, most of the APIs remain the same. Not only that, the fundamental concepts described in the book haven't changed — threads still work the same way, mutexes and condition variables still work the same way, and so forth. Not only that, but the rising numbers of multicore CPU desktop computers means that correct thread synchronization is more important than ever. Faulty assumptions about memory visibility that happened to be true for single core machines are often demonstrably false for multicore and multiprocessor machines, so the dangers of deadlock, livelock and race conditions are ever more present.

Still the definitive reference

Though it's probably worth downloading the new POSIX standard, or checking the man pages for the new functions, Programming with POSIX Threads is still a good reference to the POSIX thread APIs, and multi-threaded programming in general. It sits well alongside Patterns for Parallel Programming — whereas Patterns for Parallel Programming is mainly about designing programs for concurrency, Programming with POSIX Threads is very much focused on getting the implementation details right.

Highly Recommended.

Buy this book

Programming with POSIX Threads
David Butenhof
Published by Addison-Wesley
ISBN 0-201-63392-2

Buy from Amazon.co.uk
Buy from Amazon.com

Posted by Anthony Williams
[/ reviews /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Using CSS to Replace Text with Images

Monday, 29 October 2007

Lots has been said about ways to replace text with images so that users with a graphical browser get a nice pretty logo, whilst search engines and screen readers get to see the text version. Most recently, Eric Enge has posted A Comprehensive Guide to Hidden Text & Search Engines over at SEOmoz. In general, I think it's a fair summary of the techniques I've encountered.

However, I was surprised to see the order of entries in the "may be OK" list. Firstly, I'd have expected sIFR to be top of the list — this is a widely used technique, and just replaces existing text with the same text in a different font. I prefer to do without Flash where possible, and this only works where you want to change the font rather than use a logo, but I can certainly see the draw here.

Secondly, I was also surprised to see the suggestion that is top of the list is to position the text off screen. I think this is a really bad idea, for accessibility reasons. When I only had a dial-up connection, I often used to browse with images turned off in order to reduce download times. If the text is positioned off screen, I would have just got a blank space. Even now, I often check websites with images turned off, because I think it is important. It is for this reason that my preferred technique is "Fahrner Image Replacement" (FIR). Whilst Eric says this is a no-no according to the Google Guidelines, I can't really see how — it's not deceptive in intent, and the text is seen by users without image support (or with images turned off) as well as the search engine bots. Also, given the quote from Susan Moskwa, it seems fine. Here's a quick summary of how it works:

Overlaying text with an image in CSS

The key to this technique is to have a nested SPAN with no content, position it over the text, and set a background image on it. If the background image loads, it hides the original text.

<h1 id="title"><span></span>Some Title Text</h1>

It is important to set the size of the enclosing tag to match the image, so that the hidden text doesn't leak out round the edges at large font sizes. The CSS is simple:

#title
{
    position: relative;
    width: 200px;
    height: 100px;
    margin: 0px;
    padding: 0px;
    overflow: hidden;
}

#title span
{
    position: absolute;
    top: 0px;
    left: 0px;
    width: 200px;
    height: 100px;
    background-image: url(/images/title-image.png);
}

This simple technique works in all the major browsers, including Internet Explorer, and gracefully degrades. Obviously, you can't select text from the image, but you can generally select the hidden text (though it's hard to see what you're doing), and copying the whole page will include the hidden text. Check it out — how does the title above ("Overlaying text with an image in CSS") appear in your browser?

Update: It has been pointed out in a comment on the linked SEOmoz article by bjornjohansen that you need to be aware of the potential for browsers with a different font size. This is definitely important — that's why we specify the exact dimensions for the enclosing element, and use overflow: hidden to avoid overhang. It's also important to ensure that the raw text (without the image) fits the specified space when rendered in at least one font size larger than "normal", so that people who use larger fonts can still read it with images disabled, without getting the text clipped.

Update: In another comment over on the SEOmoz article, MarioFr suggested that for headings the A tag could be used instead of SPAN — since empty A tags can be used as a link target in the heading, it works as a suitable replacement. I've changed the heading above to use an A tag for both purposes as an example.

Posted by Anthony Williams
[/ webdesign /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Review of Fit for Developing Software by Rick Mugridge and Ward Cunningham

Monday, 22 October 2007

As the subtitle of this book says, Fit is the Framework for Integrated Tests, which was originally written by Ward. This is a testing framework that allows tests to be written in the form of Excel spreadsheets or HTML tables, which makes it easy for non-programmers to write tests. This book is divided into several parts. Parts 1 and 2 give an in-depth overview of how to use Fit effectively, and how it enables non-programmers to specify the tests, whereas parts 3-5 provide details that programmers will need for how to set up their code to be run from Fit.

Though I have been aware of Fit for a long time, I have never entirely grasped how to use it; reading this book gave me a strong urge to give it a go. It is very clear, with plenty of examples. I thought the sections on good/bad test structure, and how to restructure your tests to be clearer and easy to maintain were especially valuable — though they are obviously focused on Fit, many of the suggestions are applicable to testing through any framework.

Fit was developed as a Java framework, and so all the programming examples are in Java. However, as stated in the appendix, there are ports for many languages including C#, Python and C++. The way of structuring the fixtures that link the Fit tests to the code under test varies with each language, but the overall principles still apply.

The book didn't quite succeed in convincing me to spend time working with Fit or Fitnesse to try and integrate it with any of my existing projects, but I still think it's worth a look, and will try and use it on my next greenfield project.

Recommended.

Buy this book

At Amazon.co.uk
At Amazon.com

Posted by Anthony Williams
[/ reviews /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Reduce Bandwidth Usage by Compressing Pages in PHP

Monday, 15 October 2007

In Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP, I identified one way to reduce your bandwidth usage — use the appropriate HTTP headers to avoid sending content that hasn't changed. Another way to reduce your bandwidth usage is to compress your pages.

HTTP headers

The Accept-Encoding HTTP header is used by browsers to specify potential encodings for a requested web page. For Firefox, this is generally set to "gzip, deflate", meaning that the browser will accept (and decompress) web pages compressed with the gzip or deflate compression algorithms. The web server can then use the Content-Encoding header to indicate that it has used a particular encoding for the served page. The Vary header is used to tell the browser or proxy that different encodings can be used. For example, if the server compresses the page using gzip, then it will return headers that say

    Content-Encoding: gzip
    Vary: Accept-Encoding

Handling compression in PHP

For static pages, compression is handled by your web server (though you might have to configure it to do so). For pages generated with PHP you are in charge. However, supporting compression is really easy. Just add:

    ob_start('ob_gzhandler');

to the start of the script. It is important that this comes before any output has been written as in order to compress the output, all output has to be passed through the filter, and the headers have to be set. If any content has already been sent to the browser, then this won't work, which is why I put it at the start of the script — that way, there's not much chance of anything interfering.

Tags: , , , ,

Posted by Anthony Williams
[/ webdesign /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP

Sunday, 30 September 2007

By default, pages generated with PHP are not cached by browsers or proxies, as they are generated anew every time the page is loaded by the server. If you have repeat visitors to your website, or even many visitors that use the same proxy, this means that a lot of bandwidth is wasted transferring content that hasn't changed since last time. By adding appropriate code to your PHP pages, you can allow your pages to be cached, and reduce the required bandwidth.

As Bruce Eckel points out in RSS: The Wrong Solution to a Broken Internet, this is a particular problem for RSS feeds — feed readers are often overly enthusiastic in their checking rate, and given the tendency of bloggers to provide full feeds this can lead to a lot of wasted bandwidth. By using the code from this article in your feed-generating code you can save yourself a whole lot of bandwidth.

Caching and HTTP headers

Whenever a page is requested by a browser, the server response includes a Last-Modified header in the response which indicates the last modification time. For static pages, this is the last modification time of the file, but for dynamic pages it typically defaults to the time the page was requested. Whenever a page is requested that has been seen before, browsers or proxies generally take the Last-Modified time from the cached version and populate an If-Modified-Since request header with it. If the page has not changed since then, the server should respond with a 304 response code to indicate that the cached version is still valid, rather than sending the page content again.

To handle this correctly for PHP pages requires two things:

  • Identifying the last modification time for the page, and
  • Checking the request headers for the If-Modified-Since.

Timestamps

There are two components to the last modification time: the date of the data used to generate the page, and the date of the script itself. Both are equally important, as we want the page to be updated when the data changes, and if the script has been changed the generated page may be different (for example, the layout could be different). My PHP code incorporates both by defaulting the modification time of the script, and allowing the user to pass in the data modification time, which is used if it is more recent than the script. The last modification time is then used to generate a Last-Modified header, and returned to the caller. Here is the function that adds the Last-Modified header. It uses both getlastmod() and filemtime(__FILE__) to determine the script modification time, on the assumption that this function is in a file included from the main script, and we want to detect changes to either.

function setLastModified($last_modified=NULL)
{
    $page_modified=getlastmod();
    
    if(empty($last_modified) || ($last_modified < $page_modified))
    {
        $last_modified=$page_modified;
    }
    $header_modified=filemtime(__FILE__);
    if($header_modified > $last_modified)
    {
        $last_modified=$header_modified;
    }
    header('Last-Modified: ' . date("r",$last_modified));
    return $last_modified;
}

Handling If-Modified-Since

If the If-Modified-Since request header is present, then it can be parsed to get a timestamp that can be compared against the modification time. If the modification time is older than the request time, a 304 response can be returned instead of generating the page.

In PHP, the HTTP request headers are generally stored in the $_SERVER superglobal with a name starting with HTTP_ based on the header name. For our purposes, we need the HTTP_IF_MODIFIED_SINCE entry, which corresponds to the If-Modified-Since header. We can check for this with array_key_exists, and parse the date with strtotime. There's a slight complication in that old browsers used to add additional data to this header, separated with a semicolon, so we need to strip that out (using preg_replace) before parsing. If the header is present, and the specified date is more recent than the last-modified time, we can just return the 304 response code and quit — no further output required. Here is the function that handles this:

function exitIfNotModifiedSince($last_modified)
{
    if(array_key_exists("HTTP_IF_MODIFIED_SINCE",$_SERVER))
    {
        $if_modified_since=strtotime(preg_replace('/;.*$/','',$_SERVER["HTTP_IF_MODIFIED_SINCE"]));
        if($if_modified_since >= $last_modified)
        {
            header("HTTP/1.0 304 Not Modified");
            exit();
        }
    }
}

Putting it all together

Using the two functions together is really simple:

     exitIfNotModifiedSince(setLastModified()); // for pages with no data-dependency
     exitIfNotModifiedSince(setLastModified($data_modification_time)); // for data-dependent pages

Of course, you can use the functions separately if that better suits your needs.

Posted by Anthony Williams
[/ webdesign /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Free SEO Tools for Webmasters

Monday, 24 September 2007

I thought I'd share some of the free online tools that I use for assisting with Search Engine Optimization.

The people behind the iWebTool Directory provide a set of free webmaster tools, including a Broken Link Checker, a Backlink Checker and their Rank Checker. For most tools, just enter your domain or URL in the box, click "Check!" and wait for the results.

Whereas the iWebTool tools each perform one small task, Website Grader is an all-in-one tool for grading your website. Put in your URL, the keywords you wish to rank well for, and the websites of your competitors (if you wish for a comparison). When you submit your site, the tool then displays its progress at the bottom of the page, and after a few moments will give a report on your website, including your PageRank, Alexa rank, inbound links and Google rankings for you and your competitors for the search terms you provided, as well as a quick analysis of the content of your page.

We Build Pages offers a suite of SEO tools, much like the ones from iWebTool. I find the Top Ten Analysis SEO Tool really useful, as it compares your site against the top ten ranking sites for the search term you specify. The Backlink and Anchor Text Tool is also pretty good — it takes a while, but eventually tells you which pages link to your site, and what anchor text they use for the link.

Posted by Anthony Williams
[/ webdesign /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Using Interfaces for Exception Safety in Delphi

Thursday, 20 September 2007

Resource Management and Exception Safety

One concept that has become increasingly important when writing C++ code is that of Exception Safety — writing code so that invariants are maintained even if an exception is thrown. Since exceptions are also supported in Delphi, it makes sense to apply many of the same techniques to Delphi code.

One of the important aspects of exception safety is resource management — ensuring that resources are correctly freed even in the presence of exceptions, in order to avoid memory leaks or leaking of other more expensive resources such as file handles or database connection handles. Probably the most common resource management idiom in C++ is Resource Acquisition is Initialization (RAII). As you may guess from the name, this involves acquiring resources in the constructor of an object. However, the important part is that the resource is released in the destructor. In C++, objects created on the stack are automatically destroyed when they go out of scope, so this idiom works well — if an exception is thrown, then the local objects are destroyed (and thus the resources they own are released) as part of stack unwinding.

In Delphi, things are not quite so straight-forward: variables of class type are not stack allocated, and must be explicitly constructed by calling the constructor, and explicitly destroyed — in this respect, they are very like raw pointers in C++. This commonly leads to lots of try-finally blocks to ensure that variables are correctly destroyed when they go out of scope.

Delphi Interfaces

However, there is one type of Delphi variable that is automatically destroyed when it goes out of scope — an interface variable. Delphi interfaces behave very much like reference-counted pointers (such as boost::shared_ptr) in this regard — largely because they are used to support COM, which requires this behaviour. When an object is assigned to an interface variable, the reference count is increased by one. When the interface variable goes out of scope, or is assigned a new value, the reference count is decreased by one, and the object is destroyed when the reference count reaches zero. So, if you declare an interface for your class and use that interface type exclusively, then you can avoid all these try-finally blocks. Consider:

type
    abc = class
        constructor Create;
        destructor Destroy; override;
        procedure do_stuff;
    end;

procedure other_stuff;

...

procedure foo;
var
    x,y: abc;
begin
    x := abc.Create;
    try
        y := abc.Create;
        try
            x.do_stuff;
            other_stuff;
            y.do_stuff;
        finally
            y.Free;
        end;
    finally
        x.Free;
    end;
end;

All that try-finally machinery can seriously impact the readability of the code, and is easy to forget. Compare it with:

type
    Idef = interface
        procedure do_stuff;
    end;
    def = class(TInterfacedObject, Idef)
        constructor Create;
        destructor Destroy; override;
        procedure do_stuff;
    end;

procedure other_stuff;

...

procedure foo;
var
    x,y: Idef;
begin
    x := def.Create;
    y := def.Create;
    x.do_stuff;
    other_stuff;
    y.do_stuff;
end;

Isn't the interface-based version easier to read? Not only that, but in many cases you no longer have to worry about lifetime issues of objects returned from functions — the compiler takes care of ensuring that the reference count is kept up-to-date and the object is destroyed when it is no longer used. Of course, you still need to make sure that the code behaves appropriately in the case of exceptions, but this little tool can go a long way towards that.

Further Benefits

Not only do you get the benefit of automatic destruction when you use an interface to manage the lifetime of your class object, but you also get further benefits in the form of code isolation. The class definition can be moved into the implementation section of a unit, so that other code that uses this unit isn't exposed to the implementation details in terms of private methods and data. Not only that, but if the private data is of a type not exposed in the interface, you might be able to move a unit from the uses clause of the interface section to the implementation section. The reduced dependencies can lead to shorter compile times.

Another property of using an interface is that you can now provide alternative implementations of this interface. This can be of great benefit when testing, since it allows you to substitute a dummy test-only implementation when testing other code that uses this interface. In particular, you can write test implementations that return fixed values, or record the method calls made and their parameters.

Downsides

The most obvious downside is the increased typing required for the interface definition — all the public properties and methods of the class have to be duplicated in the interface. This isn't a lot of typing, except for really big classes, but it does mean that there are two places to update if the method signatures change or a new method is added. In the majority of cases, I think this is outweighed by the benefit of isolation and separation of concerns achieved by using the interface.

Another downside is the requirement to derive from TInterfacedObject. Whilst you can implement the IInterface methods yourself, unless you have good reason to then it is strongly recommended to inherit from TInterfacedObject. One such "good reason" is that the class in question already inherits from another class, which doesn't derive from TInterfacedObject. In this case, you have no choice but to implement the functions yourself, which is tedious. One possibility is to create a data member rather than inherit from the problematic class, but that doesn't always make sense — you have to decide for yourself in each case. Sometimes the benefits of using an interface are not worth the effort.

As Sidu Ponnappa does in his post 'Programming to interfaces' strikes again, "programming to interfaces" doesn't mean to create an interface for every class, which does seem to be what I am proposing here. Whilst I agree with the idea, I think the benefits of using interfaces outweigh the downsides in many cases, for the reasons outlined above.

A Valuable Tool in the Toolbox

Whilst this is certainly not applicable in all cases, I have found it a useful tool when writing Delphi code, and will continue to use it where it helps simplify code. Though this article has focused on the exception safety aspect of using interfaces, I find the testing aspects particularly compelling when writing DUnit tests.

Posted by Anthony Williams
[/ delphi /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Intel and AMD Define Memory Ordering

Monday, 17 September 2007

For a long time, the ordering of memory accesses between processors in a multi-core or multi-processor system based on the Intel x86 architecture has been under specified. Many newsgroup posts have discussed the interpretation of the Intel and AMD software developer manuals, and how that translates to actual guarantees, but there has been nothing authoritative, despite comments from Intel engineers. This has now changed! Both Intel and AMD have now released documentation of their memory ordering guarantees — Intel has published a new white paper (Intel 64 Architecture Memory Ordering White Paper) devoted to the issue, whereas AMD have updated their programmer's manual (Section 7.2 of AMD64 Architecture Programmer's Manual Volume 2: System Programming Rev 3.13).

In particular, there are a couple of things that a now made explicitly clear by this documentation:

  • Stores from a single processor cannot be reordered, and
  • Memory accesses obey causal consistency, so
  • An aligned load is an acquire operation, and
  • An aligned store is a release operation, and
  • A locked instruction (such as xchg or lock cmpxchg) is both an acquire and a release operation.

This has implications for the implementation of threading primitives such as mutexes for IA-32 and Intel 64 architectures — in some cases the code can be simplified, where it has been written to take a pessimistic interpretation of the specifications.

Posted by Anthony Williams
[/ threading /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

New Papers for C++ Standards Committee

Tuesday, 11 September 2007

I've just added the most recent papers that I've submitted to the C++ Standards Committee to our publications page. Mostly these are on multi-threading in C++:

but there's also an updated to my old paper on Names, Linkage and Templates (Rev 2), with new proposed wording for the C++ Standard now that the Evolution Working Group have approved the proposal in principle, and it has move to the Core Working Group for final approval and incorporation into the Standard.

Posted by Anthony Williams
[/ news /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Database Tip: Use Parameterized Queries

Monday, 03 September 2007

This post is the third in a series of Database Tips.

When running an application with a database backend, a high percentage of SQL statements are likely to have variable data. This might be data obtained from a previous query, or it might be data entered by the user. In either case, you've got to somehow combine this variable data with the fixed SQL string.

String Concatenation

One possibility is just to incorporate the data into the SQL statement directly, using string concatenation, but this has two potential problems. Firstly, this means that the actual SQL statement parsed by the database is different every time. Many databases can skip parsing for repeated uses of the same SQL statement, so by using a different statement every time there is a performance hit. Secondly, and more importantly, this places the responsibility on you to ensure that the variable data will behave correctly as part of the statement. This is particularly important for web-based applications, as a common attack used by crackers is a "SQL injection" attack — by taking advantage of poor quoting by the application when generating SQL statements, it is possible to input data which will end the current SQL statement, and start a new one of the cracker's choosing. For example, if string data is just quoted in the SQL using plain quotes ('data') then data that contains a quote and a semicolon will end the statement. This means that if data is '; update login_table set password='abc'; then the initial '; will end the statement from the application, and the database will then run the next one, potentially setting everyone's password to "abc".

Parameterized Queries

A solution to both these problems can be found in the form of Parameterized Queries. In a parameterized query, the variable data in the SQL statement is replaced with a placeholder such as a question mark, which indicates to the database engine that this is a parameter. The database API can then be used to set the parameters before executing the query. This neatly solves the first problem with string concatenation — the query seen by the database engine is the same every time, so giving the database the opportunity to avoid parsing the statement every time. Most parameterized query APIs will also allow you to reuse the same query with multiple sets of parameters, thus explicitly caching the parsed query.

Parameterized queries also solve the SQL injection problem — most APIs can send the data directly to the database engine, marked as a parameter rather than quoting it. Even when the data is quoted within the API, this is then the database driver's responsibility, and is thus more likely to be reliable. In either case, the user is relinquished from the requirement of correctly quoting the data, thus avoiding SQL injection attacks.

A third benefit of parameterized queries is that data doesn't have to be converted to a string representation. This means that, for example, floating point numbers can be correctly transferred to the database without first converting to a potentially inaccurate string representation. It also means that the statement might run slightly faster, as the string representation of data often requires more storage than the data itself.

The Parameterized Query Coding Model

Whereas running a simple SQL statement consists of just two parts — execute the statement, optionally retrieve the results — using parameterized queries ofetn requires five:

  1. Parse the statement (often called preparing the statement.)
  2. Bind the parameter values to the parameters.
  3. Execute the statement.
  4. Optionally, retrieve the results
  5. Close or finalize the statement.

The details of each step depends on the particular database API, but most APIs follow the same outline. In particular, as mentioned above, most APIs allow you to run steps 2 to 4 several times before running step 5.

Placeholders

A parameterized query includes placeholders for the actual data to be passed in. In the simplest form, these placeholders can often be just a question mark (e.g. SELECT name FROM customers WHERE id=?), but most APIs also allow for named placeholders by prefixing an identifier with a marker such as a colon or an at-sign (e.g. INSERT INTO books (title,author) VALUES (@title,@author)). The use of named placeholders can be beneficial when the same data is needed in multiple parts of the query — rather than binding the data twice, you just use the same placeholder name. Named placeholders are also easier to get right in the face of SQL statements with large numbers of parameters or if the SQL statement is changed — it is much easier to ensure that the correct data is associated with a particular named parameter, than to ensure that it is associated with the correctly-numbered parameter, as it is easy to lose count of parameters, or change their order when changing the SQL statement.

Recommendations

Look up the API for parameterized queries for your database. In SQLite, it the APIs surrounding sqlite3_stmt, for MySQL it's the Prepared Statements API, and for Oracle the OCI parameterized statements API does the trick.

If your database API supports it, used named parameters, or at least explicit numbering (e.g. ?1,?2,?3 rather than just ?,?,?) to help avoid errors.

Posted by Anthony Williams
[/ database /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Interview with Nick Hodges from Codegear

Friday, 31 August 2007

Over at Channel 9, Michael Lehmann from Microsoft has posted a video in which he and Bob Walsh interview Nick Hodges, the Delphi programme manager from Codegear. They discuss the future of Delphi, including support for Windows Vista, .NET 2.0 and WPF, as well as what makes Delphi good for small software companies.

For those looking to get to grips with Delphi, Nick recommends www.delphibasics.co.uk and www.marcocantu.com as resources well worth investigating.

Posted by Anthony Williams
[/ delphi /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Database Tip: Use Transactions

Monday, 27 August 2007

Here's another Database Tip, to follow on from my previous one on creating appropriate indexes. This time the focus is transactions.

For any experienced database developer, using transactions might seem an obvious suggestion, but lightweight databases may require configuration in order to use transactions. For example, MySql tables use the MyISAM engine by default, which doesn't support transactions — in order to use transactions you need to set the storage engine of your tables to InnoDB or BDB. Also, whereas in Oracle, every statement occurs within a transaction, and you need an explicit COMMIT or ROLLBACK to end the transaction, in databases such as MySQL and SQLite every statement is its own transaction (and thus committed to the database immediately) unless you explicitly begin a transaction with a BEGIN statement, or other database configuration command.

Benefits of Transactions

The primary benefit of using transactions is data integrity. Many database uses require storing data to multiple tables, or multiple rows to the same table in order to maintain a consistent data set. Using transactions ensures that other connections to the same database see either all the updates or none of them. This also applies in the case of interrupted connections — if the power goes off in the middle of a transaction, the database engine will roll back the transaction so it is as-if it was never started. If each statement is committed independently, then other connections may see partial updates, and there is no opportunity for automatic rollback on error.

A secondary benefit of using transactions is speed. There is often an overhead associated with actually committing the data to the database. If you've got 1000 rows to insert, committing after every row can cause quite a performance hit compared to committing once after all the inserts. Of course, this can work the other way too — if you do too much work in a transaction then the database engine can consume lots of space storing the not-yet-committed data or caching data for use by other database connections in order to maintain consistency, which causes a performance hit. As with every optimisation, if you're changing the boundaries of your transactions to gain performance, then it is important to measure both before and after the change.

Using Transactions in Oracle

Oracle databases are always in transaction mode, so all that's needed is to decide where to put the COMMIT or ROLLBACK. When one transaction is finished, another is automatically started. There are some additional options that can be specified for advanced usage — see the Oracle documentation for these.

    INSERT INTO foo (a,b) VALUES (1,'hello');
    INSERT INTO foo (a,b) VALUES (2,'goodbye');
    COMMIT;
    INSERT INTO foo (a,b) VALUES (3,'banana');
    COMMIT;

Using Transactions in SQLite

In SQLite, if you wish a transaction to cover more than one statement, then you must use a BEGIN or BEGIN TRANSACTION statement. The transaction ends when you execute a COMMIT or ROLLBACK statement, and the database reverts to auto-commit mode where each statement has its own transaction.

    BEGIN;
    INSERT INTO foo (a,b) VALUES (1,'hello');
    INSERT INTO foo (a,b) VALUES (2,'goodbye');
    COMMIT;
    INSERT INTO foo (a,b) VALUES (3,'banana');
    -- implicit COMMIT at end of statement
    -- with no preceding BEGIN

Using Transactions in MySQL

As mentioned above, by default all tables use the MyISAM storage engine, so transaction support is disabled. By changing a table to use the InnoDB or BDB storage engines, transaction support can be enabled. For new tables, this is done using the ENGINE or TYPE parameters on the CREATE TABLE statement:

    CREATE TABLE foo
    (
        bar INTEGER PRIMARY KEY,
        baz VARCHAR(80) NOT NULL
    )
    ENGINE = InnoDB;

Existing tables can be changed to use the InnoDB storage engine using ALTER TABLE:

    ALTER TABLE foo ENGINE = InnoDB;

You can also change the default storage engine using the default-storage-engine server option on the server command line, or in the server configuration file.

Once all the tables you're using have storage engines that support transactions, you have two choices. For a given connection, you can set the AUTOCOMMIT session variable to 0, in which case every statement within that connection is part of a transaction, as for Oracle, or you can leave AUTOCOMMIT set to 1 and start transactions explicitly as for SQLite. In auto-commit mode for MySQL, transactions are started with BEGIN, BEGIN WORK or START TRANSACTION. To disable AUTOCOMMIT for the current transaction, use the SET statement:

    SET AUTOCOMMIT=0;

You can configure the database to run this statement immediately upon opening a connection using the init_connect server variable. This can be set in the configuration file, or using the following command:

    SET GLOBAL init_connect='SET AUTOCOMMIT=0';

MySQL also supports additional transaction options — check the documentation for details.

Automatic ROLLBACK and COMMIT

One thing to watch out for is code that causes an automatic ROLLBACK or COMMIT. Most databases cause an automatic ROLLBACK of any open transaction when a connection is closed, so it is important to make sure that all changes are committed before closing the connection.

Also worth watching out for are commands that cause an automatic COMMIT. The list varies depending on the database, but generally DDL statements such as CREATE TABLE will cause an automatic commit. It is probably best to avoid interleaving DML statement with any other type of statement in order to avoid surprises. Check your database documentation for details.

Posted by Anthony Williams
[/ database /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

V1.3 of the dbExpress drivers for MySQL V5.0 released

Friday, 24 August 2007

New this release:

  • Correctly stores BLOB data with embedded control characters
  • Correctly stores string data with embedded slashes
  • BCD fields in parameterized queries now work correctly with DecimalSeparators other than '.'
  • Time values stored and retrieved correctly
  • TSQLTable support works correctly with Delphi 7

See the download page for more details.

Posted by Anthony Williams
[/ delphi /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Database Tip: Create Appropriate Indexes

Monday, 20 August 2007

One of the simplest things you can do speed up database access is create appropriate indexes.

There are several aspects to this. Firstly, you need to identify which columns are used for queries on a given table; in particular, which columns appear in the WHERE clause of time consuming queries. If a query is only done once, or the table only has five rows in it so queries are always quick then there is no benefit to adding indexes. It's not just straight-forward SELECT statements that need checking — UPDATE and DELETE statements can have WHERE clauses too.

Having identified which columns are used in queries, it is important to also note which combinations are used. A database engine will tend to only use one index per table (though some can use more, depending on the query), so if your time-consuming queries use multiple columns from the same table in the WHERE clause, then it's probably better to have an index that covers all of them rather than separate indexes on each.

The cost of indexes

Adding an index to a table does have a downside. Any modification to a table with an index (such as inserts and deletes) will be slower, as the database will have to update the index, and the index will occupy space in the database. That can be a high price to pay for faster queries if the table is modified frequently. In particular, an index that is never used, or covers the same columns as another index is just dead weight. It is important to remember that PRIMARY KEY and UNIQUE columns automatically have indexes that cover them.

Timing is everything

As with every optimization, it is important to profile both before and after any change, and this includes checking the performance of the rest of the application too. Don't be afraid to remove an index if it isn't helping, and bear in mind that it's also possible to improve performance by rewriting queries, particularly where the are joins or subqueries involved.

Posted by Anthony Williams
[/ database /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Demonstrating Software on the Web

Friday, 10 August 2007

One of the techniques we use for collaborating with customers, and obtaining feedback on work in progress is to demonstrate the software over the internet. This means that customers can see the software in action, without having to install it on their systems. This can be beneficial when the software is not yet complete, and we wish to demonstrate how a particular feature works — many customers are reluctant to install unfinished software on their systems to try it out, and an online demonstration means they don't have to do this.

Online demonstrations also provide scope for faster feedback — it's much quicker to start up a web demo than it is to ship a new version of the software to a customer, wait for them to find time to install it, and then talk them through it. This means that changes can be demonstrated as soon as they are ready, and also alternate versions can be shown in the case that the choice is unclear.

For most demonstrations we use TightVNC. This allows the screen of our demonstration PC to be replicated across the web. All that is needed at the customer's site is a web browser with Java installed. We send our users a URL to go to which then connects them to our demonstration PC and loads the TightVNC Java applet. The display is updated in real-time. As an added bonus, the server can also be configured to allow the customers to control the demonstration machine from their end, giving them a chance to try out the software and see how they would use it. We also have the customers on the phone (usually using a speaker-phone) at the same time, so we can talk them through the software, or the changes that have been made.

Though not as ideal as a face-to-face meeting, such an online demonstration is considerably less expensive and time consuming for both parties, and can consequently be arranged far more often.

Posted by Anthony Williams
[/ feedback /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Delay Using a Database

Wednesday, 08 August 2007

A client of ours makes hardware that generates data files, and a few years ago I wrote a piece of software for them to help manage those data files. Initially it only had to deal with a few data files, so it indexed them on start up. Then someone tried to use it with several thousand data files, and start-up times got too slow, so I modified the indexing code to dump the current index to an XML file on shutdown, which it then loaded at startup. This has worked well, but now they're using it to handle hundreds of thousands of files, and the startup and shutdown times are again becoming significant due to the huge size of the XML file. Also, the data file access times are now getting larger due to the size of the in-memory index. We've now been hired again to address the issue, so this time I'm going to use a SQLite database for the index — no start up delay, no shutdown delay, and faster index lookup.

What lessons can be learned from this experience? Should I have gone for SQLite in the first instance? I don't think so. Using a simple in-memory map for the initial index was the simplest thing that could possibly work, and it has worked for a few years. The XML index file was a small change, and it kept the application working for longer. Now the application does need a database, but the implementation is certainly more complex than the initial in-memory map. By using the simple implementation first, the application was completed quicker — not only did this save my client money in the development, but it meant they could begin using it sooner. It also meant that now I come to add the database code, the requirements are better-known and there are already a whole suite of tests for how the index should behave. It has taken less than a day to write the database indexing code, whereas it could easily have taken several days at the beginning.

I think people are often too keen to jump straight to using a database, when they could often get by for now with something far simpler. That doesn't mean that requirements won't evolve, and that a database won't be required in the end, but this time can often be put off for years, thus saving time and money. In this instance I happened to use SQLite, which is free, but many people jump straight to Oracle, or SQL Server, which have expensive licenses and are often overkill for the problem at hand. Just think how much money you could save by putting off the purchase of that SQL Server license for a year or two.

Don't be scared into buying a full-featured enterprise level RDBMS at the beginning of your project; simple in-memory maps or data files will often suffice for a long time, and when they're no longer sufficient you'll know more about what you do need from your RDBMS. Maybe SQLite will do, or maybe it won't — in any case you've saved money.

Posted by Anthony Williams
[/ database /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

The C++ Performance TR is now publicly available

Wednesday, 08 August 2007

The C++ Performance TR is a Technical Report issued by the C++ Standards committee detailing various factors that affect the performance of a program written in C++.

This includes information on various strategies of implementing aspects of the language, along with their consequences for executable size and timing, as well as suggestions on how to write efficient code. It also includes information on use of C++ in embedded systems, including a discussion of putting constant data in ROM and direct access to hardware registers.

Whether you're a compiler writer, library writer or application developer, this is well worth a look. Download a copy from the ISO website today.

Posted by Anthony Williams
[/ cplusplus /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Implementing Synchronization Primitives for Boost on Windows Platforms

Wednesday, 11 July 2007

My article, Implementing Synchronization Primitives for Boost on Windows Platforms from the April 2007 issue of Overload is now available online.

In the article, I describe a the implementation of a new mutex type for Windows platforms, for the Boost Threads library.

Posted by Anthony Williams
[/ news /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Chaining Constructors in C++

Monday, 18 June 2007

Chain Constructors is one of the refactorings from Refactoring to Patterns (page 340) designed to reduce duplication — in this case duplication between constructors. Unfortunately, it is not such a straight-forward refactoring in C++, since in the current Standard it is not possible for one constructor to delegate to another.

The proposal to the C++ committee to support this feature in the next C++ Standard has been accepted, but the next Standard won't be ready until 2009, with implementations available sometime after that. If you've got a problem in your current project for which this is an appropriate refactoring then two years or more is a bit long too wait. So, with that in mind, I'm posting my work-around here for those that would like this feature now.

Adding a layer of redirection

All problems in Software can be solved by adding a layer of redirection, and this is no exception. In this case, we add a level of redirection between the class and its data by wrapping the data in a private struct. Every constructor of the original class then delegates to the constructor(s) of the internal struct. I'll illustrate with one of the examples from the Standardization proposal:

class X
{
    struct internal
    {
        internal( int, W& );
        ~internal();
        Y y_;
        Z z_;
    } self;
public:
    X();
    X( int );
    X( W& );
};

X::internal::internal( int i, W& e ):
    y_(i), z_(e)
{
    /*Common Init*/
}

X::X():
    self( 42, 3.14 )
{
    SomePostInitialization();
}

X::X( int i ):
    self( i, 3.14 )
{
    OtherPostInitialization();
}

X::X( W& w ):
    self( 53, w )
{ /* no post-init */ }

X x( 21 ); // if the construction of y_ or z_ throws, internal::~internal is invoked

Every constructor of class X has to initialize the sole data member self, the constructor of which encapsulates all the common initialization. Each delegating constructor is then free to do any additional initialization required.

Within the member functions of X, all references to member data now have to be prefixed with self., but that's not too bad a price — it makes it clear that this is member data, and is analagous to the use of this->, or the m_ prefix.

This simple solution only provides for a single layer of delegation — multiple layers of delegation would require multiple layers of nested structs, but it does provide full support at that level.

pimpls and Compilation Firewalls

Once the data has been encapsulated in a private structure, a further step worth considering is a move to the use of a pointer to the internal structure, also known as the pimpl idiom, or the use of a compilation firewall. By so doing, all that is required in the class definition is a forward declaration of the internal class, rather than a full definition. The full definition is then provided in the implementation file for the enclosing class. This eliminates any dependencies on the internal data from other classes, at the cost of forcing the data to be heap allocated. It also removes the possibility of any operations on the enclosing class being inline. For further discussion on the pimpl idiom, see Herb Sutter's Guru of the Week entry.

Refactoring steps

Here's a quick summary of the steps needed to perform this refactoring:

  1. Create a private struct named internal in the class X being refactored with an identical set of data members to class X.
  2. Create a data member in class X of type internal named self, and remove all other data members.
  3. For each constructor of X, write a constructor of internal that mirrors the member-initializer list, and replace the member initializer list of that constructor with a single initialization of self that forwards the appropriate constructor parameters.
  4. Replace every reference to a data member y of class X to a reference to self.y.
  5. Eliminate duplication.

Posted by Anthony Williams
[/ cplusplus /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Plan-it Earth Website goes live

Tuesday, 20 March 2007

We've just completed the website for Plan-it Earth. They offer Yurt Holidays and family Eco Camps on their traditional Cornish smallholding, which is just a few miles from us. We have been working closely with them to develop a new website from scratch, and have thoroughly enjoyed the experience. We are passionate about Cornwall, and West Penwith in particular (hence our location), and about reducing our environmental impact, so it was wonderful to work on a website with people who were similarly passionate, and where the aim is to spread this enthusiasm.

Posted by Anthony Williams
[/ news /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Previous Entries Later Entries