Building on a Legacy

by Anthony Williams

The term "Legacy Code" has been imbued with meaning by the software development community, and evokes images of masses of tangled, hard-to-change code, with bad variable names, 1000-line functions, misleading comments, and unfathomable dependencies.

Taken literally, it means code "handed down by an ancestor or predecessor" [1], which doesn't sound so scary, and it needn't be, provided suitable care is taken. I've found the following techniques to be useful when I've had legacy code to work with, and I hope they can be useful for you too.

What, no documentation?

Many developers bemoan the lack of documentation when presented with some legacy code to maintain, but this complaint often masks their real concern: they find the code hard to understand, and hard to change without introducing bugs.

Michael Feathers asserts [2] that the best way to maintain legacy code is to get it under test, so you don't have to worry about breaking it as you make changes, and I'm inclined to agree. If you haven't got a copy of Michael's book, I really recommend you buy one.

Bearing in mind the problems of maintaining hard-to-understand code, it's well worth taking the time to refactor, to make it clearer. I've written about what I consider makes for maintainable code before [3], and you should bear this in mind when making changes. However, it is important not to get carried away and try to rewrite everything: that way lies stress, as you're not making any progress in the meantime.

Baby Steps

The key to making any changes to legacy code, whether in order to add features, or to get it under test, is to make a series of small changes, rather than one big one, and then verify that everything works after each small change. Obviously, the best way to verify that nothing has broken is with a set of automated tests, which is a Catch-22 if you need to make the changes in order to add the tests. In this case, you have to rely on making the smallest changes you can in order to add the tests, and on manual testing to verify the behaviour.
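As a rough illustration of getting a piece of code under test before touching it, here is a minimal sketch of a "characterization" test in C++. The function `shipping_cost_pence`, its pricing rules, and the test name are all invented for this example; the point is only that the assertions record what the code does today, so that each later baby step can be checked against them.

```cpp
#include <cassert>

// Hypothetical legacy function: the name and the pricing rules are invented
// for illustration. Imagine it buried in the middle of a much larger file.
int shipping_cost_pence(int weight_grams, bool express)
{
    int c = 250;
    if (weight_grams > 1000) c += ((weight_grams - 1000) / 500) * 75;
    if (express) c = c * 2 + 100;
    return c;
}

// Characterization test: it asserts what the code does now, which is not
// necessarily what a specification would say it ought to do.
void test_shipping_cost_current_behaviour()
{
    assert(shipping_cost_pence(200, false) == 250);
    assert(shipping_cost_pence(1000, false) == 250);
    assert(shipping_cost_pence(1499, false) == 250);  // integer division: no surcharge yet
    assert(shipping_cost_pence(1500, false) == 325);
    assert(shipping_cost_pence(2000, true) == 900);
}
```

Call the test function from whatever test runner you have, or from a throwaway `main`, after every small change. Note that `assert` is compiled out when `NDEBUG` is defined, so build the tests without it.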

Stay focused

When faced with a big ball of mud, it's tempting to dive in and refactor like crazy, tidying up the code all over the place. This is not a good idea, since you're not adding new functionality whilst you're doing this, and you are modifying this code for a reason — you've got a bug to fix, or a new feature to add.

Instead, the way to deal with the mess is to focus on the area that needs to be changed. If you're fixing a bug, hunt it down ruthlessly, then add tests to that specific area, so that (a) you'll know when the bug is fixed, since you've got a test case that traps it, and (b) you'll know that you haven't broken any of the desired behaviour in that area. Once the tests are in place, tidy up this corner of the code-base: split that 1000-line function by extracting small, self-contained functions with good names and clear responsibilities; rename a few variables; group related data into structures and classes.
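To make this concrete, here is a small sketch of what one tidied corner might look like. Everything in it is hypothetical: the `OrderLine` structure, the function names, and the off-by-one discount bug are assumptions made up for the example. It shows one test that traps the bug, one that pins down the behaviour that must not change, and the small, well-named functions extracted from what we can imagine was one long function.

```cpp
#include <cassert>
#include <vector>

// Related data grouped into a structure (hypothetical example type).
struct OrderLine {
    int unit_price_pence;
    int quantity;
};

// Each extracted function has one job and a name that says what it is.
int line_total(const OrderLine& line)
{
    return line.unit_price_pence * line.quantity;
}

int order_subtotal(const std::vector<OrderLine>& lines)
{
    int subtotal = 0;
    for (const OrderLine& line : lines)
        subtotal += line_total(line);
    return subtotal;
}

// The invented bug in this sketch: the comparison used to be "> 10000",
// although the intended rule was "10000 or more".
int apply_bulk_discount(int subtotal_pence)
{
    if (subtotal_pence >= 10000)
        return subtotal_pence - subtotal_pence / 10;
    return subtotal_pence;
}

int order_total(const std::vector<OrderLine>& lines)
{
    return apply_bulk_discount(order_subtotal(lines));
}

// (a) The test that traps the bug: it fails until the comparison is fixed.
void test_discount_applies_at_exactly_the_threshold()
{
    assert(order_total({{5000, 2}}) == 9000);
}

// (b) Tests for the behaviour in this area that must stay the same.
void test_existing_behaviour_is_preserved()
{
    assert(order_total({}) == 0);
    assert(order_total({{999, 1}}) == 999);
    assert(order_total({{6000, 2}}) == 10800);
}
```

In practice the two tests would be written first, against the original long function, and the extraction done afterwards in small steps with the tests run after each one.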

If you're adding a new feature, work in a similar way — find the parts of the code that need to change in order to add the new feature, add tests to the existing code, write tests for the new feature, and add the new code. Again, once the tests are in place, tidy up the affected areas of the code-base. Sometimes it is best to do this before adding the new code, in order to make it just that little bit easier to add the new feature, and sometimes it is a good idea to do it afterwards, to eliminate some duplication, and make it easier to change next time.

Pay back technical debt

Unmaintainable code-bases, with a lack of tests, are often said to have accrued a lot of "technical debt". The consequence of accruing this debt, rather than keeping the code clean and well tested, is that you have to pay "interest": adding new features and fixing bugs takes longer.

By adding tests, and refactoring to improve the design, as you fix bugs and add new features, you are paying back some of this debt. The code should be cleaner after your modification than before, with less duplication, and more tests. Working this way, the areas of code that change frequently will become well-covered with tests, and will gradually become better designed over time. Adding new features will therefore get easier as time goes on, rather than harder.

The areas of code you haven't had reason to change will remain just as untidy as before, but if you don't need to change them, this isn't a problem. Also, if you're fixing all the critical bugs you know of, and you still don't need to change a bit of code, then either it's never run, or it works as intended, however unclear it may be.

Delete, delete, delete

No code has fewer bugs than no code. If some of the code is not used, delete it. Some people have a fear of deleting unused code, in case they need it sometime, but this fear is unnecessary — if you're using a version control system, then the code will be there if you need it. Tag the last version before you delete it, to make it easy to find again, if you wish, but do delete it from the current version of the code. Unused code just makes it harder to understand what the rest of the code does.

Whilst unused classes and functions cause clutter, the worst offender in the unused code stakes is an unused branch of a function that is used. Every time you have to read the function to try and understand it, this code gets in the way. Once you've worked out that it cannot be called, delete it, and save yourself from having to work it out every time you look at the function.
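Here is a minimal sketch of removing such a branch. The `checksum` functions and the `legacy_format` flag are hypothetical, and the sketch assumes you have already checked every call site and found that the flag is always `false`. The old version is kept alongside the new one only so that the test can compare them; in real code it would be deleted once the test passes.

```cpp
#include <cassert>

// Before: assume (for this example) that every caller passes
// legacy_format == false, so the first branch can never run.
int checksum_before(int value, bool legacy_format)
{
    if (legacy_format) {
        // Dead in practice, but every reader still has to work that out.
        return (value * 31) ^ 0x5A;
    }
    return value % 97;
}

// After: the dead branch and the flag that selected it are both gone.
int checksum(int value)
{
    return value % 97;
}

// Checks that the simplified function agrees with the path that was
// actually in use, over a range of inputs.
void test_checksum_matches_the_live_path()
{
    for (int value = 0; value < 1000; ++value)
        assert(checksum(value) == checksum_before(value, false));
}
```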

Code coverage tools will help with this analysis, but they require that every code path actually used is exercised, which requires extensive testing, whether with automated or manual tests. Sometimes static analysers will be able to identify unused code, and sometimes you can tell just by looking, e.g.

    n = 3;
    if (n == 2) { ... }  // can never be true: n has just been set to 3

Sometimes it's worth using a cross-reference tool, or even plain grep, to find whether a function is called from anywhere. If it's not called, delete it.

Version Control

I said deleting code is safe because it'll still be in your version control history; this rather presumes that you're using a version control system. If you're not, start now — download CVS or Subversion, and set up a repository for your source code.

Assuming you do have a version control system, make sure you use it to full effect: check in code frequently, and label important versions, such as before a refactoring, or when you make a release. Every time you make a small change to your code, and everything still works, check it in. That means checking in after each baby step, many times a day, sometimes after only a few minutes. When you're making changes to legacy code, you'll be glad of the safety net, knowing that you've got working code to fall back to, from just a short time ago.

Conclusion

Legacy code needn't be scary, but it does require careful handling. Stay focused, take things slowly, and work step by step, with frequent check points. Add tests as you go, and pay back a little technical debt every time you work on an area.

We should all bear in mind the problems of legacy code when we're developing, and do our best to avoid them before they're an issue. We should strive to leave a legacy we can be proud of.

Footnotes:

[1]: OED, meaning 5b for Legacy, n.
[2]: Working Effectively with Legacy Code, Michael Feathers, published by Prentice Hall PTR, 2005.
[3]: Writing Maintainable Code, Anthony Williams, C Vu 16.2, April 2004.

Just Software Solutions
