Blog Archive for / design /

Duplication in Software

Tuesday, 26 March 2013

Much has been said about the importance of reducing duplication in software. For example, J. B. Rainsberger has "minimizes duplication" as the second of his four "Elements of Simple Design", and lots of the teachings of the Agile community stress the importance of reducing duplication when refactoring code.

Inspired by Kevlin Henney's tweet last week, where he laments that programmers trying to remove duplication often take it literally, I wanted to talk about the different kinds of duplication in software. I've just mentioned "literal" duplication, so let's start with that.

Basic Literal Duplication

This is the most obvious form of duplication: sections of code which are completely identical. This most often arises due to copy-and-paste programming, but can often arise in the form of repetitive patterns — a simple for loop that is repeated multiple places with the same body, for example.

Removing Literal Duplication

The easiest to create, literal duplication is also the easiest to remove: just extract a function that does the necessary operation.

Sometimes, though the code is identical, the types involved are different. You cannot address this with extracting a simple function, so we have a new class of duplication.

Parametric Literal Duplication

Parametric literal duplication can also arise from copy-and-paste programming. The key feature is that the types of the variables are different so you cannot just reuse the code from one place in another, even if it was a nicely self-contained function. If you eliminate all the basic literal duplication, parametric literal duplication will give you sets of functions with identical structure but different types.

With the lack of a portable is_ready() function for std::future, it is common to test whether a future f is ready by writing f.wait_for(std::chrono::seconds(0))==std::future_status::ready. Since std::future is a class template, the types of the various futures that you may wish to check for readiness may vary, so you cannot extract a simple function. If you write this in multiple places you therefore have parametric literal duplication.

Removing Parametric Literal Duplication

There are various ways to remove parametric literal duplication. In C++ the most straightforward is probably to use a template. e.g.

template<typename T>
inline bool is_ready(std::future<T> f){
    return f.wait_for(std::chrono::seconds(0))==std::future_status::ready;
}

In other languages you might choose to use generics, or rely on duck-typing. You might also do it by extracting an interface and using virtual function calls, but that requires that you can modify the types of the objects, or are willing to write a facade.

Parametric literal duplication is closely related to what I call Structural Duplication.

Structural Duplication

This is where the overall pattern of some code is the same, but the details differ. For example, a for loop that iterates over a container is a common structure, but the loop body varies from loop to loop.e.g

std::vector<int> v;

int sum=0;
for(std::vector<int>::iterator it=v.begin();it!=v.end();++it){
    sum+=*it;
}
for(std::vector<int>::iterator it=v.begin();it!=v.end();++it){
    std::cout<<*it<<std::endl;
}

You can't just extract the whole loop into a separate function because the loop body is different, but that doesn't mean you can't do anything about it.

Removing Structural Duplication

One common way to remove such duplication is to extract the commonality with the template method pattern, or create a parameterized function where the details are passed in as a function to call.

For simple loops like the ones above, we have std::for_each, and the new-style C++11 for loops:

std::for_each(v.begin(),v.end(),[&](int x){sum+=x;});
std::for_each(v.begin(),v.end(),[](int x){std::cout<<x<<std::endl;});

for(int x:v){
    sum+=x;
}
for(int x:v){
    std::cout<<x<<std::endl;
}

Obviously, if your repeated structure doesn't match the standard library algorithms then you must write your own, but the idea is the same: take a function parameter which is a callable object and which captures the variable part of the structure. For a loop, this is the loop body. For a sort algorithm it is the comparison, and so forth.

Temporal Duplication

This is where some code only appears once in the source code, but is executed repeatedly, and the only desired outcome is the computed result, which is the same for each invocation. For example, the call to v.size() or v.end() to find the upper bound of an iteration through a container.

std::vector<int> v;
for(unsigned i=0;i<v.size();++i)
{
    do_stuff(v[i]);
}

It doesn't just happen in loops, though. For example, in a function that inserts data into a database table you might build a query object, run it to insert the data, and then destroy it. If this function is called repeatedly then you are repeatedly building the query object and destroying it. If your database library supports parameterization then you may well be able to avoid this duplication.

Removing Temoral Duplication

The general process for removing temporal duplication is to use some form of caching or memoization — the value is computed once and then stored, and this stored value is used in place of the computation for each subsequent use. For loops, this can be as simple as extracting a variable to hold the value:

for(unsigned i=0,end=v.size();i!=end;++i){
    do_stuff(v[i]);
}

For other things it can be more complex. For example, with the database query example above, you may need to switch to using a parameterized query so that on each invocation you can bind the new values to the query parameters, rather than building the query around the specific parameters to insert.

Duplication of Intent

Sometimes the duplication does not appear in the actual code, but in what the code is trying to achieve. This often occurs in large projects where multiple people have worked on the code base. One person writes some code to do something in one source file, and another writes some code to do the same thing in another source file, but different styles mean that the code is different even though the result is the same. This can also happen with a single developer if the different bits are written with a large enough gap, such that you cannot remember what you did before and your style has changes slightly. To beat the loop iteration example to death, you might have some code that loops through a container by index, and other code that loops through the same container using iterators. The structure is different, but the intent is the same.

Removing Duplication of Intent

This is one of the hardest types of duplication to spot and remove. The way to remove it is to refactor one or both of the pieces of code until they have the same structure, and are thus more obviously duplicates of one-another. You can then treat them either as literal duplication, parametric literal duplication or structural duplication as appropriate.

Incidental Duplication

This is where there is code that looks identical but has completely a different meaning in each place. The most obvious form of this is with "magic numbers" — the constant "3" in one place typically has a completely different meaning to the constant "3" somewhere else.

Removing Incidental Duplication

You can't necessarily entirely eliminate incidental duplication, but you can minimize it by good naming. By using symbolic constants instead of literals then it is clear that different uses are distinct because the name of the constant is distinct. There will be still be duplication of the literal in the definition of the constants, but this is now less problematic.

In the case that this incidental duplication is not just a constant then you can extract separate named functions that encapsulate this duplicate code, and express the intent in each case. The duplication is now just between these function bodies than between the uses, and the naming of the functions makes it clear that this is just incidental duplication.

Conclusion

There are quite a few types of duplication that you may get in your code. By eliminating them you will tend to make your code shorter, clearer, and easier to maintain.

If you can think of any types of duplication I've missed, please add a comment.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

The Software Craftsmanship Manifesto

Wednesday, 11 March 2009

Do you care about the quality of your work as a software developer? Do you strive to produce the best software you can for your clients or employer? I don't mean basic level "does it work?" kind of quality — I hope we all aim to produce code that works. Does it matter to you if the code is well-crafted? Do you strive to write elegant software? Do you actively work to improve your skills as a developer?

There's been a lot of discussion about software quality on the internet recently. Uncle Bob, Joel Spolsky and Jeff Atwood got involved in the "Quality doesn't matter" debate, culminating in Uncle Bob talking on Jeff and Joel's Stack Overflow Podcast. James Bach even went as far as to hypothesise that Quality is Dead.

James has a point: in many instances it seems that people are quite happy to tolerate buggy software that's "good enough", and that developers are quite happy to ship such software. We're not perfect, and we will write code with bugs in, but to a large extent it's the attitude that counts. Whilst I accept that there may well be bugs in my code, I strive to avoid them, work hard to fix any that are found, and try and learn ways of reducing their occurrence in future. I also feel that software should be well-crafted so that it doesn't just work now, but will continue to work as it evolves, and such evolution should be as easy as possible. Of course, there's more to software quality than that — quality is Value to Some Person, and your job as a software developer is to ensure that your clients, customers or employers get the things that they value from the software you develop.

If this is something you feel strongly about, rest assured that you're not alone — there are many others who feel that Quality is Alive, to the extent that a few developers have got together to draft a Manifesto for Software Craftsmanship. The manifesto has over 1500 signatures (including mine) — why not add yours?

Posted by Anthony Williams
[/ design /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Exceptions make for Elegant Code

Friday, 06 June 2008

On this week's Stack Overflow podcast, Joel comes out quite strongly against exceptions, on the basis that they are hidden flow paths. Whilst I can sympathise with the idea of making every possible control path in a routine explicitly visible, having just had to write some C code for a recent project I would really like to say that this actually makes the code a lot harder to follow, as the actual code for what it's really doing is hidden amongst a load of error checking.

Whether or not you use exceptions, you have the same number of possible flow paths. With exceptions, the code can be a lot cleaner than with exceptions, as you don't have to write a check after every function call to verify that it did indeed succeed, and you can now proceed with the rest of the function. Instead, the code tells you when it's gone wrong by throwing an exception.

Exceptions also simplify the function signature: rather than having to add an additional parameter to hold the potential error code, or to hold the function result (because the return value is used for the error code), exceptions allow the function signature to specify exactly what is appropriate for the task at hand, with errors being reported "out-of-band". Yes, some functions use errno, which helps by providing a similar out-of-band error channel, but it's not a panacea: you have to check and clear it between every call, otherwise you might be passing invalid data into subsequent functions. Also, it requires that you have a value you can use for the return type in the case that an error occurs. With exceptions you don't have to worry about either of these, as they interrupt the code at the point of the error, and you don't have to supply a return value.

Here's three implementations of the same function using error code returns, errno and exceptions:

    int foo_with_error_codes(some_type param1,other_type param2,result_type* result)
    {
        int error=0;
        intermediate_type temp;

        if((error=do_blah(param1,23,&temp)) ||
           (error=do_flibble(param2,temp,result))
        {
            return error;
        }
        return 0;
    }

    result_type foo_with_errno(some_type param1,other_type param2)
    {
        errno=0;
        intermediate_type temp=do_blah(param1,23);
        if(errno)
        {
            return dummy_result_type_value;
        }

        return do_flibble(param2,temp);
    }

    result_type foo_with_exceptions(some_type param1,other_type param2)
    {
        return do_flibble(param2,do_blah(param1,23));
    }

Error Recovery

In all three cases, I've assumed that there's no recovery required if do_blah succeeds but do_flibble fails. If recovery was required, additional code would be required. It could be argued that this is where the problems with exceptions begin, as the code paths for exceptions are hidden, and it is therefore unclear where the cleanup must be done. However, if you design your code with exceptions in mind I find you still get elegant code. try/catch blocks are ugly: this is where deterministic destruction comes into its own. By encapsulating resources, and performing changes in an exception-safe manner, you end up with elegant code that behaves gracefully in the face of exceptions, without cluttering the "happy path". Here's some code:

    int foo_with_error_codes(some_type param1,other_type param2,result_type* result)
    {
        int error=0;
        intermediate_type temp;

        if(error=do_blah(param1,23,&temp))
        {
            return error;
        }

        if(error=do_flibble(param2,temp,result))
        {
            cleanup_blah(temp);
            return error;
        }
        return 0;
    }

    result_type foo_with_errno(some_type param1,other_type param2)
    {
        errno=0;
        intermediate_type temp=do_blah(param1,23);
        if(errno)
        {
            return dummy_result_type_value;
        }

        result_type res=do_flibble(param2,temp);
        if(errno)
        {
            cleanup_blah(temp);
            return dummy_result_type_value;
        }
        return res;
    }

    result_type foo_with_exceptions(some_type param1,other_type param2)
    {
        return do_flibble(param2,do_blah(param1,23));
    }

    result_type foo_with_exceptions2(some_type param1,other_type param2)
    {
        blah_cleanup_guard temp(do_blah(param1,23));
        result_type res=do_flibble(param2,temp);
        temp.dismiss();
        return res;
    }

In the error code cases, we need to explicitly cleanup on error, by calling cleanup_blah. In the exception case we've got two possibilities, depending on how your code is structured. In foo_with_exceptions, everything is just handled directly: if do_flibble doesn't take ownership of the intermediate data, it cleans itself up. This might well be the case if do_blah returns a type that handles its own resources, such as std::string or boost::shared_ptr. If explicit cleanup might be required, we can write a resource management class such as blah_cleanup_guard used by foo_with_exceptions2, which takes ownership of the effects of do_blah, and calls cleanup_blah in the destructor unless we call dismiss to indicate that everything is going OK.

Real Examples

That's enough waffling about made up examples, let's look at some real code. Here's something simple: adding a new value to a dynamic array of DataType objects held in a simple dynamic_array class. Let's assume that objects of DataType can somehow fail to be copied: maybe they allocate memory internally, which may therefore fail. We'll also use a really dumb algorithm that reallocates every time a new element is added. This is not for any reason other than it simplifies the code: we don't need to check whether or not reallocation is needed.

If we're using exceptions, that failure will manifest as an exception, and our code looks like this:

class DataType
{
public:
    DataType(const DataType& other);
};

class dynamic_array
{
private:
    class heap_data_holder
    {
        DataType* data;
        unsigned initialized_count;

    public:
        heap_data_holder():
            data(0),initialized_count(0)
        {}
        explicit heap_data_holder(unsigned max_count):
            data((DataType*)malloc(max_count*sizeof(DataType))),
            initialized_count(0)
        {
            if(!data)
            {
                throw std::bad_alloc();
            }
        }
        void append_copy(DataType const& value)
        {
            new (data+initialized_count) DataType(value);
            ++initialized_count; 
        }
        void swap(heap_data_holder& other)
        {
            std::swap(data,other.data);
            std::swap(initialized_count,other.initialized_count);
        }
        unsigned get_count() const
        {
            return initialized_count;
        }
        ~heap_data_holder()
        {
            for(unsigned i=0;i<initialized_count;++i)
            {
                data[i].~DataType();
            }
            free(data);
        }
        DataType& operator[](unsigned index)
        {
            return data[index];
        }
        
    };

    heap_data_holder data;

    // no copying for now
    dynamic_array& operator=(dynamic_array& other);
    dynamic_array(dynamic_array& other);
public:
    dynamic_array()
    {}
    void add_element(DataType const& new_value)
    {
        heap_data_holder new_data(data.get_count()+1);
        for(unsigned i=0;i<data.get_count();++i)
        {
            new_data.append_copy(data[i]);
        }
        new_data.append_copy(new_value);
        new_data.swap(data);
    }
};

On the other, if we can't use exceptions, the code looks like this:

class DataType
{
public:
    DataType(const DataType& other);
    int get_error();
};

class dynamic_array
{
private:
    class heap_data_holder
    {
        DataType* data;
        unsigned initialized_count;
        int error_code;

    public:
        heap_data_holder():
            data(0),initialized_count(0),error_code(0)
        {}
        explicit heap_data_holder(unsigned max_count):
            data((DataType*)malloc(max_count*sizeof(DataType))),
            initialized_count(0),
            error_code(0)
        {
            if(!data)
            {
                error_code=out_of_memory;
            }
        }
        int get_error() const
        {
            return error_code;
        }
        int append_copy(DataType const& value)
        {
            new (data+initialized_count) DataType(value);
            if(data[initialized_count].get_error())
            {
                int const error=data[initialized_count].get_error();
                data[initialized_count].~DataType();
                return error;
            }
            ++initialized_count;
            return 0;
        }
        void swap(heap_data_holder& other)
        {
            std::swap(data,other.data);
            std::swap(initialized_count,other.initialized_count);
        }
        unsigned get_count() const
        {
            return initialized_count;
        }
        ~heap_data_holder()
        {
            for(unsigned i=0;i<initialized_count;++i)
            {
                data[i].~DataType();
            }
            free(data);
        }
        DataType& operator[](unsigned index)
        {
            return data[index];
        }
        
    };

    heap_data_holder data;

    // no copying for now
    dynamic_array& operator=(dynamic_array& other);
    dynamic_array(dynamic_array& other);
public:
    dynamic_array()
    {}
    int add_element(DataType const& new_value)
    {
        heap_data_holder new_data(data.get_count()+1);
        if(new_data.get_error())
            return new_data.get_error();
        for(unsigned i=0;i<data.get_count();++i)
        {
            int const error=new_data.append_copy(data[i]);
            if(error)
                return error;
        }
        int const error=new_data.append_copy(new_value);
        if(error)
            return error;
        new_data.swap(data);
        return 0;
    }
};

It's not too dissimilar, but there's a lot of checks for error codes: add_element has gone from 10 lines to 17, which is almost double, and there's also additional checks in the heap_data_holder class. In my experience, this is typical: if you have to explicitly write error checks at every failure point rather than use exceptions, your code can get quite a lot larger for no gain. Also, the constructor of heap_data_holder can no longer report failure directly: it must store the error code for later retrieval. To my eyes, the exception-based version is a whole lot clearer and more elegant, as well as being shorter: a net gain over the error-code version.

Conclusion

I guess it's a matter of taste, but I find code that uses exceptions is shorter, clearer, and actually has fewer bugs than code that uses error codes. Yes, you have to think about the consequences of an exception, and at which points in the code an exception can be thrown, but you have to do that anyway with error codes, and it's easy to write simple resource management classes to ensure everything is taken care of.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

C, BASIC and Real Programmers

Tuesday, 27 May 2008

There's been a lot of discussion about learning C, and whether or not BASIC provides a good grounding for learning to program, following Joel Spolsky and Jeff Atwood's Stack overflow podcasts.

Having been one of those who grew up with the first batch of home computers in the 1980s, and therefore learnt to program in BASIC on an 8-bit home-computer, I feel ideally qualified to add my tuppence to the discussion.

I think BASIC was a crucial part of my early interactions with computers. When you turned the computer on, it sat there expectantly, with a prompt that said Ready, and a blinking cursor inviting you to type something. The possibilities were endless. Not only that, but you could often view the source code of games, as many of them were written in BASIC. This would allow you to learn from others, and crucially hammered home the idea that you could do this too: they were using BASIC just like you. This is a long way from the experience of today's first-time computer users: the computer starts up, and does all kinds of fancy things from the get-go. You don't type in BASIC commands to make it do things, you click the mouse. Modern computers don't even come with a programming language: you have to install a compiler or interpreter first. I am concerned that the next generation of programmers will be missing out because of this.

BASIC is not enough

However, BASIC is not enough. BASIC teaches you about the general ideas of programming: variables, statements, expressions, etc., but BASIC interpreters rarely featured much in the way of structured programming techniques. Typically, all variables were generally global, and there was often no such thing as a procedure or function call: just about everything was done with GOTO or maybe GOSUB. BASIC learnt in isolation by a lone hobbyist programmer, by cribbing bits from manuals, magazines, and other people's source code, would not engender much in the way of good programming habits. Though it did serve to separate the programming sheep from the non-programming goats, I can see why Dijkstra was so whipping of it. To be a good programmer, BASIC is not enough.

To learn good programming habits and really understand about the machine requires more than BASIC. For many, C is the path to such enlightenment: it provides functions and local variables, so you can learn about structured programming, and it's "close to the machine", so you have to deal with pointers and memory allocation. If you can truly grok programming in C, then it will improve your programming, whatever language you use.

I took another path. Not one that I would necessarily recommend to others, but it certainly worked for me. You see, a home computer came with not just one language but two: BASIC and machine code. As time wore on, the BASIC listing of source code for games would increasingly be a long list of DATA statements with seemingly random sequences of the digits 0-9 and the letters A-F, along with a few lines of BASIC, at least one of which would feature the mysterious POKE command. This is where I learnt about machine code and assembly language: these DATA statements contain the hexadecimal representation of the raw instructions that the computer executes.

Real Programmers do it in hex

Tantalized, I acquired a book on Z80 assembly language, and I was hooked. I would spend hours writing out programs on pieces of paper and converting them into hex codes by looking up the mnemonics in the reference manual. I would calculate jump offsets by counting bytes. Over time I learnt the opcodes for most of the Z80 instruction set. Real Programmers don't need an assembler and certainly not a compiler; Real programmers can do it all by hand!

These days, I use a compiler and assembler like everyone else, but my point still stands, and it is this: by learning assembly language, I had to confront the raw machine at its most basic level. Binary and hexadecimal arithmetic, pointers, subroutines, stacks and registers. Good programming techniques follow naturally: if your loop is too long, the jump instruction at the end won't reach, as there is a limit of 128 bytes on conditional jumps. Duplicate code is not just a problem for maintenance: you have to convert it twice, and it consumes twice as much of your precious address space, so subroutines become an important basic technique. By the time I learnt C, I had already learnt much of the lessons around pointers and memory allocation that you can only get from a low-level language.

It's all in the details

BASIC was an important rite of passage for many of today's programmers: those who learnt programming on their home computer in the 1980s, but it is not enough. High-level programming languages such as C# or Java are a vast improvement on BASIC, but they don't provide programmers with the low-level knowledge that can be gained by really learning C or assembler.

It's the low level details that are important here. If you don't actively program in C, you don't have to learn C per-se, but something equivalently low-level. If you find the idea of writing a whole program in assembler and machine code interesting, go with that: I thoroughly enjoyed it, but it might not be your cup of tea.

C is not enough either

This actually ties in with the whole "learn a new programming language every year" idea: different programming languages bring different ideas and concepts to the mix. I have learnt a lot from looking at how programs are written in Haskell and Lisp, even though I never use them in my work, and I learnt much from Java and C# that I didn't learn from C and assembler. The same applies here: a low level programming language such as C provides a unique perspective that higher-level languages don't provide. Viewing things from this perspective can improve your code whatever language you write in. If you're striving to write elegant software, viewing it from multiple perspectives can only help.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Elegance in Software Part 2

Tuesday, 11 December 2007

In my earlier blog post on Elegance in Software I gave a list of things that I feel contribute to elegant code, and asked for input from my readers. This post is the promised follow-up.

Several respondents mentioned the book Beautiful Code, which is a collection of essays by "leading computer scientists" describing code they feel is beautiful, and why. I've only read an excerpt myself, but it's got good reviews, and what I've read has enticed me to read more. There's also a blog related to the book, which is well worth a read.

Another common theme was the "I know it when I see it" factor. Though I alluded to this in the introduction of my previous post by saying that "elegance is in the eye of the beholder", a lot of people felt this was far more important than any "tick list": there's something special about truly elegant code that transcends the details, just like really good art is more than just a collection of well-executed brush strokes that make up a well-chosen composition. I agree here: elegant code just says "ooh, that's good" when you read it, it has a "Quality without a Name".

Thomas Guest pointed out that appearance plays a part (whilst also discussing the importance of efficiency), and I agree. This ties in with good naming and short functions: if the code is poorly laid out, it's hard to argue that it's elegant. Yes, you can get a "source code beautifier" to physically rearrange the code, but good appearance often goes beyond that: if(some_boolean == true) is just not elegant, no matter how well-spaced it is. This also impacts the language used: it's harder to write "pretty" code in Perl than in Ruby or Scheme.

I particular liked Chris Dollin's characterization: it is obvious what elegant code does when you read it, but it's not necessarily an obvious approach when you haven't seen it before. This ties in with another theme amongst respondents: simplicity. Though I mentioned "minimal code" and "easy to understand" in my original list, "simplicity" goes beyond that, and I think that Chris's obvious solution to a complex problem highlights this. If the code is sufficiently easy to understand that a solution to a complex problem appears obvious, then it's probably a good demonstration of simplicity. Such code is "clever with a purpose" (as Pat Maddox described it).

Jim Shore has an interesting article on good design, in which he argues that the eye-of-the-beholder-ness of "elegant" is too vague for his liking, and instead tries to argue for "Quality with a Name". He says:

"A good software design minimizes the time required to create, modify, and maintain the software while achieving acceptable run-time performance."

Whilst this is definitely true, this ties in with the "tick list" from my previous posting. Elegant code is more than that, and I think this is important: software development is a craft, and developers are craftsmen. By taking pride in our work, by striving to write code that is not just good, but elegant, we are improving the state of our craft. Just as mathematicians strive for beautiful or elegant proofs, and are not satisfied with a proof by exhaustion, we should not be satisfied with code that is merely good, but strive for code that is elegant.

It is true that what I find to be elegant may be different from what you find to be elegant, but I hope believe that good programmers would agree that two pieces of code were "good code" even if they differ in their opinion of which is more elegant, much as art critics would agree that both a painting by Monet and one by Van Gogh were both good paintings, whilst differing in their opinion of which is better.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Elegance in Software

Monday, 12 November 2007

What does it mean for software to be elegant? When I write code, elegance is something I aspire to, and in some senses goes hand-in-hand with beautiful code, but that doesn't really make it any clearer. Certainly, I think there is a strong element of "elegance is in the eye of the beholder", but I think there are also some characteristics of the software that are contributory factors — how a particular person may rate the code on any aspect may vary, as may the importance they place on any given aspect, but these aspects will almost certainly impact how elegant the code appears.

Factors affecting the elegance of software

Here's a short list of some of the factors that I think are important. Obviously, this is not an exhaustive list, and all comments are my opinion, and not definitive.

Does it work?
I'd be hard-pushed to call software "elegant" if it didn't work
Is it easy to understand?
Lots of the following factors can really be summarised by this one: if I can't understand the code, it's not elegant.
Is it efficient?
A bubble sort is just not elegant, because there's lots of much more efficient algorithms. If a cunning algorithmic trick can drastically reduce the runtime, using that trick contributes to making the code elegant, especially if it is still easy to understand.
Short functions
Long functions make the code hard to follow. If I can't see the whole function on one screen in my editor, it's too long. Ideally, a function should be really short, less than 5 lines.
Good naming
Short functions are all very well, but if functions are called foo, abc, or wrt_lng_dt, it can still be hard to understand the code. Of course, this applies to classes just as much as functions.
Clear division of responsibility
It is important that it is clear which function or class is responsible for any given aspect of the design. Not only that, but a class or function should not have too many responsibilities — by the Single Responsibility Principle a class or function should have just one responsibility.
High cohesion
Cohesion is a measure of how closely related the data items and functions in a class or module are to each other. This is tightly tied in to division of responsibility — if a function is responsible for calculating primes and managing network connections, then it has low cohesion, and a poor division of responsibility.
Low coupling
Classes and modules should not have have unnecessary dependencies between them. If a change to the internals of one class or function requires a change to apparently unrelated code elsewhere, there is too much coupling. This is also related to the division of responsibility, and excessive coupling can be a sign that too many classes, modules or functions share a single responsibility.
Appropriate use of OO and other techniques
It is not always appropriate to encapsulate something in a class — sometimes a simple function will suffice, and sometimes other techniques are more appropriate. This is also related to the division of responsibilities, but it goes beyond that — is this code structure the most appropriate for handling this particular responsibility? Language idioms come into play here: is it more appropriate to use STL-style std::sort on an iterator interface, or does it make more sense to provide a sort member function? Can the algorithm be expressed in a functional way, or is an imperative style more appropriate?
Minimal code
Code should be short and to-the-point. Overly-long code can be the consequence of doing things at too low a level, and doing byte-shuffling rather than using a high-level sort algorithm. It can also be the consequence of too many levels of indirection — if a function does nothing except call one other function, it's getting in the way. Sometimes this can be at odds with good naming — a well-named function with a clear responsibility just happens to be able to delegate to a generic function, for example — but there's obviously a trade-off. Minimal code is also related to duplication — if two blocks of code do the same thing, one of them should be eliminated.

One thing that is not present in the above list is comments in the code. In my view, the presence of comments in the code implies that the code is not sufficiently clear. Yes, well-written comments can make it easier to understand a given block of code, but they should in general be unnecessary: truly elegant code can be understood without comments. Of course, you might need to understand what it is that the code is trying to accomplish before it makes complete sense, particularly if the code is using advanced algorithms, and comments can help with that (e.g. by providing a reference to the algorithm), but my general view is that comments are a sign of less-than-perfect code.

Let me know what you think constitutes elegant code.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.