Just Software Solutions

Duplication in Software

Tuesday, 26 March 2013

Much has been said about the importance of reducing duplication in software. For example, J. B. Rainsberger has "minimizes duplication" as the second of his four "Elements of Simple Design", and lots of the teachings of the Agile community stress the importance of reducing duplication when refactoring code.

Inspired by Kevlin Henney's tweet last week, where he laments that programmers trying to remove duplication often take it literally, I wanted to talk about the different kinds of duplication in software. I've just mentioned "literal" duplication, so let's start with that.

Basic Literal Duplication

This is the most obvious form of duplication: sections of code which are completely identical. This most often arises due to copy-and-paste programming, but can often arise in the form of repetitive patterns — a simple for loop that is repeated multiple places with the same body, for example.

Removing Literal Duplication

The easiest to create, literal duplication is also the easiest to remove: just extract a function that does the necessary operation.

Sometimes, though the code is identical, the types involved are different. You cannot address this with extracting a simple function, so we have a new class of duplication.

Parametric Literal Duplication

Parametric literal duplication can also arise from copy-and-paste programming. The key feature is that the types of the variables are different so you cannot just reuse the code from one place in another, even if it was a nicely self-contained function. If you eliminate all the basic literal duplication, parametric literal duplication will give you sets of functions with identical structure but different types.

With the lack of a portable is_ready() function for std::future, it is common to test whether a future f is ready by writing f.wait_for(std::chrono::seconds(0))==std::future_status::ready. Since std::future is a class template, the types of the various futures that you may wish to check for readiness may vary, so you cannot extract a simple function. If you write this in multiple places you therefore have parametric literal duplication.

Removing Parametric Literal Duplication

There are various ways to remove parametric literal duplication. In C++ the most straightforward is probably to use a template. e.g.

template<typename T>
inline bool is_ready(std::future<T> f){
    return f.wait_for(std::chrono::seconds(0))==std::future_status::ready;
}

In other languages you might choose to use generics, or rely on duck-typing. You might also do it by extracting an interface and using virtual function calls, but that requires that you can modify the types of the objects, or are willing to write a facade.

Parametric literal duplication is closely related to what I call Structural Duplication.

Structural Duplication

This is where the overall pattern of some code is the same, but the details differ. For example, a for loop that iterates over a container is a common structure, but the loop body varies from loop to loop.e.g

std::vector<int> v;

int sum=0;
for(std::vector<int>::iterator it=v.begin();it!=v.end();++it){
    sum+=*it;
}
for(std::vector<int>::iterator it=v.begin();it!=v.end();++it){
    std::cout<<*it<<std::endl;
}

You can't just extract the whole loop into a separate function because the loop body is different, but that doesn't mean you can't do anything about it.

Removing Structural Duplication

One common way to remove such duplication is to extract the commonality with the template method pattern, or create a parameterized function where the details are passed in as a function to call.

For simple loops like the ones above, we have std::for_each, and the new-style C++11 for loops:

std::for_each(v.begin(),v.end(),[&](int x){sum+=x;});
std::for_each(v.begin(),v.end(),[](int x){std::cout<<x<<std::endl;});

for(int x:v){
    sum+=x;
}
for(int x:v){
    std::cout<<x<<std::endl;
}

Obviously, if your repeated structure doesn't match the standard library algorithms then you must write your own, but the idea is the same: take a function parameter which is a callable object and which captures the variable part of the structure. For a loop, this is the loop body. For a sort algorithm it is the comparison, and so forth.

Temporal Duplication

This is where some code only appears once in the source code, but is executed repeatedly, and the only desired outcome is the computed result, which is the same for each invocation. For example, the call to v.size() or v.end() to find the upper bound of an iteration through a container.

std::vector<int> v;
for(unsigned i=0;i<v.size();++i)
{
    do_stuff(v[i]);
}

It doesn't just happen in loops, though. For example, in a function that inserts data into a database table you might build a query object, run it to insert the data, and then destroy it. If this function is called repeatedly then you are repeatedly building the query object and destroying it. If your database library supports parameterization then you may well be able to avoid this duplication.

Removing Temoral Duplication

The general process for removing temporal duplication is to use some form of caching or memoization — the value is computed once and then stored, and this stored value is used in place of the computation for each subsequent use. For loops, this can be as simple as extracting a variable to hold the value:

for(unsigned i=0,end=v.size();i!=end;++i){
    do_stuff(v[i]);
}

For other things it can be more complex. For example, with the database query example above, you may need to switch to using a parameterized query so that on each invocation you can bind the new values to the query parameters, rather than building the query around the specific parameters to insert.

Duplication of Intent

Sometimes the duplication does not appear in the actual code, but in what the code is trying to achieve. This often occurs in large projects where multiple people have worked on the code base. One person writes some code to do something in one source file, and another writes some code to do the same thing in another source file, but different styles mean that the code is different even though the result is the same. This can also happen with a single developer if the different bits are written with a large enough gap, such that you cannot remember what you did before and your style has changes slightly. To beat the loop iteration example to death, you might have some code that loops through a container by index, and other code that loops through the same container using iterators. The structure is different, but the intent is the same.

Removing Duplication of Intent

This is one of the hardest types of duplication to spot and remove. The way to remove it is to refactor one or both of the pieces of code until they have the same structure, and are thus more obviously duplicates of one-another. You can then treat them either as literal duplication, parametric literal duplication or structural duplication as appropriate.

Incidental Duplication

This is where there is code that looks identical but has completely a different meaning in each place. The most obvious form of this is with "magic numbers" — the constant "3" in one place typically has a completely different meaning to the constant "3" somewhere else.

Removing Incidental Duplication

You can't necessarily entirely eliminate incidental duplication, but you can minimize it by good naming. By using symbolic constants instead of literals then it is clear that different uses are distinct because the name of the constant is distinct. There will be still be duplication of the literal in the definition of the constants, but this is now less problematic.

In the case that this incidental duplication is not just a constant then you can extract separate named functions that encapsulate this duplicate code, and express the intent in each case. The duplication is now just between these function bodies than between the uses, and the naming of the functions makes it clear that this is just incidental duplication.

Conclusion

There are quite a few types of duplication that you may get in your code. By eliminating them you will tend to make your code shorter, clearer, and easier to maintain.

If you can think of any types of duplication I've missed, please add a comment.

Posted by Anthony Williams
[/ design /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

11 Comments

Great article Anthony! Yet another lesson to learn to make the world of code a better place :)

The case of incidental duplication is an imporant one as it touches less obvious motivations. In my opinion, it should the nail to the coffin of favouring explicit literals instead of symbolic constants.

by Mateusz Loskot at 13:12:15 on Wednesday, 27 March 2013
this is great information to me
by alok kumrawat at 09:38:37 on Monday, 01 April 2013

Techniquely you are totally right because duplication is main component of software development.and lots of people are aware about this type of things.

by Chase Sweeney at 06:19:57 on Monday, 01 July 2013

nice blo,good attepmt<a href=" http://webrocks.in/newsevents.htm"> Web Development </a>

by propidadrakesh at 11:59:35 on Thursday, 02 January 2014

There are many types of duplication that may get in code. But, it is important to remove duplication from code to make software development more better. There are many methods to remove duplication in software development.

by Hines Darrel at 13:30:58 on Wednesday, 26 March 2014

Hello.This article was extremely remarkable, particularly since I was searching for thoughts on this topic. In my opinion there is no value of duplication of code.

by Aysha Akhter at 13:01:34 on Wednesday, 13 August 2014

This post has really impressive information about duplication of software. Thanks for sharing such a nice post.

by mariaandreson at 04:39:41 on Thursday, 06 November 2014

This post is really very impressive and has lot of information about software duplication. Software duplication can cause of business loss. So we need to ensure about the custom software development company background and their past work history. Thanks for sharing this post.

by Vertex Computer Systems, Inc. at 09:23:36 on Thursday, 25 June 2015

Thanks for sharing this article. It is very useful and informative post that can be able to describe about the duplication in software development. Such a very remarkable post! Well Done!

by Vertex Computer Systems, Inc. at 11:45:53 on Friday, 17 July 2015

thanks for the intellectual words the article is awesome

by lucky patcher new download at 06:26:55 on Friday, 27 January 2017

wonderful site alot of great information here

http://praxent.com

by praxent at 18:02:40 on Monday, 22 January 2018

Add your comment

Your name:

Email address:

Person or spambot?

Your comment:

Design and Content Copyright © 2005-2018 Just Software Solutions Ltd. All rights reserved. | Privacy Policy