Blog Archive for / 2008 /

New C++ Working Draft and Concurrency Papers Now Available

Wednesday, 02 July 2008

The post-meeting mailing following June's C++ Standards committee meeting in France is now available. This includes a new Working Draft for the C++0x standard, and a few concurrency-related papers.

From a concurrency point of view, there are several papers of interest. Firstly, a few have been accepted into the working draft, notably:

N2661: A Foundation to Sleep On
This paper provides a generalised time point and duration library, which is used by the thread functions that take times or durations. Those functions have been updated to use the new types and renamed to make their purpose clearer: functions that wait for a duration are now called xxx_for, and take a value of type std::chrono::duration<Rep,Period>, whereas those that take absolute time points are now called xxx_until and take a value of type std::chrono::time_point<Clock,Duration>.
N2668: Concurrency Modifications to Basic String
The changes in this paper ensure that it is safe for two threads to access the same std::string object at the same time, provided they both perform only read operations. They also ensure that copying a string object and then modifying that copy is safe, even if another thread is accessing the original. This essentially disallows copy-on-write implementations since the benefits are now severely limited.
N2660: Dynamic Initialization and Destruction with Concurrency
With the changes from this paper, if an application uses multiple threads then the initialization and destruction of objects with static storage duration (such as global variables) may run concurrently on separate threads. This can provide faster start-up and shut-down times for an application, but it can also introduce the possibility of race conditions where none existed previously. If you use threads in your application, it is now even more important to check the initialization order of objects with static storage duration.
N2514: Implicit Conversion Operators for Atomics
With this change, the atomic types such as std::atomic_int are implicitly convertible to their corresponding fundamental types. This means, for example, that:
    std::atomic_int x;
    int y=x;
is well-formed where it wasn't previously. The implicit conversions are equivalent to calling the load() member function, and have memory_order_seq_cst ordering semantics.
N2674: Shared_ptr atomic access, revision 1
This paper introduces a new set of overloads of the free functions for atomic operations (such as atomic_load and atomic_store), which operate on instances of std::shared_ptr<>. This allows one thread to read an instance of std::shared_ptr whilst another thread is modifying that same instance if they both use the new atomic functions.
This paper also renames atomic_swap operations to atomic_exchange (and likewise for atomic_compare_swap and the corresponding member functions) for all atomic types, in order to avoid confusion with other types that provide swap functions. The atomic exchange operations only alter the value of a single object, replacing the old value with a new one; they do not exchange the values of two objects in the way that std::swap does.
N2664: C++ Data-Dependency Ordering: Atomics and Memory Model
With the adoption of this paper the memory model gets a new ordering option: memory_order_consume. This is a limited form of memory_order_acquire which allows for data-dependent ordering. If a thread uses memory_order_consume, then it is not guaranteed to see modifications to other variables made by the thread that performed the releasing operation unless those variables are accessed in conjunction with the consumed variable. This means, for example, that member variables of an object are visible if the consumed value is a pointer to that object, but that values of independent objects are not necessarily visible. This allows the compiler to perform some optimizations that are forbidden by memory_order_acquire, and reduces the synchronization overhead on some hardware architectures.
N2678: Error Handling Specification for Chapter 30 (Threads)
This paper brings the exceptions thrown by the thread library under the new system_error umbrella, with corresponding error codes and error categories.
N2669: Thread-Safety in the Standard Library (Rev 2)
Now that the standard supports threads, we need to say which standard library operations are thread-safe, and which are not. This paper basically says that non-modifying operations on the same object are safe, and any operations on separate objects are also safe. Also, separate threads may call the same library functions on separate objects without problems. As you might expect, concurrent modifications to the same object are data races and undefined behaviour.
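
To make the atomics changes from N2514 and N2674 a little more concrete, here is a minimal sketch. It assumes the spellings that ended up in the final C++11 library (std::atomic<int>, load() and exchange()); the two helper function names are made up for illustration.

```cpp
#include <atomic>

// Hypothetical helper: reads an atomic through the implicit conversion.
// The conversion behaves like x.load() with memory_order_seq_cst.
int read_via_conversion(std::atomic<int>& x)
{
    int y = x; // implicit conversion to int
    return y;
}

// Hypothetical helper: exchange() stores new_value and returns the old value.
// Only the atomic object changes; nothing is swapped into new_value.
int replace_value(std::atomic<int>& x, int new_value)
{
    return x.exchange(new_value);
}
```

Note that exchange() hands back the previous value rather than modifying its argument, which is exactly the difference from std::swap that the rename is meant to highlight.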

The committee also voted to include N2659: Thread-Local Storage in C++0x, but it doesn't appear to be in the current draft. This paper introduces the thread_local keyword to indicate that each thread should have its own copy of a given object.
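
As a small sketch of what thread_local means in practice, assuming a compiler that implements the keyword as it was eventually standardised in C++11: the variable and function names below are invented for the example.

```cpp
#include <thread>

// Each thread gets its own copy of this counter, initialised to 0.
thread_local int per_thread_counter = 0;

// Hypothetical helper: increments the calling thread's copy and returns it.
int bump_counter(int times)
{
    for (int i = 0; i < times; ++i)
        ++per_thread_counter;
    return per_thread_counter;
}

// Runs bump_counter on a new thread and reports the value that thread saw.
int bump_on_other_thread(int times)
{
    int seen = -1;
    std::thread t([&seen, times] { seen = bump_counter(times); });
    t.join();
    return seen;
}
```

The new thread starts from its own zero-initialised copy, so whatever it does to per_thread_counter leaves the calling thread's copy untouched.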

Finally, N2657: Local and Unnamed Types as Template Arguments has been incorporated in the working paper. Though this isn't directly concurrency related, it is something I've been campaigning for since N1427 back in 2003.
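
For a flavour of what N2657 permits, here is a short sketch that uses a type declared inside a function as a template argument. The function name is hypothetical; the point is only that the local comparator can be passed to std::sort.

```cpp
#include <algorithm>
#include <vector>

// Sorts into descending order using a comparator type declared locally.
// Without N2657 a local type could not be used as a template argument,
// so the call to std::sort below would be rejected.
std::vector<int> sorted_descending(std::vector<int> values)
{
    struct greater_than
    {
        bool operator()(int lhs, int rhs) const
        {
            return lhs > rhs;
        }
    };

    std::sort(values.begin(), values.end(), greater_than());
    return values;
}
```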

Apart from N2657, I've only listed the concurrency changes: check out the Working Draft for the C++0x standard, and the State of C++ Evolution for more details on the changes.

Posted by Anthony Williams

Condition Variable Spurious Wakes

Friday, 27 June 2008

Condition variables are a useful mechanism for waiting until an event occurs or some "condition" is satisfied. For example, in my implementation of a thread-safe queue I use a condition variable to avoid busy-waiting in wait_and_pop() when the queue is empty. However, condition variables have one "feature" which is a common source of bugs: a wait on a condition variable may return even if the condition variable has not been notified. This is called a spurious wake.

Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.

Bugs due to spurious wakes

Consider the code for wait_and_pop from my thread-safe queue:

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
    }

If we know that there's only one consumer thread, it would be tempting to write this with an if instead of a while, on the assumption that there's only one thread waiting, so if it's been notified, the queue must not be empty:

    if(the_queue.empty()) // Danger, Will Robinson
    {
        the_condition_variable.wait(lock);
    }

With the potential of spurious wakes this is not safe: the wait might finish even if the condition variable was not notified. We therefore need the while, which has the added benefit of allowing multiple consumer threads: we don't need to worry that another thread might remove the last item from the queue, since we're checking to see if the queue is empty before proceeding.

That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from timed_wait().

Timing is everything

condition_variable::wait() has a companion function that allows the user to specify a time limit on how long they're willing to wait: condition_variable::timed_wait(). This function comes as a pair of overloads: one that takes an absolute time, and one that takes a duration. The absolute time overload will return once the clock reaches the specified time, whether or not it was notified. The duration overload will return once the specified duration has elapsed: if you say to wait for 3 seconds, it will stop waiting after 3 seconds. The insidious bug comes from the overload that takes a duration.

Suppose we wanted to add a timed_wait_and_pop() function to our queue, that allowed the user to specify a duration to wait. We might be tempted to write it as:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& timeout)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

At first glance this looks fine: we're handling spurious wakes by looping on the timed_wait() call, and we're passing the timeout in to that call. Unfortunately, the timeout is a duration, so every call to timed_wait() will wait up to the specified amount of time. If the timeout was 1 second, and the timed_wait() call woke due to a spurious wake after 0.9 seconds, the next time round the loop would wait for a further 1 second. In theory this could continue ad infinitum, completely defeating the purpose of using timed_wait() in the first place.

The solution is simple: use the absolute time overload instead. By specifying a particular clock time as the timeout, the remaining wait time decreases with each call. This requires that we determine the final timeout prior to the loop:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::system_time const timeout=boost::get_system_time()+wait_duration;

        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }
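
For comparison, here is a sketch of the same fix written against the standard library rather than Boost. It assumes the std::condition_variable::wait_until and std::chrono::steady_clock facilities from the final C++11 library; the timed_queue class is a hypothetical stand-in with just enough of the queue to compile on its own.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical minimal queue, just enough to show the timed wait.
template<typename Data>
class timed_queue
{
    std::queue<Data> the_queue;
    std::mutex the_mutex;
    std::condition_variable the_condition_variable;

public:
    void push(Data const& value)
    {
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            the_queue.push(value);
        }
        the_condition_variable.notify_one();
    }

    template<typename Rep, typename Period>
    bool timed_wait_and_pop(Data& popped_value,
                            std::chrono::duration<Rep, Period> const& wait_duration)
    {
        // Compute the deadline once, so spurious wakes cannot extend the wait.
        auto const timeout = std::chrono::steady_clock::now() + wait_duration;

        std::unique_lock<std::mutex> lock(the_mutex);
        while (the_queue.empty())
        {
            if (the_condition_variable.wait_until(lock, timeout) ==
                std::cv_status::timeout)
                return false;
        }

        popped_value = the_queue.front();
        the_queue.pop();
        return true;
    }
};
```

Using steady_clock for the deadline is a choice made for this sketch: it is not affected by adjustments to the system clock.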

Though this solves the problem, it's easy to make the mistake. Thankfully, there is a better way to wait that doesn't suffer from this problem: pass the predicate to the condition variable.

Passing the predicate to the condition variable

Both wait() and timed_wait() come with additional overloads that allow the user to specify the condition being waited for as a predicate. These overloads encapsulate the while loops from the examples above, and ensure that spurious wakes are correctly handled. All that is required is that the condition being waited for can be checked by means of a simple function call or a function object which is passed as an additional parameter to the wait() or timed_wait() call.

wait_and_pop() can therefore be written like this:


    struct queue_not_empty
    {
        std::queue<Data>& queue;

        queue_not_empty(std::queue<Data>& queue_):
            queue(queue_)
        {}
        bool operator()() const
        {
            return !queue.empty();
        }
    };

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_condition_variable.wait(lock,queue_not_empty(the_queue));
        popped_value=the_queue.front();
        the_queue.pop();
    }

and timed_wait_and_pop() can be written like this:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(!the_condition_variable.timed_wait(lock,wait_duration,
            queue_not_empty(the_queue)))
            return false;
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Note that what we're waiting for is the queue not to be empty — the predicate is the reverse of the condition we would put in the while loop. This will be much easier to specify when compilers implement the C++0x lambda facilities.
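
As a rough idea of how this might look with lambdas, here is a sketch that assumes the std::condition_variable interface from the final C++11 library, whose wait() and wait_for() accept a predicate. The concurrent_queue_sketch class name is invented for the example; it is not the queue from my earlier post.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical queue showing lambda predicates in place of queue_not_empty.
template<typename Data>
class concurrent_queue_sketch
{
    std::queue<Data> the_queue;
    std::mutex the_mutex;
    std::condition_variable the_condition_variable;

public:
    void push(Data const& value)
    {
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            the_queue.push(value);
        }
        the_condition_variable.notify_one();
    }

    void wait_and_pop(Data& popped_value)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        // The predicate states what we are waiting FOR: a non-empty queue.
        the_condition_variable.wait(lock, [this] { return !the_queue.empty(); });
        popped_value = the_queue.front();
        the_queue.pop();
    }

    template<typename Rep, typename Period>
    bool timed_wait_and_pop(Data& popped_value,
                            std::chrono::duration<Rep, Period> const& wait_duration)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        // wait_for returns the final value of the predicate, so false
        // means the wait timed out with the queue still empty.
        if (!the_condition_variable.wait_for(lock, wait_duration,
                [this] { return !the_queue.empty(); }))
            return false;
        popped_value = the_queue.front();
        the_queue.pop();
        return true;
    }
};
```

The separate function object disappears entirely, and the condition sits right next to the wait call that depends on it.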

Conclusion

Spurious wakes can cause some unfortunate bugs, which are hard to track down due to the unpredictability of spurious wakes. These problems can be avoided by ensuring that plain wait() calls are made in a loop, and the timeout is correctly calculated for timed_wait() calls. If the predicate can be packaged as a function or function object, using the predicated overloads of wait() and timed_wait() avoids all the problems.

Posted by Anthony Williams

Comments Now Enabled - what would you like to see?

Thursday, 26 June 2008

I have now updated my blog engine to allow comments on my blog posts, so please give it a whirl.

To kick things off, please add a comment on this entry if there's something you'd like me to cover on my blog, and I'll pick the ones I feel able to write about as topics for future posts.

If you're viewing this post in an RSS reader, you'll have to actually go to the website to comment. If you're viewing this post on one of the blog directory pages, click on the title or follow the "Permanent Link" to get to the entry page.

Any comments I feel are inappropriate or spam will be deleted.

Posted by Anthony Williams

Exceptions make for Elegant Code

Friday, 06 June 2008

On this week's Stack Overflow podcast, Joel comes out quite strongly against exceptions, on the basis that they are hidden flow paths. Whilst I can sympathise with the idea of making every possible control path in a routine explicitly visible, I have just had to write some C code for a recent project, and I found that doing so actually makes the code a lot harder to follow: the code for what the routine is really doing is hidden amongst a load of error checking.

Whether or not you use exceptions, you have the same number of possible flow paths. With exceptions, the code can be a lot cleaner than without them, as you don't have to write a check after every function call to verify that it did indeed succeed, and you can now proceed with the rest of the function. Instead, the code tells you when it's gone wrong by throwing an exception.

Exceptions also simplify the function signature: rather than having to add an additional parameter to hold the potential error code, or to hold the function result (because the return value is used for the error code), exceptions allow the function signature to specify exactly what is appropriate for the task at hand, with errors being reported "out-of-band". Yes, some functions use errno, which helps by providing a similar out-of-band error channel, but it's not a panacea: you have to check and clear it between every call, otherwise you might be passing invalid data into subsequent functions. Also, it requires that you have a value you can use for the return type in the case that an error occurs. With exceptions you don't have to worry about either of these, as they interrupt the code at the point of the error, and you don't have to supply a return value.
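
To illustrate the errno dance with a real function, here is a small sketch around std::strtol, which reports out-of-range input by setting errno to ERANGE. The parse_long wrapper is a hypothetical name made up for this example.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper: parses a decimal long and reports failure through
// the return value. On failure, result is left unchanged.
bool parse_long(char const* text, long& result)
{
    errno = 0;            // clear any stale error before the call
    char* end = 0;
    long const value = std::strtol(text, &end, 10);
    if (errno == ERANGE)  // out of range: strtol returned LONG_MAX or LONG_MIN
        return false;
    if (end == text)      // no digits were consumed at all
        return false;
    result = value;
    return true;
}
```

Note how strtol still has to return something when it fails, and how the caller must remember to reset errno first: both are the costs described above.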

Here are three implementations of the same function using error code returns, errno and exceptions:

    int foo_with_error_codes(some_type param1,other_type param2,result_type* result)
    {
        int error=0;
        intermediate_type temp;

        if((error=do_blah(param1,23,&temp)) ||
           (error=do_flibble(param2,temp,result)))
        {
            return error;
        }
        return 0;
    }

    result_type foo_with_errno(some_type param1,other_type param2)
    {
        errno=0;
        intermediate_type temp=do_blah(param1,23);
        if(errno)
        {
            return dummy_result_type_value;
        }

        return do_flibble(param2,temp);
    }

    result_type foo_with_exceptions(some_type param1,other_type param2)
    {
        return do_flibble(param2,do_blah(param1,23));
    }

Error Recovery

In all three cases, I've assumed that there's no recovery required if do_blah succeeds but do_flibble fails. If recovery were needed, additional code would be required. It could be argued that this is where the problems with exceptions begin, as the code paths for exceptions are hidden, and it is therefore unclear where the cleanup must be done. However, if you design your code with exceptions in mind I find you still get elegant code. try/catch blocks are ugly: this is where deterministic destruction comes into its own. By encapsulating resources, and performing changes in an exception-safe manner, you end up with elegant code that behaves gracefully in the face of exceptions, without cluttering the "happy path". Here's some code:

    int foo_with_error_codes(some_type param1,other_type param2,result_type* result)
    {
        int error=0;
        intermediate_type temp;

        if(error=do_blah(param1,23,&temp))
        {
            return error;
        }

        if(error=do_flibble(param2,temp,result))
        {
            cleanup_blah(temp);
            return error;
        }
        return 0;
    }

    result_type foo_with_errno(some_type param1,other_type param2)
    {
        errno=0;
        intermediate_type temp=do_blah(param1,23);
        if(errno)
        {
            return dummy_result_type_value;
        }

        result_type res=do_flibble(param2,temp);
        if(errno)
        {
            cleanup_blah(temp);
            return dummy_result_type_value;
        }
        return res;
    }

    result_type foo_with_exceptions(some_type param1,other_type param2)
    {
        return do_flibble(param2,do_blah(param1,23));
    }

    result_type foo_with_exceptions2(some_type param1,other_type param2)
    {
        blah_cleanup_guard temp(do_blah(param1,23));
        result_type res=do_flibble(param2,temp);
        temp.dismiss();
        return res;
    }

In the error code cases, we need to clean up explicitly on error, by calling cleanup_blah. In the exception case we've got two possibilities, depending on how your code is structured. In foo_with_exceptions, everything is just handled directly: if do_flibble doesn't take ownership of the intermediate data, that data cleans itself up. This might well be the case if do_blah returns a type that handles its own resources, such as std::string or boost::shared_ptr. If explicit cleanup might be required, we can write a resource management class such as the blah_cleanup_guard used by foo_with_exceptions2, which takes ownership of the effects of do_blah, and calls cleanup_blah in the destructor unless we call dismiss to indicate that everything is going OK.
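
One possible shape for blah_cleanup_guard is sketched below. The intermediate_type and cleanup_blah definitions are stand-ins invented so that the sketch compiles on its own, and the call counter exists only to make the effect observable; none of them are defined in the examples above.

```cpp
// Stand-in for the real intermediate data (assumption for this sketch).
struct intermediate_type
{
    int handle;
};

// Counts cleanup calls so the guard's behaviour can be observed.
int cleanup_call_count = 0;

// Stand-in for the real cleanup function (assumption for this sketch).
void cleanup_blah(intermediate_type&)
{
    ++cleanup_call_count;
}

class blah_cleanup_guard
{
    intermediate_type value;
    bool dismissed;

    // Non-copyable: two guards must not clean up the same data.
    blah_cleanup_guard(blah_cleanup_guard const&);
    blah_cleanup_guard& operator=(blah_cleanup_guard const&);

public:
    explicit blah_cleanup_guard(intermediate_type const& value_):
        value(value_), dismissed(false)
    {}

    // Cleans up unless dismiss() was called first.
    ~blah_cleanup_guard()
    {
        if (!dismissed)
            cleanup_blah(value);
    }

    // Lets the guard be passed where an intermediate_type is expected.
    operator intermediate_type&()
    {
        return value;
    }

    void dismiss()
    {
        dismissed = true;
    }
};
```

If do_flibble throws, the guard's destructor runs during stack unwinding and performs the cleanup; on the success path the call to dismiss() turns the destructor into a no-op.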

Real Examples

That's enough waffling about made-up examples: let's look at some real code. Here's something simple: adding a new value to a dynamic array of DataType objects held in a simple dynamic_array class. Let's assume that objects of DataType can somehow fail to be copied: maybe they allocate memory internally, which may therefore fail. We'll also use a really dumb algorithm that reallocates every time a new element is added. This is not for any reason other than it simplifies the code: we don't need to check whether or not reallocation is needed.

If we're using exceptions, that failure will manifest as an exception, and our code looks like this:

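#include <algorithm> // std::swap
#include <new>       // placement new, std::bad_alloc
#include <stdlib.h>  // malloc, free

class DataType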
{
public:
    DataType(const DataType& other);
};

class dynamic_array
{
private:
    class heap_data_holder
    {
        DataType* data;
        unsigned initialized_count;

    public:
        heap_data_holder():
            data(0),initialized_count(0)
        {}
        explicit heap_data_holder(unsigned max_count):
            data((DataType*)malloc(max_count*sizeof(DataType))),
            initialized_count(0)
        {
            if(!data)
            {
                throw std::bad_alloc();
            }
        }
        void append_copy(DataType const& value)
        {
            new (data+initialized_count) DataType(value);
            ++initialized_count; 
        }
        void swap(heap_data_holder& other)
        {
            std::swap(data,other.data);
            std::swap(initialized_count,other.initialized_count);
        }
        unsigned get_count() const
        {
            return initialized_count;
        }
        ~heap_data_holder()
        {
            for(unsigned i=0;i<initialized_count;++i)
            {
                data[i].~DataType();
            }
            free(data);
        }
        DataType& operator[](unsigned index)
        {
            return data[index];
        }
        
    };

    heap_data_holder data;

    // no copying for now
    dynamic_array& operator=(dynamic_array& other);
    dynamic_array(dynamic_array& other);
public:
    dynamic_array()
    {}
    void add_element(DataType const& new_value)
    {
        heap_data_holder new_data(data.get_count()+1);
        for(unsigned i=0;i<data.get_count();++i)
        {
            new_data.append_copy(data[i]);
        }
        new_data.append_copy(new_value);
        new_data.swap(data);
    }
};

On the other hand, if we can't use exceptions, the code looks like this:

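#include <algorithm> // std::swap
#include <new>       // placement new
#include <stdlib.h>  // malloc, free

// Assumed value: the listing uses out_of_memory without defining it.
enum { out_of_memory=1 };

class DataType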
{
public:
    DataType(const DataType& other);
    int get_error();
};

class dynamic_array
{
private:
    class heap_data_holder
    {
        DataType* data;
        unsigned initialized_count;
        int error_code;

    public:
        heap_data_holder():
            data(0),initialized_count(0),error_code(0)
        {}
        explicit heap_data_holder(unsigned max_count):
            data((DataType*)malloc(max_count*sizeof(DataType))),
            initialized_count(0),
            error_code(0)
        {
            if(!data)
            {
                error_code=out_of_memory;
            }
        }
        int get_error() const
        {
            return error_code;
        }
        int append_copy(DataType const& value)
        {
            new (data+initialized_count) DataType(value);
            if(data[initialized_count].get_error())
            {
                int const error=data[initialized_count].get_error();
                data[initialized_count].~DataType();
                return error;
            }
            ++initialized_count;
            return 0;
        }
        void swap(heap_data_holder& other)
        {
            std::swap(data,other.data);
            std::swap(initialized_count,other.initialized_count);
        }
        unsigned get_count() const
        {
            return initialized_count;
        }
        ~heap_data_holder()
        {
            for(unsigned i=0;i<initialized_count;++i)
            {
                data[i].~DataType();
            }
            free(data);
        }
        DataType& operator[](unsigned index)
        {
            return data[index];
        }
        
    };

    heap_data_holder data;

    // no copying for now
    dynamic_array& operator=(dynamic_array& other);
    dynamic_array(dynamic_array& other);
public:
    dynamic_array()
    {}
    int add_element(DataType const& new_value)
    {
        heap_data_holder new_data(data.get_count()+1);
        if(new_data.get_error())
            return new_data.get_error();
        for(unsigned i=0;i<data.get_count();++i)
        {
            int const error=new_data.append_copy(data[i]);
            if(error)
                return error;
        }
        int const error=new_data.append_copy(new_value);
        if(error)
            return error;
        new_data.swap(data);
        return 0;
    }
};

It's not too dissimilar, but there are a lot of checks for error codes: add_element has gone from 10 lines to 17, which is almost double, and there are also additional checks in the heap_data_holder class. In my experience, this is typical: if you have to explicitly write error checks at every failure point rather than use exceptions, your code can get quite a lot larger for no gain. Also, the constructor of heap_data_holder can no longer report failure directly: it must store the error code for later retrieval. To my eyes, the exception-based version is a whole lot clearer and more elegant, as well as being shorter: a net gain over the error-code version.

Conclusion

I guess it's a matter of taste, but I find code that uses exceptions is shorter, clearer, and actually has fewer bugs than code that uses error codes. Yes, you have to think about the consequences of an exception, and at which points in the code an exception can be thrown, but you have to do that anyway with error codes, and it's easy to write simple resource management classes to ensure everything is taken care of.

Posted by Anthony Williams

Updated (yet again) Implementation of Futures for C++

Friday, 30 May 2008

I have updated my prototype futures library implementation yet again. This version adds wait_for_any() and wait_for_all() functions, which can be used either to wait for up to five futures known at compile time, or a dynamic collection using an iterator range.

    jss::unique_future<int> futures[count];
    // populate futures
    jss::unique_future<int>* const future=
        jss::wait_for_any(futures,futures+count);

    std::vector<jss::shared_future<int> > vec;
    // populate vec
    std::vector<jss::shared_future<int> >::iterator const f=
        jss::wait_for_any(vec.begin(),vec.end());

The new version is available for download, again under the Boost Software License. It still needs to be compiled against the Boost Subversion Trunk, as it uses the Boost Exception library and some new features of the Boost.Thread library, which are not available in an official boost release.

Sample usage can be seen in the test harness. The support for alternative allocators is still missing. The documentation for the futures library is available online, but is also included in the zip file.

Please download this prototype, put it through its paces, and let me know what you think.

Posted by Anthony Williams

C, BASIC and Real Programmers

Tuesday, 27 May 2008

There's been a lot of discussion about learning C, and whether or not BASIC provides a good grounding for learning to program, following Joel Spolsky and Jeff Atwood's Stack Overflow podcasts.

Having been one of those who grew up with the first batch of home computers in the 1980s, and therefore learnt to program in BASIC on an 8-bit home computer, I feel ideally qualified to add my tuppence to the discussion.

I think BASIC was a crucial part of my early interactions with computers. When you turned the computer on, it sat there expectantly, with a prompt that said Ready, and a blinking cursor inviting you to type something. The possibilities were endless. Not only that, but you could often view the source code of games, as many of them were written in BASIC. This would allow you to learn from others, and crucially hammered home the idea that you could do this too: they were using BASIC just like you. This is a long way from the experience of today's first-time computer users: the computer starts up, and does all kinds of fancy things from the get-go. You don't type in BASIC commands to make it do things, you click the mouse. Modern computers don't even come with a programming language: you have to install a compiler or interpreter first. I am concerned that the next generation of programmers will be missing out because of this.

BASIC is not enough

However, BASIC is not enough. BASIC teaches you about the general ideas of programming: variables, statements, expressions, etc., but BASIC interpreters rarely featured much in the way of structured programming techniques. Typically, all variables were global, and there was often no such thing as a procedure or function call: just about everything was done with GOTO or maybe GOSUB. BASIC learnt in isolation by a lone hobbyist programmer, by cribbing bits from manuals, magazines, and other people's source code, would not engender much in the way of good programming habits. Though it did serve to separate the programming sheep from the non-programming goats, I can see why Dijkstra was so scathing about it. To be a good programmer, BASIC is not enough.

To learn good programming habits and really understand about the machine requires more than BASIC. For many, C is the path to such enlightenment: it provides functions and local variables, so you can learn about structured programming, and it's "close to the machine", so you have to deal with pointers and memory allocation. If you can truly grok programming in C, then it will improve your programming, whatever language you use.

I took another path. Not one that I would necessarily recommend to others, but it certainly worked for me. You see, a home computer came with not just one language but two: BASIC and machine code. As time wore on, the BASIC listing of source code for games would increasingly be a long list of DATA statements with seemingly random sequences of the digits 0-9 and the letters A-F, along with a few lines of BASIC, at least one of which would feature the mysterious POKE command. This is where I learnt about machine code and assembly language: these DATA statements contain the hexadecimal representation of the raw instructions that the computer executes.

Real Programmers do it in hex

Tantalized, I acquired a book on Z80 assembly language, and I was hooked. I would spend hours writing out programs on pieces of paper and converting them into hex codes by looking up the mnemonics in the reference manual. I would calculate jump offsets by counting bytes. Over time I learnt the opcodes for most of the Z80 instruction set. Real Programmers don't need an assembler and certainly not a compiler; Real Programmers can do it all by hand!

These days, I use a compiler and assembler like everyone else, but my point still stands, and it is this: by learning assembly language, I had to confront the raw machine at its most basic level. Binary and hexadecimal arithmetic, pointers, subroutines, stacks and registers. Good programming techniques follow naturally: if your loop is too long, the jump instruction at the end won't reach, as there is a limit of 128 bytes on conditional relative jumps. Duplicate code is not just a problem for maintenance: you have to convert it twice, and it consumes twice as much of your precious address space, so subroutines become an important basic technique. By the time I learnt C, I had already learnt many of the lessons around pointers and memory allocation that you can only get from a low-level language.
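
For the curious, the byte-counting for a Z80 relative jump can be sketched as follows. The function name is made up for the example; the arithmetic relies on the fact that the displacement byte of a JR instruction is a signed 8-bit value measured from the address of the instruction following the two-byte jump.

```cpp
#include <stdexcept>

// Computes the displacement byte for a Z80 relative jump (JR) located at
// jump_address and targeting target_address. Throws if the target is out
// of reach of a signed 8-bit displacement.
unsigned char relative_jump_offset(unsigned jump_address, unsigned target_address)
{
    // The displacement is relative to the instruction after the 2-byte jump.
    int const displacement =
        static_cast<int>(target_address) - static_cast<int>(jump_address + 2);
    if (displacement < -128 || displacement > 127)
        throw std::out_of_range("target too far for a relative jump");
    // Two's complement encoding of the signed displacement.
    return static_cast<unsigned char>(displacement & 0xFF);
}
```

A loop body that grows past that range is exactly the case where the jump at the end "won't reach", and the function above throws rather than producing a wrong byte.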

It's all in the details

BASIC was an important rite of passage for many of today's programmers: those who learnt programming on their home computer in the 1980s, but it is not enough. High-level programming languages such as C# or Java are a vast improvement on BASIC, but they don't provide programmers with the low-level knowledge that can be gained by really learning C or assembler.

It's the low-level details that are important here. If you don't actively program in C, you don't have to learn C per se, but you should learn something equivalently low-level. If you find the idea of writing a whole program in assembler and machine code interesting, go with that: I thoroughly enjoyed it, but it might not be your cup of tea.

C is not enough either

This actually ties in with the whole "learn a new programming language every year" idea: different programming languages bring different ideas and concepts to the mix. I have learnt a lot from looking at how programs are written in Haskell and Lisp, even though I never use them in my work, and I learnt much from Java and C# that I didn't learn from C and assembler. The same applies here: a low level programming language such as C provides a unique perspective that higher-level languages don't provide. Viewing things from this perspective can improve your code whatever language you write in. If you're striving to write elegant software, viewing it from multiple perspectives can only help.

Posted by Anthony Williams
[/ design /] permanent link

The A-Z of Cool Computer Games

Tuesday, 27 May 2008

My wife picked up this book last week, and it's an absolutely fabulous book. It's a jolly, nostalgic trip down memory lane for those of us (like myself and my wife) who grew up with the first batch of home computers in the 1980s. If you can look back fondly on the touch sssssssssseeeeeeenstiiive keyboard of the ZX81, the nine (count them!) colours of the Dragon-32, the 64K (wow!) and hardware sprites of the Commodore 64, and the delights of games like Manic Miner, Frogger and Hungry Horace, then this book is for you.

This book covers more than just the games, though: there are sections on the home computers themselves, the social environment surrounding home computer usage, and the various paraphernalia and random bits of gadgetry people used to have. Over time, the nature of computer games has changed quite considerably: no longer can you look at the source code for a game just by pressing Escape or Break and typing LIST at the ensuing BASIC prompt; no longer do we have to fiddle with the volume and tone controls on our tape decks in order to get the latest game to load; and no longer are we limited to 16 colours (or less).

If you've got a bit of time to spare, and fancy a trip down memory lane to a youth spent destroying joysticks by playing Daley Thompson's Decathlon too vigorously or typing in listings from magazines only to get SYNTAX ERROR in line 4360 when you try and run them, buy this book.

Recommended.

Buy this book

At Amazon.co.uk
At Amazon.com

Posted by Anthony Williams
[/ reviews /] permanent link

Updated (again) Implementation of Futures for C++

Thursday, 15 May 2008

I have updated my prototype futures library implementation again, primarily to add documentation, but also to fix a few minor issues.

The new version is available for download, again under the Boost Software License. It still needs to be compiled against the Boost Subversion Trunk, as it uses the Boost Exception library, which is not available in an official Boost release.

Sample usage can be seen in the test harness. The support for alternative allocators is still missing. The documentation for the futures library is available online, but is also included in the zip file.

Please download this prototype, put it through its paces, and let me know what you think.

Posted by Anthony Williams
[/ threading /] permanent link

Updated Implementation of Futures for C++

Sunday, 11 May 2008

I have updated my prototype futures library implementation in light of various comments received, and my own thoughts.

The new version is available for download, again under the Boost Software License. It still needs to be compiled against the Boost Subversion Trunk, as it uses the Boost Exception library, which is not available in an official Boost release.

Sample usage can be seen in the test harness. The support for alternative allocators is still missing.

Changes

  • I have removed the try_get/timed_get functions, as they can be replaced with a combination of wait() or timed_wait() and get(), and they don't work with unique_future<R&> or unique_future<void>.
  • I've also removed the move() functions on unique_future. Instead, get() returns an rvalue-reference to allow moving in those types with move support. Yes, if you call get() twice on a movable type then the second get() returns an empty shell of an object, but I don't really think that's a problem: if you want to call get() multiple times, use a shared_future. I've implemented this with both rvalue-references and the boost.thread move emulation, so you can have a unique_future<boost::thread> if necessary. test_unique_future_for_move_only_udt() in test_futures.cpp shows this in action with a user-defined movable-only type X.
  • Finally, I've added a set_wait_callback() function to both promise and packaged_task. This allows for lazy-futures which don't actually run the operation to generate the value until the value is needed: no threading required. It also allows for a thread pool to do task stealing if a pool thread waits for a task that's not started yet. The callbacks must be thread-safe as they are potentially called from many waiting threads simultaneously. At the moment, I've specified the callbacks as taking a non-const reference to the promise or packaged_task for which they are set, but I'm open to just making them be any callable function, and leaving it up to the user to call bind() to do that.
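As a rough illustration of the first change, here is a sketch of how a timed wait followed by get() can stand in for a timed_get(). It is written against the names that later shipped in the C++11 standard library (std::future, wait_for()), which is an assumption: the prototype itself uses unique_future and timed_wait(). The helper name timed_get is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: emulates a timed_get() by combining a timed wait
// with get(). Returns false if the value is not ready within the timeout.
template<typename T>
bool timed_get(std::future<T>& f, std::chrono::milliseconds timeout, T& out)
{
    if(f.wait_for(timeout) != std::future_status::ready)
        return false;   // still pending: leave the future untouched
    out = f.get();      // ready, so this returns without blocking
    return true;
}

// Single-threaded demonstration: the value is supplied between two attempts.
bool demo_timed_get()
{
    std::promise<int> p;
    std::future<int> f = p.get_future();
    int value = 0;

    bool const early = timed_get(f, std::chrono::milliseconds(1), value);
    p.set_value(42);
    bool const late = timed_get(f, std::chrono::milliseconds(1), value);

    return !early && late && value == 42;
}
```

Note that the helper needs an output parameter of type T, so it cannot be written for reference or void results, which is the same limitation that motivated removing the functions.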

I've left the wait operations as wait() and timed_wait(), but I've had a suggestion to use wait()/wait_for()/wait_until(), which I'm actively considering.

Please download this prototype, put it through its paces, and let me know what you think.

Posted by Anthony Williams
[/ threading /] permanent link

Free Implementation of Futures for C++ from N2561

Monday, 05 May 2008

I am happy to announce the release of a prototype futures library for C++ based on N2561. Packaged as a single header file released under the Boost Software License, it needs to be compiled against the Boost Subversion Trunk, as it uses the Boost Exception library, which is not available in an official Boost release.

Sample usage can be seen in the test harness. There is one feature missing, which is the support for alternative allocators. I intend to add such support in due course.

Please download this prototype, put it through its paces, and let me know what you think.

Posted by Anthony Williams
[/ threading /] permanent link

Bug Found in Boost.Thread (with Fix): Flaw in Condition Variable on Windows

Monday, 28 April 2008

There's a bug....

First the bad news: shortly after Boost 1.35.0 was released, a couple of users reported experiencing problems using boost::condition_variable on Windows: when they used notify_one(), sometimes their notifies disappeared, even when they knew there was a waiting thread.

... and now it's fixed

Next, the good news: I've found and fixed the bug, and committed the fix to the boost Subversion repository. If you can't update your boost implementation to trunk, you can download the new code and replace boost/thread/win32/condition_variable.hpp from the boost 1.35.0 distribution with the new version.

What was it?

For those of you interested in the details, this bug was in code related to detecting (and preventing) spurious wakes. When a condition variable was notified with notify_one(), the implementation was choosing one or more threads to compete for the notify. One of these would get the notification and return from wait(). Those that didn't get the notify were supposed to resume waiting without returning from wait(). Unfortunately, this left a potential gap where those threads weren't waiting, so would miss any calls to notify_one() that occurred before those threads resumed waiting.

The fix was to rewrite the wait/notify mechanism so this gap no longer exists, by changing the way that waiting threads are counted.

Posted by Anthony Williams
[/ threading /] permanent link

The Future of Concurrency in C++: Slides from ACCU 2008

Monday, 07 April 2008

My presentation on The Future of Concurrency in C++ at ACCU 2008 last Thursday went off without a hitch. I was pleased to find that my talk was well attended, and the audience had lots of worthwhile questions — hopefully I answered them to everybody's satisfaction.

For those that didn't attend, or for those that did, but would like a reminder of what I said, here are the slides from my presentation.

Posted by Anthony Williams
[/ threading /] permanent link

Boost 1.35.0 has been Released!

Tuesday, 01 April 2008

Version 1.35.0 of the Boost libraries was released on Saturday. This release includes a major revision of the Boost.Thread library, to bring it more in line with the C++0x Thread Library. There are many new libraries, and revisions to other libraries too: see the full Release Notes for details, or just Download the release and give it a try.

Posted by Anthony Williams
[/ news /] permanent link

Optimizing Applications with Fixed-Point Arithmetic

Tuesday, 01 April 2008

My latest article, Optimizing Math-intensive Applications with Fixed Point Arithmetic from the April 2008 issue of Dr Dobb's Journal is now available online. (I originally had "Maths-intensive" in the title, being English, but they dropped the "s", being American).

In the article, I describe the fixed-point techniques I used to vastly improve the performance of an application using sines, cosines and exponentials without hardware floating point support.
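For readers unfamiliar with the idea, the following is a minimal sketch of fixed-point arithmetic using plain integers. The 16.16 layout, the type name fixed and the helper functions are all assumptions chosen for illustration; they are not taken from the article or its source code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16.16 format: 16 integer bits and 16 fractional bits
// stored in a 32-bit integer, so the value 1.0 is represented as 65536.
typedef std::int32_t fixed;
const int fractional_bits = 16;
const fixed fixed_one = fixed(1) << fractional_bits;

fixed from_double(double d) { return static_cast<fixed>(d * fixed_one); }
double to_double(fixed f) { return static_cast<double>(f) / fixed_one; }

// Addition and subtraction are ordinary integer operations.
fixed add(fixed a, fixed b) { return a + b; }

// The raw product carries the scale factor twice, so it is computed in
// 64 bits and then divided by the scale factor once.
fixed multiply(fixed a, fixed b)
{
    return static_cast<fixed>((static_cast<std::int64_t>(a) * b) / fixed_one);
}

// Division pre-scales the numerator so the quotient keeps the scale factor.
fixed divide(fixed a, fixed b)
{
    return static_cast<fixed>((static_cast<std::int64_t>(a) * fixed_one) / b);
}
```

Every operation here is integer-only, which is the property that matters on hardware without floating point support.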

The source code referenced in the article can be downloaded from here. It is released under the Boost Software License.

Posted by Anthony Williams
[/ news /] permanent link

Futures and Tasks in C++0x

Thursday, 27 March 2008

I had resigned myself to Thread Pools and Futures being punted to TR2 rather than C++0x, but it seems there is potential for some movement on this issue. At the meeting of WG21 in Kona, Hawaii in October 2007 it was agreed to include asynchronous future values in C++0x, whilst excluding thread pools and task launching.

Detlef Vollmann has rekindled the effort, and drafted N2561: An Asynchronous Future Value with myself and Howard Hinnant, based on a discussion including other members of the Standards Committee. This paper proposes four templates: unique_future and shared_future, which are the asynchronous values themselves, and packaged_task and promise, which provide ways of setting the asynchronous values.

Asynchronous future values

unique_future is very much like unique_ptr: it represents exclusive ownership of the value. Ownership of a (future) value can be moved between unique_future instances, but no two unique_future instances can refer to the same asynchronous value. Once the value is ready for retrieval, it is moved out of the internal storage buffer: this allows for use with move-only types such as std::ifstream.

Similarly, shared_future is very much like shared_ptr: multiple instances can refer to the same (future) value, and shared_future instances can be copied around. In order to reduce surprises with this usage (with one thread moving the value through one instance at the same time as another tries to move it through another instance), the stored value can only be accessed via const reference, so must be copied out, or accessed in place.
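The difference between the two ownership models can be sketched with the C++11 standard library, where the exclusive-ownership future is called std::future rather than unique_future. That renaming, and the use of std::unique_ptr as the move-only type, are assumptions relative to N2561 as described here.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>

// Exclusive ownership: get() moves the value out, so move-only types work.
int take_move_only_value()
{
    std::promise<std::unique_ptr<int> > p;
    std::future<std::unique_ptr<int> > f = p.get_future();
    p.set_value(std::unique_ptr<int>(new int(42)));
    std::unique_ptr<int> result = f.get();   // value moved out of the internal storage
    return *result;
}

// Shared ownership: copies refer to the same value, read via const reference.
bool copies_see_same_value()
{
    std::promise<std::string> p;
    std::shared_future<std::string> first = p.get_future().share();
    std::shared_future<std::string> second = first;   // copying is allowed
    p.set_value("ready");
    std::string const& a = first.get();
    std::string const& b = second.get();
    return &a == &b && a == "ready";   // both refer to the one stored string
}
```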

Storing the future values as the return value from a function

The simplest way to calculate a future value is with a packaged_task<T>. Much like std::function<T()>, this encapsulates a callable object or function, for invoking at a later time. However, whereas std::function returns the result directly to the caller, packaged_task stores the result in a future.

    extern int some_function();
    std::packaged_task<int> task(some_function);
    std::unique_future<int> result=task.get_future();

    // later on, some thread does
    task();
    // and "result" is now ready

Making a promise to provide a future value

The other way to store a value to be picked up with a unique_future or shared_future is to use a promise, and then explicitly set the value by calling the set_value() member function.

    std::promise<int> my_promise;
    std::unique_future<int> result=my_promise.get_future();

    // later on, some thread does
    my_promise.set_value(42);
    // and "result" is now ready.

Exceptional returns

Futures also support storing exceptions: when you try and retrieve the value, if there is a stored exception, that exception is thrown rather than the value being retrieved. With a packaged_task, an exception gets stored if the wrapped function throws an exception when it is invoked, and with a promise, you can explicitly store an exception with the set_exception() member function.
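Both routes for storing an exception can be sketched as follows. This uses the C++11 names, which differ slightly from the proposal described here: std::future stands in for unique_future, and std::packaged_task takes a function signature such as int() rather than just the result type. The function names are hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

int always_fails()
{
    throw std::runtime_error("task failed");
}

// Exception stored automatically when the wrapped function throws.
std::string message_from_task()
{
    std::packaged_task<int()> task(always_fails);
    std::future<int> result = task.get_future();
    task();               // the exception is captured, not propagated
    try
    {
        result.get();     // rethrows the stored exception
    }
    catch(std::runtime_error const& e)
    {
        return e.what();
    }
    return "no exception";
}

// Exception stored explicitly through a promise.
std::string message_from_promise()
{
    std::promise<int> p;
    std::future<int> result = p.get_future();
    p.set_exception(std::make_exception_ptr(std::logic_error("set by hand")));
    try
    {
        result.get();
    }
    catch(std::logic_error const& e)
    {
        return e.what();
    }
    return "no exception";
}
```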

Feedback

As the paper says, this is not a finished proposal: it is a basis for further discussion. Let me know if you have any comments.

Posted by Anthony Williams
[/ threading /] permanent link

Thread Interruption in the Boost Thread Library

Tuesday, 11 March 2008

One of the new features introduced in the upcoming 1.35.0 release of the boost thread library is support for interruption of a running thread. Similar to the Java and .NET interruption support, this allows for one thread to request another thread to stop at the next interruption point. This is the only way to explicitly request a thread to terminate that is directly supported by the Boost Thread library, though users can manually implement cooperative interruption if required.

Interrupting a thread in this way is much less dangerous than brute-force tactics such as TerminateThread(), as such tactics can leave broken invariants and leak resources. If a thread is killed using a brute-force method and it was holding any locks, this can also potentially lead to deadlock when another thread tries to acquire those locks at some future point. Interruption is also easier and more reliable than rolling your own cooperative termination scheme using mutexes, flags, condition variables, or some other synchronization mechanism, since it is part of the library.

Interrupting a Thread

A running thread can be interrupted by calling the interrupt() member function on the corresponding boost::thread object. If the thread doesn't have a boost::thread object (e.g. the initial thread of the application), then it cannot be interrupted.

Calling interrupt() just sets a flag in the thread management structure for that thread and returns: it doesn't wait for the thread to actually be interrupted. This is important, because a thread can only be interrupted at one of the predefined interruption points, and it might be that a thread never executes an interruption point, so never sees the request. Currently, the interruption points are:

  • boost::thread::join()
  • boost::thread::timed_join()
  • boost::condition_variable::wait()
  • boost::condition_variable::timed_wait()
  • boost::condition_variable_any::wait()
  • boost::condition_variable_any::timed_wait()
  • boost::this_thread::sleep()
  • boost::this_thread::interruption_point()

When a thread reaches one of these interruption points, if interruption is enabled for that thread then it checks its interruption flag. If the flag is set, then it is cleared, and a boost::thread_interrupted exception is thrown. If the thread is already blocked on a call to one of the interruption points with interruption enabled when interrupt() is called, then the thread will wake in order to throw the boost::thread_interrupted exception.

Catching an Interruption

boost::thread_interrupted is just a normal exception, so it can be caught, just like any other exception. This is why the "interrupted" flag is cleared when the exception is thrown: if a thread catches and handles the interruption, it is perfectly acceptable to interrupt it again. This can be used, for example, by a worker thread that is processing a series of independent tasks: if the current task is interrupted, the worker can handle the interruption, discard the task, and move on to the next task, which can then in turn be interrupted. It also allows the thread to catch the exception and terminate itself by other means, such as returning error codes, or translating the exception to pass through module boundaries.

Disabling Interruptions

Sometimes it is necessary to avoid being interrupted for a particular section of code, such as in a destructor where an exception has the potential to cause immediate process termination. This is done by constructing an instance of boost::this_thread::disable_interruption. Objects of this class disable interruption for the thread that created them on construction, and restore the interruption state to whatever it was before on destruction:

    void f()
    {
        // interruption enabled here
        {
            boost::this_thread::disable_interruption di;
            // interruption disabled
            {
                boost::this_thread::disable_interruption di2;
                // interruption still disabled
            } // di2 destroyed, interruption state restored
            // interruption still disabled
        } // di destroyed, interruption state restored
        // interruption now enabled
    }

The effects of an instance of boost::this_thread::disable_interruption can be temporarily reversed by constructing an instance of boost::this_thread::restore_interruption, passing in the boost::this_thread::disable_interruption object in question. This will restore the interruption state to what it was when the boost::this_thread::disable_interruption object was constructed, and then disable interruption again when the boost::this_thread::restore_interruption object is destroyed:

    void g()
    {
        // interruption enabled here
        {
            boost::this_thread::disable_interruption di;
            // interruption disabled
            {
                boost::this_thread::restore_interruption ri(di);
                // interruption now enabled
            } // ri destroyed, interruption disabled again
            {
                boost::this_thread::disable_interruption di2;
                // interruption disabled
                {
                    boost::this_thread::restore_interruption ri2(di2);
                    // interruption still disabled
                    // as it was disabled when di2 constructed
                } // ri2 destroyed, interruption still disabled
            } //di2 destroyed, interruption still disabled
        } // di destroyed, interruption state restored
        // interruption now enabled
    }

boost::this_thread::disable_interruption and boost::this_thread::restore_interruption cannot be moved or copied, and they are the only way of enabling and disabling interruption. This ensures that the interruption state is correctly restored when the scope is exited (whether normally, or by an exception), and that you cannot enable interruptions in the middle of an interruption-disabled block unless you're in full control of the code, and have access to the boost::this_thread::disable_interruption instance.

At any point, the interruption state for the current thread can be queried by calling boost::this_thread::interruption_enabled().

Cooperative Interruption

As well as the interruption points on blocking operations such as sleep() and join(), there is one interruption point explicitly designed to allow interruption at a user-designated point in the code. boost::this_thread::interruption_point() does nothing except check for an interruption, and can therefore be used in long-running code that doesn't execute any other interruption points, in order to allow for cooperative interruption. Just like the other interruption points, interruption_point() respects the interruption enabled state, and does nothing if interruption is disabled for the current thread.
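The check-and-clear behaviour of an interruption point can be modelled in a few lines of standard C++. This is a toy model only, not the Boost implementation: the toy namespace, interrupt_state and count_until_interrupted are hypothetical names, and the request is raised from inside the loop to stand in for a call to interrupt() from another thread.

```cpp
#include <atomic>
#include <cassert>

namespace toy
{
    struct thread_interrupted {};

    // One of these per interruptible worker; a real library would keep it
    // in its per-thread management structure.
    struct interrupt_state
    {
        std::atomic<bool> requested;
        bool enabled;
        interrupt_state(): requested(false), enabled(true) {}

        // Called by the controlling thread: set the flag and return at once.
        void interrupt() { requested.store(true); }

        // Called by the worker: does nothing unless interruption is enabled
        // and a request is pending; clears the flag before throwing.
        void interruption_point()
        {
            if(enabled && requested.exchange(false))
                throw thread_interrupted();
        }
    };
}

// Long-running loop with no blocking calls: poll once per iteration.
// Returns the number of iterations completed before being interrupted.
unsigned count_until_interrupted(toy::interrupt_state& state, unsigned limit,
                                 unsigned interrupt_at)
{
    unsigned done = 0;
    try
    {
        for(; done < limit; ++done)
        {
            if(done == interrupt_at)
                state.interrupt();      // stands in for another thread's request
            state.interruption_point();
        }
    }
    catch(toy::thread_interrupted const&)
    {
        // handled here: the flag is already clear, so work could continue
    }
    return done;
}
```

In this model a request made while interruption is disabled simply stays pending, because the disabled check short-circuits before the flag is examined.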

Interruption is Not Cancellation

On POSIX platforms, threads can be cancelled rather than killed, by calling pthread_cancel(). This is similar to interruption, but is a separate mechanism, with different behaviour. In particular, cancellation cannot be stopped once it is started: whereas interruption just throws an exception, once a cancellation request has been acknowledged the thread is effectively dead. pthread_cancel() does not always execute destructors either (though it does on some platforms), as it is primarily a C interface — if you want to clean up your resources when a thread is cancelled, you need to use pthread_cleanup_push() to register a cleanup handler. The advantage here is that pthread_cleanup_push() works in C stack frames, whereas exceptions don't play nicely in C: on some platforms it will crash your program for an exception to propagate into a C stack frame.

For portable code, I recommend interruption over cancellation. It's supported on all platforms that can use the Boost Thread library, and it works well with C++ code — it's just another exception, so all your destructors and catch blocks work just fine.

Posted by Anthony Williams
[/ threading /] permanent link

Acquiring Multiple Locks Without Deadlock

Monday, 03 March 2008

In a software system with lots of fine-grained mutexes, it can sometimes be necessary to acquire locks on more than one mutex together in order to perform some operation. If this is not done with care, then there is the possibility of deadlock, as multiple threads may lock the same mutexes in a different order. It is for this reason that the thread library coming with C++0x will include a lock() function for locking multiple mutexes together: this article describes the implementation details behind such a function.
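From the caller's side, such a function might be used as in the sketch below. It relies on std::lock and std::adopt_lock, the names under which this facility shipped in the C++11 standard library; the account type and transfer function are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical data type: each account is protected by its own mutex.
struct account
{
    std::mutex m;
    int balance;
    explicit account(int initial): balance(initial) {}
};

// Lock both accounts together, whichever order the caller names them in.
void transfer(account& from, account& to, int amount)
{
    if(&from == &to)
        return;                     // locking the same mutex twice would be an error
    std::lock(from.m, to.m);        // acquires both without risk of deadlock
    // adopt_lock: the guards take over the already-held locks and release
    // them on scope exit, including when an exception is thrown.
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}
```

Two threads calling transfer(a,b,x) and transfer(b,a,y) at the same time name the mutexes in opposite orders, which is exactly the case that naive sequential locking gets wrong.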

Choose the lock order by role

The easiest way to deal with this is to always lock the mutexes in the same order. This is especially easy if the order can be hard-coded, and some uses naturally lend themselves towards this choice. For example, if the mutexes protect objects with different roles, it is relatively easy to always lock the mutex protecting one set of data before locking the other one. In such a situation, lock hierarchies can be used to enforce the ordering: with a lock hierarchy, a thread cannot acquire a lock on a mutex with a higher hierarchy level than any mutexes currently locked by that thread.
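One way a lock hierarchy could be enforced at run time is sketched below, using standard C++11 facilities (std::mutex, thread_local) rather than Boost. The hierarchical_mutex class and can_lock_in_order function are hypothetical, and the sketch assumes locks are released in the reverse of the order in which they were acquired. It also rejects a mutex at the same level as one already held, which is a design choice rather than something the text requires.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical wrapper: each mutex carries a fixed hierarchy level, and a
// thread may only lock a mutex whose level is lower than the last one it locked.
class hierarchical_mutex
{
    std::mutex internal;
    unsigned long const level;
    unsigned long previous_level;
    static thread_local unsigned long this_thread_level;
public:
    explicit hierarchical_mutex(unsigned long level_):
        level(level_), previous_level(0)
    {}

    void lock()
    {
        if(this_thread_level <= level)
            throw std::logic_error("mutex hierarchy violated");
        internal.lock();
        previous_level = this_thread_level;   // remembered for unlock()
        this_thread_level = level;
    }

    void unlock()
    {
        this_thread_level = previous_level;
        internal.unlock();
    }
};

// No locks are held initially, so any level is acceptable.
thread_local unsigned long hierarchical_mutex::this_thread_level = ULONG_MAX;

// Returns true if locking "second" while holding "first" is permitted.
bool can_lock_in_order(hierarchical_mutex& first, hierarchical_mutex& second)
{
    std::lock_guard<hierarchical_mutex> outer(first);
    try
    {
        std::lock_guard<hierarchical_mutex> inner(second);
        return true;
    }
    catch(std::logic_error const&)
    {
        return false;
    }
}
```

Because the check happens on every lock attempt, an ordering mistake shows up the first time the offending code path runs, rather than only when two threads happen to collide.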

If it is not possible to decide a priori which mutex to lock first, such as when the mutexes are associated with the same sort of data, then a more complicated policy must be applied.

Choose the lock order by address

The simplest technique in these cases is to always lock the mutexes in ascending order of address (examples use the types and functions from the upcoming 1.35 release of Boost), like this:

void lock(boost::mutex& m1,boost::mutex& m2)
{
    if(&m1<&m2)
    {
        m1.lock();
        m2.lock();
    }
    else
    {
        m2.lock();
        m1.lock();
    }
}

This works for small numbers of mutexes, provided this policy is maintained throughout the application, but if several mutexes must be locked together, then calculating the ordering can get complicated, and potentially inefficient. It also requires that the mutexes are all of the same type. Since there are many possible mutex and lock types that an application might choose to use, this is a notable disadvantage, as the function must be written afresh for each possible combination.

Order mutexes "naturally", with try-and-back-off

If the mutexes cannot be ordered by address (for whatever reason), then an alternative scheme must be found. One such scheme is to use a try-and-back-off algorithm: try and lock each mutex in turn; if any cannot be locked, unlock the others and start again. The simplest implementation for 3 mutexes looks like this:

void lock(boost::mutex& m1,boost::mutex& m2,boost::mutex& m3)
{
    do
    {
        m1.lock();
        if(m2.try_lock())
        {
            if(m3.try_lock())
            {
                return;
            }
            m2.unlock();
        }
        m1.unlock();
    }
    while(true);
}

Wait for the failed mutex

The big problem with this scheme is that it always locks the mutexes in the same order. If m1 and m2 are currently free, but m3 is locked by another thread, then this thread will repeatedly lock m1 and m2, fail to lock m3 and unlock m1 and m2. This just wastes CPU cycles for no gain. Instead, what we want to do is block waiting for m3, and try to acquire the others only when m3 has been successfully locked by this thread. For three mutexes, a first attempt looks like this:

void lock(boost::mutex& m1,boost::mutex& m2,boost::mutex& m3)
{
    unsigned lock_first=0;
    while(true)
    {
        switch(lock_first)
        {
        case 0:
            m1.lock();
            if(m2.try_lock())
            {
                if(m3.try_lock())
                    return;
                lock_first=2;
                m2.unlock();
            }
            else
            {
                lock_first=1;
            }
            m1.unlock();
            break;
        case 1:
            m2.lock();
            if(m3.try_lock())
            {
                if(m1.try_lock())
                    return;
                lock_first=0;
                m3.unlock();
            }
            else
            {
                lock_first=2;
            }
            m2.unlock();
            break;
        case 2:
            m3.lock();
            if(m1.try_lock())
            {
                if(m2.try_lock())
                    return;
                lock_first=1;
                m1.unlock();
            }
            else
            {
                lock_first=0;
            }
            m3.unlock();
            break;
        }
    }
}

Simplicity and Robustness

This code is very long-winded, with all the duplication between the case blocks. Also, it assumes that the mutexes are all boost::mutex, which is overly restrictive. Finally, it assumes that the try_lock calls don't throw exceptions. Whilst this is true for the Boost mutexes, it is not required to be true in general, so a more robust implementation that allows the mutex type to be supplied as a template parameter will ensure that any exceptions thrown will leave all the mutexes unlocked: the unique_lock template will help with that by providing RAII locking. Taking all this into account leaves us with the following:

template<typename MutexType1,typename MutexType2,typename MutexType3>
unsigned lock_helper(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
    boost::unique_lock<MutexType1> l1(m1);
    boost::unique_lock<MutexType2> l2(m2,boost::try_to_lock);
    if(!l2)
    {
        return 1;
    }
    if(!m3.try_lock())
    {
        return 2;
    }
    l2.release();
    l1.release();
    return 0;
}

template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
    unsigned lock_first=0;
    while(true)
    {
        switch(lock_first)
        {
        case 0:
            lock_first=lock_helper(m1,m2,m3);
            if(!lock_first)
                return;
            break;
        case 1:
            lock_first=lock_helper(m2,m3,m1);
            if(!lock_first)
                return;
            lock_first=(lock_first+1)%3;
            break;
        case 2:
            lock_first=lock_helper(m3,m1,m2);
            if(!lock_first)
                return;
            lock_first=(lock_first+2)%3;
            break;
        }
    }
}

This code is simultaneously shorter, simpler and more general than the previous implementation, and is robust in the face of exceptions. The lock_helper function locks the first mutex, and then tries to lock the other two in turn. If either of the try_locks fails, then all currently-locked mutexes are unlocked, and it returns the index of the mutex that couldn't be locked. On success, the release members of the unique_lock instances are called to release ownership of the locks, and thus stop them automatically unlocking the mutexes during destruction, and 0 is returned. The outer lock function is just a simple wrapper around lock_helper that chooses the order of the mutexes so that the one that failed to lock last time is tried first.

Extending to more mutexes

This scheme can also be easily extended to handle more mutexes, though the code gets unavoidably longer, since there are more cases to handle — this is where the C++0x variadic templates will really come into their own. Here's the code for locking 5 mutexes together:

template<typename MutexType1,typename MutexType2,typename MutexType3,
         typename MutexType4,typename MutexType5>
unsigned lock_helper(MutexType1& m1,MutexType2& m2,MutexType3& m3,
                     MutexType4& m4,MutexType5& m5)
{
    boost::unique_lock<MutexType1> l1(m1);
    boost::unique_lock<MutexType2> l2(m2,boost::try_to_lock);
    if(!l2)
    {
        return 1;
    }
    boost::unique_lock<MutexType3> l3(m3,boost::try_to_lock);
    if(!l3)
    {
        return 2;
    }
    boost::unique_lock<MutexType4> l4(m4,boost::try_to_lock);
    if(!l4)
    {
        return 3;
    }
    if(!m5.try_lock())
    {
        return 4;
    }
    l4.release();
    l3.release();
    l2.release();
    l1.release();
    return 0;
}

template<typename MutexType1,typename MutexType2,typename MutexType3,
         typename MutexType4,typename MutexType5>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3,
          MutexType4& m4,MutexType5& m5)
{
    unsigned const lock_count=5;
    unsigned lock_first=0;
    while(true)
    {
        switch(lock_first)
        {
        case 0:
            lock_first=lock_helper(m1,m2,m3,m4,m5);
            if(!lock_first)
                return;
            break;
        case 1:
            lock_first=lock_helper(m2,m3,m4,m5,m1);
            if(!lock_first)
                return;
            lock_first=(lock_first+1)%lock_count;
            break;
        case 2:
            lock_first=lock_helper(m3,m4,m5,m1,m2);
            if(!lock_first)
                return;
            lock_first=(lock_first+2)%lock_count;
            break;
        case 3:
            lock_first=lock_helper(m4,m5,m1,m2,m3);
            if(!lock_first)
                return;
            lock_first=(lock_first+3)%lock_count;
            break;
        case 4:
            lock_first=lock_helper(m5,m1,m2,m3,m4);
            if(!lock_first)
                return;
            lock_first=(lock_first+4)%lock_count;
            break;
        }
    }
}

Final Code

The final code for acquiring multiple locks provides try_lock and lock functions for 2 to 5 mutexes. Though the try_lock functions are relatively straightforward, their existence makes the lock_helper functions slightly simpler, as they can just defer to the appropriate overload of try_lock to cover all the mutexes beyond the first one.
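The simplification can be sketched as follows for three mutexes. This is not the downloadable code: it uses std::try_lock and std::unique_lock from the C++11 standard library as stand-ins for the try_lock overloads and Boost types mentioned above, and probe_mutex is a hypothetical single-threaded type used only to observe the result.

```cpp
#include <cassert>
#include <mutex>

// Locks m1, then defers to the multi-mutex try_lock for the rest.
// Returns 0 on success (all three left locked), otherwise the position
// (1 or 2) of the mutex that could not be locked, with nothing left locked.
template<typename MutexType1,typename MutexType2,typename MutexType3>
unsigned lock_helper(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
    std::unique_lock<MutexType1> l1(m1);
    int const failed=std::try_lock(m2,m3);   // -1 on success, else 0-based index
    if(failed!=-1)
    {
        return static_cast<unsigned>(failed)+1;   // l1 unlocks m1 on the way out
    }
    l1.release();   // success: keep m1 locked after l1 is destroyed
    return 0;
}

// Minimal single-threaded stand-in for a mutex, used only to observe the
// helper's behaviour; it is not safe for real concurrent use.
struct probe_mutex
{
    bool locked;
    probe_mutex(): locked(false) {}
    void lock() { locked=true; }
    bool try_lock() { if(locked) return false; locked=true; return true; }
    void unlock() { locked=false; }
};
```

The return values match those of the three-mutex lock_helper shown earlier, so the outer lock function could use this version unchanged.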

Posted by Anthony Williams
[/ threading /] permanent link

Thread Library Now in C++0x Working Draft

Monday, 11 February 2008

The latest proposal for the C++ standard thread library has finally made it into the C++0x working draft.

Woohoo!

There will undoubtedly be minor changes as feedback comes in to the committee, but this is the first real look at what C++0x thread support will entail, as approved by the whole committee. The working draft also includes the new C++0x memory model, and atomic types and operations. This means that for the first time, C++ programs will legitimately be able to spawn threads without immediately straying into undefined behaviour. Not only that, but the memory model has been very carefully thought out, so it should be possible to write even low-level stuff such as lock-free containers in Standard C++.

Posted by Anthony Williams
[/ threading /] permanent link

Implementing a Thread-Safe Queue using Condition Variables

Monday, 04 February 2008

One problem that comes up time and again with multi-threaded code is how to transfer data from one thread to another. For example, one common way to parallelize a serial algorithm is to split it into independent chunks and make a pipeline — each stage in the pipeline can be run on a separate thread, and each stage adds the data to the input queue for the next stage when it's done. For this to work properly, the input queue needs to be written so that data can safely be added by one thread and removed by another thread without corrupting the data structure.

Basic Thread Safety with a Mutex

The simplest way of doing this is just to wrap a non-thread-safe queue and protect it with a mutex (the examples use the types and functions from the upcoming 1.35 release of Boost):

template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;
public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }

    bool empty() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.empty();
    }

    Data& front()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    
    Data const& front() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }

    void pop()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.pop();
    }
};

This design is subject to race conditions between calls to empty, front and pop if there is more than one thread removing items from the queue, but in a single-consumer system (as being discussed here), this is not a problem. There is, however, a downside to such a simple implementation: if your pipeline stages are running on separate threads, they likely have nothing to do if the queue is empty, so they end up with a wait loop:

    while(some_queue.empty())
    {
        boost::this_thread::sleep(boost::posix_time::milliseconds(50));
    }

Though the sleep avoids the high CPU consumption of a direct busy wait, there are still some obvious downsides to this formulation. Firstly, the thread has to wake every 50ms or so (or whatever the sleep period is) in order to lock the mutex, check the queue, and unlock the mutex, forcing a context switch. Secondly, the sleep period imposes a limit on how fast the thread can respond to data being added to the queue: if the data is added just after the thread has checked the queue and gone to sleep, the thread will wait for the whole sleep period (at least 50ms here) before checking again. On average, the thread will only respond to data after about half the sleep time (25ms here).

Waiting with a Condition Variable

As an alternative to continuously polling the state of the queue, the sleep in the wait loop can be replaced with a condition variable wait. If the condition variable is notified in push when data is added to an empty queue, then the waiting thread will wake. This requires access to the mutex used to protect the queue, so needs to be implemented as a member function of concurrent_queue:

template<typename Data>
class concurrent_queue
{
private:
    boost::condition_variable the_condition_variable;
public:
    void wait_for_data()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
    }
    void push(Data const& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        bool const was_empty=the_queue.empty();
        the_queue.push(data);
        if(was_empty)
        {
            the_condition_variable.notify_one();
        }
    }
    // rest as before
};

There are three important things to note here. Firstly, the lock variable is passed as a parameter to wait — this allows the condition variable implementation to atomically unlock the mutex and add the thread to the wait queue, so that another thread can update the protected data whilst the first thread waits.

Secondly, the condition variable wait is still inside a while loop — condition variables can be subject to spurious wake-ups, so it is important to check the actual condition being waited for when the call to wait returns.
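
The loop does not have to be written out by hand. As a sketch, assuming the C++11 std::condition_variable rather than the Boost type used in this article, the overload of wait that takes a predicate performs the same re-check after every wake-up. The data_ready_flag class and its member names are hypothetical, chosen only to keep the example small:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot flag: one thread sets it, others wait for it.
class data_ready_flag
{
    mutable std::mutex the_mutex;
    std::condition_variable the_condition_variable;
    bool ready=false;
public:
    void set_ready()
    {
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            ready=true;
        }
        the_condition_variable.notify_all();
    }

    void wait_until_ready()
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        // Same effect as: while(!ready) the_condition_variable.wait(lock);
        // so a spurious wake-up just goes round the check again.
        the_condition_variable.wait(lock,[this]{ return ready; });
    }

    bool is_ready() const
    {
        std::lock_guard<std::mutex> lock(the_mutex);
        return ready;
    }
};

// Usage: if the condition already holds, the wait returns immediately.
void example_wait()
{
    data_ready_flag flag;
    flag.set_ready();
    flag.wait_until_ready();
    assert(flag.is_ready());
}
```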

Be careful when you notify

Thirdly, the call to notify_one comes after the data is pushed on the internal queue. This avoids the waiting thread being notified if the call to the_queue.push throws an exception. As written, the call to notify_one is still within the protected region, which is potentially sub-optimal: the waiting thread might wake up as soon as it is notified, before the mutex is unlocked, in which case it will have to block when the mutex is reacquired on the exit from wait. By rewriting the function so that the notification comes after the mutex is unlocked, the waiting thread will be able to acquire the mutex without blocking:

template<typename Data>
class concurrent_queue
{
public:
    void push(Data const& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        bool const was_empty=the_queue.empty();
        the_queue.push(data);

        lock.unlock(); // unlock the mutex

        if(was_empty)
        {
            the_condition_variable.notify_one();
        }
    }
    // rest as before
};

Reducing the locking overhead

Though the use of a condition variable has improved the pushing and waiting side of the interface, the interface for the consumer thread still has to perform excessive locking: wait_for_data, front and pop all lock the mutex, yet they will be called in quick succession by the consumer thread.

By changing the consumer interface to a single wait_and_pop function, the extra lock/unlock calls can be avoided:

template<typename Data>
class concurrent_queue
{
public:
    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
    }

    // rest as before
};

The reference parameter is used to transfer ownership out of the queue in order to avoid the exception-safety issues of returning data by value: if the copy constructor of a by-value return throws, then the data has been removed from the queue but is lost, whereas with this approach the potentially problematic copy is performed prior to modifying the queue (see Herb Sutter's Guru Of The Week #8 for a discussion of the issues). This does, of course, require that an instance of Data can be created by the calling code in order to receive the result, which is not always possible. In those cases, it might be worth using something like boost::optional to avoid this requirement.
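
A minimal sketch of that idea follows, with two assumptions worth flagging: it uses std::optional from C++17 as a stand-in for boost::optional, and the std mutex and condition variable types in place of the Boost ones. The optional_pop_queue and job names are hypothetical, and push here simply notifies on every call to keep the example short.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

// Hypothetical payload with no default constructor.
struct job
{
    explicit job(int id_):id(id_){}
    int id;
};

template<typename Data>
class optional_pop_queue
{
    std::queue<Data> the_queue;
    std::mutex the_mutex;
    std::condition_variable the_condition_variable;
public:
    void push(Data const& data)
    {
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            the_queue.push(data);
        }
        the_condition_variable.notify_one();
    }

    // The caller supplies an optional, so no Data instance is needed
    // before the call.
    void wait_and_pop(std::optional<Data>& popped_value)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        // Copy first: if this throws, the queue has not been modified.
        popped_value=the_queue.front();
        the_queue.pop();
    }
};

// Usage: receive a job without having to construct one up front.
void example_optional_pop()
{
    optional_pop_queue<job> q;
    q.push(job(7));
    std::optional<job> received;
    q.wait_and_pop(received);
    assert(received && received->id==7);
}
```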

Handling multiple consumers

As well as removing the locking overhead, the combined wait_and_pop function has another benefit — it automatically allows for multiple consumers. Whereas the fine-grained nature of the separate functions makes them subject to race conditions without external locking (one reason why the authors of the SGI STL advocate against making things like std::vector thread-safe — you need external locking to do many common operations, which makes the internal locking just a waste of resources), the combined function safely handles concurrent calls.

If multiple threads are popping entries from a non-empty queue, then they just get serialized inside wait_and_pop, and everything works fine. If the queue is empty, then each thread in turn will block waiting on the condition variable. When a new entry is added to the queue, one of the threads will wake and take the value, whilst the others keep blocking. If more than one thread wakes (e.g. with a spurious wake-up), or a new thread calls wait_and_pop concurrently, the while loop ensures that only one thread will do the pop, and the others will wait.
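
One case worth checking against this description is two consumers blocked on an empty queue while a producer pushes two items in quick succession. With the was_empty test in push, only the first push notifies, so if the woken consumer has not yet run when the second push happens, the second consumer can stay blocked while an item sits in the queue. The sketch below is one way to sidestep that, on the assumption that an occasional unneeded notification is an acceptable cost: it notifies on every push. It uses the C++11 std types in place of the Boost ones, and the class and function names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

template<typename Data>
class always_notify_queue
{
    std::queue<Data> the_queue;
    std::mutex the_mutex;
    std::condition_variable the_condition_variable;
public:
    void push(Data const& data)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        the_queue.push(data);
        lock.unlock(); // unlock before notifying, as discussed earlier
        // Notify unconditionally: one wake-up per item pushed.
        the_condition_variable.notify_one();
    }

    void wait_and_pop(Data& popped_value)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        popped_value=the_queue.front();
        the_queue.pop();
    }
};

// Usage: two consumers each take one of two items pushed back to back.
// Returns the sum of the values the consumers received.
int example_two_consumers()
{
    always_notify_queue<int> q;
    int first=0,second=0;
    std::thread c1([&]{ q.wait_and_pop(first); });
    std::thread c2([&]{ q.wait_and_pop(second); });
    q.push(1);
    q.push(2);
    c1.join();
    c2.join();
    assert(first+second==3);
    return first+second;
}
```

Which consumer receives which value is not determined, but each push is matched by a notification, so both threads finish.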

There is one benefit that the separate functions give over the combined one — the ability to check for an empty queue, and do something else if the queue is empty. empty itself still works in the presence of multiple consumers, but the value that it returns is transitory — there is no guarantee that it will still apply by the time a thread calls wait_and_pop, whether it was true or false. For this reason it is worth adding an additional function: try_pop, which returns true if there was a value to retrieve (in which case it retrieves it), or false to indicate that the queue was empty.

template<typename Data>
class concurrent_queue
{
public:
    bool try_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(the_queue.empty())
        {
            return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

    // rest as before
};

By removing the separate front and pop functions, our simple naive implementation has now become a usable multiple producer, multiple consumer concurrent queue.

The Final Code

Here is the final code for a simple thread-safe multiple producer, multiple consumer queue:

template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;
    boost::condition_variable the_condition_variable;
public:
    void push(Data const& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        bool const was_empty=the_queue.empty();
        the_queue.push(data);

        lock.unlock();

        if(was_empty)
        {
            the_condition_variable.notify_one();
        }
    }

    bool empty() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.empty();
    }

    bool try_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(the_queue.empty())
        {
            return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
    }

};

Posted by Anthony Williams

Database Tip: Eliminate Duplicate Data

Friday, 25 January 2008

Storing duplicated data in your database is a bad idea for several reasons:

  1. The duplicated data occupies more space — if you store two copies of the same data in your database, it takes twice as much space.
  2. If duplicated data is updated, it must be changed in more than one place, which is more complex and may require more code than just changing it in one location.
  3. Following on from the previous point — if data is duplicated, then it is easy to miss one of the duplicates when updating, leading to different copies having different information. This may lead to confusion, and errors further down the line.

Coincidental Duplication

It is worth noting that some duplication is coincidental — it is worth checking out whether a particular instance of duplication is coincidental or not before eliminating it. For example it is common for a billing address to be the same as a delivery addre