Condition Variable Spurious Wakes

Friday, 27 June 2008

Condition variables are a useful mechanism for waiting until an event occurs or some "condition" is satisfied. For example, in my implementation of a thread-safe queue I use a condition variable to avoid busy-waiting in wait_and_pop() when the queue is empty. However, condition variables have one "feature" which is a common source of bugs: a wait on a condition variable may return even if the condition variable has not been notified. This is called a spurious wake.

Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.

Bugs due to spurious wakes

Consider the code for wait_and_pop from my thread-safe queue:

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
    }

If we know that there's only one consumer thread, it would be tempting to write this with an if instead of a while, on the assumption that there's only one thread waiting, so if it's been notified, the queue must not be empty:

    if(the_queue.empty()) // Danger, Will Robinson
    {
        the_condition_variable.wait(lock);
    }

With the potential of spurious wakes this is not safe: the wait might finish even if the condition variable was not notified. We therefore need the while, which has the added benefit of allowing multiple consumer threads: we don't need to worry that another thread might remove the last item from the queue, since we're checking to see if the queue is empty before proceeding.

That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from timed_wait().

Timing is everything

condition_variable::wait() has a companion function that allows the user to specify a time limit on how long they're willing to wait: condition_variable::timed_wait(). This function comes as a pair of overloads: one that takes an absolute time, and one that takes a duration. The absolute time overload will return once the clock reaches the specified time, whether or not it was notified. The duration overload will return once the specified duration has elapsed: if you say to wait for 3 seconds, it will stop waiting after 3 seconds. The insidious bug comes from the overload that takes a duration.

Suppose we wanted to add a timed_wait_and_pop() function to our queue, that allowed the user to specify a duration to wait. We might be tempted to write it as:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& timeout)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

At first glance this looks fine: we're handling spurious wakes by looping on the timed_wait() call, and we're passing the timeout in to that call. Unfortunately, the timeout is a duration, so every call to timed_wait() will wait up to the specified amount of time. If the timeout was 1 second, and the timed_wait() call woke due to a spurious wake after 0.9 seconds, the next time round the loop would wait for a further 1 second. In theory this could continue ad infinitum, completely defeating the purpose of using timed_wait() in the first place.

The solution is simple: use the absolute time overload instead. By specifying a particular clock time as the timeout, the remaining wait time decreases with each call. This requires that we determine the final timeout prior to the loop:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::system_time const timeout=boost::get_system_time()+wait_duration;

        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Though this solves the problem, it's easy to make the mistake. Thankfully, there is a better way to wait that doesn't suffer from this problem: pass the predicate to the condition variable.

Passing the predicate to the condition variable

Both wait() and timed_wait() come with additional overloads that allow the user to specify the condition being waited for as a predicate. These overloads encapsulate the while loops from the examples above, and ensure that spurious wakes are correctly handled. All that is required is that the condition being waited for can be checked by means of a simple function call or a function object which is passed as an additional parameter to the wait() or timed_wait() call.

wait_and_pop() can therefore be written like this:


    struct queue_not_empty
    {
        std::queue<Data>& queue;

        queue_not_empty(std::queue<Data>& queue_):
            queue(queue_)
        {}
        bool operator()() const
        {
            return !queue.empty();
        }
    };

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_condition_variable.wait(lock,queue_not_empty(the_queue));
        popped_value=the_queue.front();
        the_queue.pop();
    }

and timed_wait_and_pop() can be written like this:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(!the_condition_variable.timed_wait(lock,wait_duration,
            queue_not_empty(the_queue)))
            return false;
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Note that what we're waiting for is the queue not to be empty — the predicate is the reverse of the condition we would put in the while loop. This will be much easier to specify when compilers implement the C++0x lambda facilities.

Conclusion

Spurious wakes can cause some unfortunate bugs, which are hard to track down due to the unpredictability of spurious wakes. These problems can be avoided by ensuring that plain wait() calls are made in a loop, and the timeout is correctly calculated for timed_wait() calls. If the predicate can be packaged as a function or function object, using the predicated overloads of wait() and timed_wait() avoids all the problems.

Posted by Anthony Williams
[/ threading /] permanent link

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

4 Comments

Hello,

Thanks for your note on using condition variables, but is there any way to reproduce "Spurious Wakes" to test the code for such issues?.

PS: thanks for writing me back on wasinapple@gmail.com -Thanks Harish
by Harish V Kulkarni at 08:10:09 on Wednesday, 06 August 2008

There currently isn't a way to make the condition variable wake spuriously, but you could spawn a thread that sits there doing nothing but broadcasting the condition variable (thus causing waits to wake when the predicate hasn't changed).

Alternatively, you could temporarily replace the wait call with unlock/sleep/lock. In the presence of spurious wakes, this is essentially all you can rely on from wait() anyway. If you systematically do this, you might spot the problems even before you try testing the code, as it becomes apparent that variables don't magically change state just because your thread went to sleep for a while.

by Anthony Williams at 09:24:35 on Wednesday, 13 August 2008

(Although the original post is over 2 years old, I'll still give it a shot)

I think that all of these are in a way incorrect, as the constructor for the local `boost::mutex::scoped_lock` variable will not time out correctly.

template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& wait_duration) { boost::system_time const timeout=boost::get_system_time()+wait_duration;

boost::mutex::scoped_lock lock(the_mutex, timeout); // issues a timed_wait internally if(!lock) // passed duration? return false; // already too late. if(!the_condition_variable.timed_wait(lock,timeout, queue_not_empty(the_queue))) return false; popped_value=the_queue.front(); the_queue.pop(); return true; }
by Anders Dalvander at 14:20:38 on Thursday, 16 December 2010

(Although the original post is over 2 years old, I'll still give it a shot)

I think that all of these are in a way incorrect, as the constructor for the local `boost::mutex::scoped_lock` variable will not time out correctly.

template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& wait_duration) { boost::system_time const timeout=boost::get_system_time()+wait_duration;

boost::mutex::scoped_lock lock(the_mutex, timeout); // issues a timed_wait internally if(!lock) // passed duration? return false; // already too late. if(!the_condition_variable.timed_wait(lock,timeout, queue_not_empty(the_queue))) return false; popped_value=the_queue.front(); the_queue.pop(); return true; }
by Anders Dalvander at 09:25:15 on Monday, 20 December 2010

Add your comment

Your name:

Your URL:

Email address:

Person or spambot?

Your comment: