Blog Archive

Implementing Dekker's algorithm with Fences

Tuesday, 27 July 2010

Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.

The code

For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}

The analysis

If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.

Suppose thread 0 and thread 1 enter p0 and p1 respectively at the same time. They both set their respective flags to true, execute the fence and then read the other flag at the start of the while loop.

If both threads read false then both will enter the critical section, and the algorithm doesn't work. It is the job of the fences to ensure that this doesn't happen.

The fences are marked with memory_order_seq_cst, so either the fence in p0 is before the fence in p1 in the global ordering of memory_order_seq_cst operations, or vice-versa. Without loss of generality, we can assume that the fence in p0 comes before the fence in p1, since the code is symmetric. The store to flag0 is sequenced before the fence in p0, and the fence in p1 is sequenced before the read from flag0. Therefore the read from flag0 must see the value stored (true), so p1 will enter the while loop.

On the other side, there is no such guarantee for the read from flag1 in p0, so p0 may or may not enter the while loop. If p0 reads the value of false for flag1, it will not enter the while loop, and will instead enter the critical section, but that is OK since p1 has entered the while loop.

Though flag0 is not set to false until p0 has finished the critical section, we need to ensure that p1 does not see this until the values modified in the critical section are also visible to p1, and this is the purpose of the release fence prior to the store to flag0 and the acquire fence after the while loop. When p1 reads the value false from flag0 in order to exit the while loop, it must be reading the value store by p0 after the release fence at the end of the critical section. The acquire fence after the load guarantees that all the values written before the release fence prior to the store are visible, which is exactly what we need here.

If p0 reads true for flag1, it will enter the while loop rather than the critical section. Both threads are now looping, so we need a way to ensure that exactly one of them breaks out. This is the purpose of the turn variable. Initially, turn is 0, so p1 will enter the if and set flag1 to false, whilst p1 will not enter the if. Because p1 set flag1 to false, eventually p0 will read flag1 as false and exit the outer while loop to enter the critical section. On the other hand, p1 is now stuck in the inner while loop because turn is 0. When p0 exits the critical section it sets turn to 1. This will eventually be seen by p1, allowing it to exit the inner while loop. When the store to flag0 becomes visible p1 can then exit the outer while loop too.

If turn had been 1 initially (because p0 was the last thread to enter the critical section) then the inverse logic would apply, and p0 would enter the inner loop, allowing p1 to enter the critical section first.

Second time around

If p0 is called a second time whilst p1 is still in the inner loop then we have a similar situation to the start of the function — p1 may exit the inner loop and store true in flag1 whilst p0 stores true in flag0. We therefore need the second memory_order_seq_cst fence after the store to the flag in the inner loop. This guarantees that at least one of the threads will see the flag from the other thread set when it executes the check in the outer loop. Without this fence then both threads can read false, and both can enter the critical section.

Alternatives

You could put the ordering constraints on the loads and stores themselves rather than using fences. Indeed, the default memory ordering for atomic operations in C++ is memory_order_seq_cst, so the algorithm would "just work" with plain loads and stores to atomic variables. However, by using memory_order_relaxed on the loads and stores we can add fences to ensure we have exactly the ordering constraints required.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Reference Wrappers Explained

Wednesday, 14 July 2010

The upcoming C++0x standard includes reference wrappers in the form of the std::reference_wrapper<T> class template, and the helper function templates std::ref() and std::cref(). As I mentioned in my blog post on Starting Threads with Member Functions and Reference Arguments, these wrappers can be used to pass references to objects across interfaces that normally require copyable (or at least movable) objects — in that blog post, std::ref was used for passing references to objects over to the new thread, rather than copying the objects. I was recently asked what the difference was between std::ref and std::cref, and how they worked, so I thought I'd elaborate.

Deducing the Referenced Type

std::ref is a function template, so automatically deduces the type of the wrapped reference from the type of the supplied argument. This type deduction includes the const-ness of the supplied object:

int x=3;
const int y=4;
std::reference_wrapper<int> rx=std::ref(x);
// std::reference_wrapper<int> ry=std::ref(y); // error
std::reference_wrapper<const int> rcy=std::ref(y);

On the other hand, though std::cref also deduces the type of the wrapped reference from the supplied argument, it always wraps a const reference:

int x=3;
const int y=4;
// std::reference_wrapper<int> rx=std::cref(x); // error
std::reference_wrapper<const int> rcx=std::cref(x);
// std::reference_wrapper<int> ry=std::cref(y); // error
std::reference_wrapper<const int> rcy=std::cref(y);

Since a no-const-reference can always be bound to a const reference, you can thus use std::ref in pretty much every case where you would use std::cref, and your code would work the same. Which begs the question: why would you ever choose to use std::cref?

Using std::cref to prevent modification

The primary reason for choosing std::cref is because you want to guarantee that the source object is not modified through that reference. This can be important when writing multithreaded code — if a thread should not be modifying some data then it can be worth enforcing this by passing a const reference rather than a mutable reference.

void foo(int&); // mutable reference

int x=42; // Should not be modified by thread
std::thread t(foo,std::cref(x)); // will fail to compile

This can be important where there are overloads of a function such that one takes a const reference, and the other a non-const reference: if we don't want the object modified then it is important that the overload taking a const reference is chosen.

struct Foo
{
    void operator()(int&) const;
    void operator()(int const&) const;
};

int x=42;
std::thread(Foo(),std::cref(x)); // force const int& overload 

References to temporaries

std::cref has another property missing from std::ref — it can bind to temporaries, since temporary objects can bind to const references. I'm not sure this is a good thing, especially when dealing with multiple threads, as the referenced temporary is likely to have been destroyed before the thread has even started. This is therefore something to watch out for:

void bar(int const&);

std::thread t(bar,std::cref(42)); // oops, ref to temporary

Documentation

Finally, std::cref serves a documentation purpose, even where std::ref would suffice — it declares in big bold letters that this reference cannot be used to modify the referenced object, which thus makes it easier to reason about the code.

Recommendation

I would recommend that you use std::cref in preference to std::ref whenever you can — the benefits as documentation of intent, and avoiding accidental modification through the reference make it a clear winner in my opinion. Of course, if you do want to modify the referenced object, then you need to use std::ref, but such usage now stands out, and makes it clear that this is the intent.

You do still need to be careful to ensure that you don't try and wrap references to temporaries, particularly when applying std::cref to the result of a function call, but such uses should stand out — I expect most uses to be wrapping a reference to a named variable rather than wrapping a function result.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Last day for comments on the C++0x FCD

Thursday, 17 June 2010

The BSI deadline for comments on the C++0x FCD is tomorrow, Friday 18th June 2010. The ISO deadline is 26th July 2010, but we have to write up comments for submission in the form required for ISO, which takes time.

If you have a comment on the FCD, please see my earlier blog post for how to submit it to BSI. Help us make the C++0x standard as good as it can be.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Enforcing Correct Mutex Usage with Synchronized Values

Friday, 28 May 2010

My latest article, Enforcing Correct Mutex Usage with Synchronized Values has been published on the Dr Dobb's website.

This article expands on the SynchronizedValue<T> template I mentioned in my presentation on Concurrency in the Real World at ACCU 2010, and deals with the problem of ensuring that the mutex associated with some data is locked whenever the data is accessed.

The basic idea is that you use SynchronizedValue<T> wherever you have an object of type T that you wish to be protected with its own mutex. The SynchronizedValue<T> then behaves like a pointer-to-T for simple uses.

Read the article for the full details.

Posted by Anthony Williams
[/ news /] permanent link
Tags: ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

just::thread C++0x Thread Library V1.4 (FCD Edition) Released

Thursday, 06 May 2010

I am pleased to announce that version 1.4 (the FCD edition) of just::thread, our C++0x Thread Library has just been released.

With the release of the "FCD edition", just::thread provides the first complete implementation of the multithreading facilities from the Final Committee Draft (FCD) of the C++0x standard.

Changes include:

Purchase your copy and get started with the C++0x thread library NOW.

As usual, existing customers are entitled to a free upgrade to V1.4.0 from all earlier versions.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

"Concurrency in the Real World" slides now available

Monday, 19 April 2010

The slides for my presentation on "Concurrency in the Real World" at the ACCU 2010 conference last week are now available.

The room was full, and quite warm due to the air conditioning having been turned off, but everything went to plan, and there were some insightful questions from the audience. I've thoroughly enjoyed presenting at ACCU in previous years, and this was no exception.

I covered the main pitfalls people encounter when writing multithreaded code, along with some techniques that I've found help deal with those problems, including some example code from projects I've worked on. As you might expect, all my examples were in C++, though the basic ideas are cross-language. I finished up by talking about what we might hope to get out of multithreaded code, such as performance, additional features and responsiveness.

There's a discount on my just::thread library until Friday 23rd April 2010, so if you're doing concurrency in C++ with Microsoft Visual Studio on Windows or g++ on linux get yourself a copy whilst it's on offer and start taking advantage of the new C++0x thread library.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

March 2010 C++ Standards Committee Mailing

Thursday, 08 April 2010

The March 2010 mailing for the C++ Standards Committee was published last week. This is the post-meeting mailing for the March 2010 committee meeting, and contains the C++0x Final Committee Draft, which I blogged about last week.

There are 6 concurrency-related papers (of which my name is on two), which I summarize below:

Concurrency-related papers

N3057: Explicit Initializers for Atomics

This paper proposes new initializers for atomic variables, providing a means of writing code which can be compiled as either C or C++. e.g.

void foo()
{
    atomic_int a=ATOMIC_VAR_INIT(42); // initialize a with 42
    atomic_uint b;                    // uninitialized
    atomic_init(&b,123);          // b now initialized to 123
}
N3058: Futures and Async Cleanup (Rev.)

This is a revision of N3041 to resolve many of the outstanding issues with futures and async. Mostly it's just wordsmithing to tidy up the specification, but there's a few key changes:

  • Defined behaviour for the wait_for() and wait_until() member functions of std::future, std::shared_future and std::atomic_future when used with std::async and a launch policy of std::launch::sync. The return value is now a value of the new std::future_status enumeration, and can be std::future_status::ready if the future becomes ready before the timeout, std::future_status::timeout if the wait times out, or std::future_status::deferred if the future comes from a call to std::async with a launch policy of std::launch::sync and the function associated with the future hasn't yet started execution on any thread.
  • The wording for std::async adopts the same wording as std::thread to clarify the copy/move and perfect forwarding semantics of the call.
N3069: Various threads issues in the library (LWG 1151)

This is a revision of N3040, and highlights which operations through iterators constitute accesses and data races, and explicitly allows for synchronization by writing and reading to/from a stream.

N3070: Handling Detached Threads and thread_local Variables

This is a hugely simplified replacement for my previous paper N3038. Rather than creating contexts for thread_local variables, this paper proposes new member functions for std::promise and std::packaged_task to allow the value to be set at the point of call, but threads waiting on associated futures to be woken only after thread_local variables have been destroyed at thread exit. This means that you can now safely wait on a future which is set in such a fashion when waiting for a task running on a background thread to complete, without having to join with the thread or worry about races arising from the destructors of thread_local variables. The paper also adds a similar mechanism for condition variables as a non-member function.

N3071: Renaming launch::any and what asyncs really might be (Rev.)

This is a revision of N3042 proposing renaming std::launch::any to std::launch::sync_or_async. This paper was not approved.

N3074: Updates to C++ Memory Model Based on Formalization

This is a revision of N3045. This paper proposes some changes to the wording of the memory model in order to ensure that it means what we intended it to mean.

Other Papers

There's several non-concurrency papers in the mailing as well as the standard set (working draft, agenda, issues lists, etc.). The most significant of these in my view are the following 3 papers. Check the mailing for the full set.

N3050: Allowing Move Constructors to Throw (Rev. 1)

This paper adds the new noexcept keyword to C++. This is used in place of an exception specification. On its own it means that the function does not throw any exceptions, but it can also be used with a boolean constant expression where true means that the function doesn't throw, and false means that it might. e.g.

void foo() noexcept;        // will not throw
void bar() noexcept(true);  // will not throw
void baz() noexcept(false); // may throw

If a noexcept exception specification is violated then std::terminate() is called.

The primary benefit from the boolean-constant-expression version is in templates, where the boolean expression can check various properties of the template parameter types. One of the things you can check is whether or not particular operations throw, e.g. by using the new has_nothrow_move_constructor type trait to declare the move constructor for a class to be noexcept if its class members have non-throwing move constructors:

template<typename T>
class X
{
    T data;
public:
    X(X&& other)
        noexcept(std::has_nothrow_move_constructor<T>::value):
        data(std::move(other.data))
    {}
};
N3053: Defining Move Special Member Functions

This proposal ensures that user-defined classes have move constructors and move assignment operators generated for them by the compiler if that is safe. Explicitly declaring a copy or move constructor will prevent the implicit declaration of the other, and likewise for copy and move assignment. You can always request the default definition using the = default syntax.

This means that lots of user code will now be able to readily take advantage of move semantics with a simple code change or even just a recompile. This can potentially be of major performance benefit.

N3055: A Taxonomy of Expression Value Categories

This paper nails down the true distinctions between lvalues, rvalues and rvalue references. It provides a new set of names to identify the distinct categories of values in C++ — lvalues and rvalues we already have, but now there's xvalues, prvalues and glvalues too. This categorization allows for better specification of when things can bind to lvalue references or rvalue references, when the compiler can eliminate copies or moves.

Please comment on the FCD

The purpose of the C++0x Final Committee Draft is to get comments prior to publication to ensure the final C++0x standard is as defect free as possible. This opportunity is only available for a limited time, so please comment on the FCD.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Sign up for a 50% discount just::thread FCD edition

Wednesday, 07 April 2010

I'm in the process of updating our C++0x thread library for VS2008, VC10, g++ 4.3 and g++ 4.4 to incorporate the changes to the C++0x thread library voted into the C++0x FCD. I'll be writing a blog post with more details in due course, but the big changes are:

Existing customers will get the new version as a free upgrade, but the rest of you can get a 50% discount if you subscribe to my blog by email. Just fill in your name and email address in the form below and be sure to click the confirmation link. You'll then receive future blog posts by email, along with an announcement and exclusive discount for the FCD edition of just::thread when it's released.

If you're reading this via RSS and your reader doesn't show you the form or doesn't allow you to submit your details, then please go to the web version of this blog entry.

If you've already subscribed by email then you don't need to subscribe again, you'll automatically receive the discount code.

Please sign me up to receive blog posts by email and an exclusive discount on just::thread.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

C++0x Final Committee Draft Published - Please Comment

Friday, 02 April 2010

Earlier this week, the Final Committee Draft (FCD) of the C++0x standard was published. This means that C++0x is now in the final stages of bug fixing and wordsmithing before publication. If all goes to plan, the draft will move to Final Draft International Standard (FDIS) early in 2011, and will be a new standard by the end of 2011.

The publication of the FCD means that the draft standard has now been officially put up for review by the national standards bodies of ISO's member countries. The British Standards Institution is one of several national bodies that is actively involved in the standardisation of the C++ language. The panel members of the C++ Committee of the BSI, IST 5/-/21, are currently compiling a list of comments on the FCD. We intend to submit these as the BSI's National Body comments, aimed at getting issues with the FCD addressed before it becomes the new international standard for C++.

We're welcoming additional comments, and would like to provide a channel for anyone who may be interested in the C++0x Standard, but not able to be fully involved in the standards process, to submit comments. Note that not all comments — regardless of whether they are submitted by panel members or non-members — will go forward.

Here is some guidance on what we are looking for:

  • Suggestions for how to improve the clarity of the wording, even if that's just by adding a cross-reference to a relevant paragraph elsewhere;
  • Comments that identify any under/over specification; and
  • Comments highlighting inconsistencies or contradictions in the draft text.

Comments should be specific and preferably should include suggested updated wording (and if you need help formulating updated wording we can provide it, within reason) — the C++ standards committee is working to a very tight schedule in order to get C++0x out as soon as possible, and comments without wording (which therefore require more work from the committee) are more likely to be rejected.

The time for adding/removing features has now passed, so comments should focus on improving the draft as it stands rather than suggesting new features.

Owing to the time scale for submission to BSI and ISO, comments need to be submitted by Friday 18th June 2010.

If you have any comments, feel free to post them in the comment section of this blog entry, or email them to me. I will forward all appropriate suggestions to the rest of the BSI panel (whether or not I agree with them).

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

just::thread C++0x Thread Library V1.3.2 Released

Thursday, 25 March 2010

I am pleased to announce that version 1.3.2 of just::thread, our C++0x Thread Library has just been released.

This release is the first to feature support for the Microsoft Visual Studio 2010 RC for both 32-bit and 64-bit Windows.

There are also a few minor fixes to the future classes, and a new implementation of mutexes and condition variables on linux with lower overhead.

Purchase your copy and get started with the C++0x thread library NOW.

As usual, existing customers are entitled to a free upgrade to V1.3.2 from all earlier versions.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Older entries