Blog Archive for / 2010 / 07 /

Implementing Dekker's algorithm with Fences

Tuesday, 27 July 2010

Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.

The code

For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}

The analysis

If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.

Suppose thread 0 and thread 1 enter p0 and p1 respectively at the same time. They both set their respective flags to true, execute the fence and then read the other flag at the start of the while loop.

If both threads read false then both will enter the critical section, and the algorithm doesn't work. It is the job of the fences to ensure that this doesn't happen.

The fences are marked with memory_order_seq_cst, so either the fence in p0 is before the fence in p1 in the global ordering of memory_order_seq_cst operations, or vice-versa. Without loss of generality, we can assume that the fence in p0 comes before the fence in p1, since the code is symmetric. The store to flag0 is sequenced before the fence in p0, and the fence in p1 is sequenced before the read from flag0. Therefore the read from flag0 must see the value stored (true), so p1 will enter the while loop.

On the other side, there is no such guarantee for the read from flag1 in p0, so p0 may or may not enter the while loop. If p0 reads the value of false for flag1, it will not enter the while loop, and will instead enter the critical section, but that is OK since p1 has entered the while loop.

Though flag0 is not set to false until p0 has finished the critical section, we need to ensure that p1 does not see this until the values modified in the critical section are also visible to p1. This is the purpose of the release fence prior to the store to flag0 and the acquire fence after the while loop. When p1 reads the value false from flag0 in order to exit the while loop, it must be reading the value stored by p0 after the release fence at the end of the critical section. The acquire fence after the load guarantees that all the values written before the release fence prior to the store are visible, which is exactly what we need here.

If p0 reads true for flag1, it will enter the while loop rather than the critical section. Both threads are now looping, so we need a way to ensure that exactly one of them breaks out. This is the purpose of the turn variable. Initially, turn is 0, so p1 will enter the if and set flag1 to false, whilst p0 will not enter the if. Because p1 has set flag1 to false, eventually p0 will read flag1 as false and exit the outer while loop to enter the critical section. On the other hand, p1 is now stuck in the inner while loop because turn is 0. When p0 exits the critical section it sets turn to 1. This will eventually be seen by p1, allowing it to exit the inner while loop. When the store to flag0 becomes visible, p1 can then exit the outer while loop too.

If turn had been 1 initially (because p0 was the last thread to enter the critical section) then the inverse logic would apply, and p0 would enter the inner loop, allowing p1 to enter the critical section first.

Second time around

If p0 is called a second time whilst p1 is still in the inner loop then we have a similar situation to the start of the function — p1 may exit the inner loop and store true in flag1 whilst p0 stores true in flag0. We therefore need the second memory_order_seq_cst fence after the store to the flag in the inner loop. This guarantees that at least one of the threads will see the flag from the other thread set when it executes the check in the outer loop. Without this fence then both threads can read false, and both can enter the critical section.

Alternatives

You could put the ordering constraints on the loads and stores themselves rather than using fences. Indeed, the default memory ordering for atomic operations in C++ is memory_order_seq_cst, so the algorithm would "just work" with plain loads and stores to atomic variables. However, by using memory_order_relaxed on the loads and stores we can add fences to ensure we have exactly the ordering constraints required.

Posted by Anthony Williams
[/ threading /] permanent link

Reference Wrappers Explained

Wednesday, 14 July 2010

The upcoming C++0x standard includes reference wrappers in the form of the std::reference_wrapper<T> class template, and the helper function templates std::ref() and std::cref(). As I mentioned in my blog post on Starting Threads with Member Functions and Reference Arguments, these wrappers can be used to pass references to objects across interfaces that normally require copyable (or at least movable) objects — in that blog post, std::ref was used for passing references to objects over to the new thread, rather than copying the objects. I was recently asked what the difference was between std::ref and std::cref, and how they worked, so I thought I'd elaborate.

Deducing the Referenced Type

std::ref is a function template, so automatically deduces the type of the wrapped reference from the type of the supplied argument. This type deduction includes the const-ness of the supplied object:

int x=3;
const int y=4;
std::reference_wrapper<int> rx=std::ref(x);
// std::reference_wrapper<int> ry=std::ref(y); // error
std::reference_wrapper<const int> rcy=std::ref(y);

On the other hand, though std::cref also deduces the type of the wrapped reference from the supplied argument, it always wraps a const reference:

int x=3;
const int y=4;
// std::reference_wrapper<int> rx=std::cref(x); // error
std::reference_wrapper<const int> rcx=std::cref(x);
// std::reference_wrapper<int> ry=std::cref(y); // error
std::reference_wrapper<const int> rcy=std::cref(y);

Since a non-const reference can always be used where a const reference is required, you can use std::ref in pretty much every case where you would use std::cref, and your code would work the same. Which raises the question: why would you ever choose to use std::cref?

Using std::cref to prevent modification

The primary reason for choosing std::cref is because you want to guarantee that the source object is not modified through that reference. This can be important when writing multithreaded code — if a thread should not be modifying some data then it can be worth enforcing this by passing a const reference rather than a mutable reference.

void foo(int&); // mutable reference

int x=42; // Should not be modified by thread
std::thread t(foo,std::cref(x)); // will fail to compile

This can be important where there are overloads of a function such that one takes a const reference, and the other a non-const reference: if we don't want the object modified then it is important that the overload taking a const reference is chosen.

struct Foo
{
    void operator()(int&) const;
    void operator()(int const&) const;
};

int x=42;
std::thread t(Foo(),std::cref(x)); // force const int& overload
t.join();

References to temporaries

std::cref has another property missing from std::ref — it can bind to temporaries, since temporary objects can bind to const references. I'm not sure this is a good thing, especially when dealing with multiple threads, as the referenced temporary is likely to have been destroyed before the thread has even started. This is therefore something to watch out for:

void bar(int const&);

std::thread t(bar,std::cref(42)); // oops, ref to temporary

Documentation

Finally, std::cref serves a documentation purpose, even where std::ref would suffice — it declares in big bold letters that this reference cannot be used to modify the referenced object, which thus makes it easier to reason about the code.

Recommendation

I would recommend that you use std::cref in preference to std::ref whenever you can — the benefits as documentation of intent, and avoiding accidental modification through the reference make it a clear winner in my opinion. Of course, if you do want to modify the referenced object, then you need to use std::ref, but such usage now stands out, and makes it clear that this is the intent.

You do still need to be careful to ensure that you don't try and wrap references to temporaries, particularly when applying std::cref to the result of a function call, but such uses should stand out — I expect most uses to be wrapping a reference to a named variable rather than wrapping a function result.

Posted by Anthony Williams
[/ cplusplus /] permanent link