Blog Archive
Implementing Dekker's algorithm with Fences
Tuesday, 27 July 2010
Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.
The code
For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.
std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);
void p0()
{
flag0.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
while (flag1.load(std::memory_order_relaxed))
{
if (turn.load(std::memory_order_relaxed) != 0)
{
flag0.store(false,std::memory_order_relaxed);
while (turn.load(std::memory_order_relaxed) != 0)
{
}
flag0.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
}
std::atomic_thread_fence(std::memory_order_acquire);
// critical section
turn.store(1,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
flag0.store(false,std::memory_order_relaxed);
}
void p1()
{
flag1.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
while (flag0.load(std::memory_order_relaxed))
{
if (turn.load(std::memory_order_relaxed) != 1)
{
flag1.store(false,std::memory_order_relaxed);
while (turn.load(std::memory_order_relaxed) != 1)
{
}
flag1.store(true,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
}
std::atomic_thread_fence(std::memory_order_acquire);
// critical section
turn.store(0,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
flag1.store(false,std::memory_order_relaxed);
}
The analysis
If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.
Suppose thread 0 and thread 1 enter p0
and p1 respectively at the same time. They both set their
respective flags to true, execute the fence and then read
the other flag at the start of the while loop.
If both threads read false then both will enter the
critical section, and the algorithm doesn't work. It is the job of the
fences to ensure that this doesn't happen.
The fences are marked with memory_order_seq_cst, so either the
fence in p0 is before the fence in p1 in the global ordering of
memory_order_seq_cst operations, or vice-versa. Without
loss of generality, we can assume that the fence in p0
comes before the fence in p1, since the code is
symmetric. The store to flag0 is sequenced before the
fence in p0, and the fence in p1 is
sequenced before the read from flag0. Therefore the
read from flag0 must see the value stored
(true), so p1 will enter
the while loop.
On the other side, there is no such guarantee for the read
from flag1 in p0, so p0 may or
may not enter the while loop. If p0 reads
the value of false for flag1, it will not
enter the while loop, and will instead enter the critical
section, but that is OK since p1 has entered
the while loop.
Though flag0 is not set to false
until p0 has finished the critical section, we need to
ensure that p1 does not see this until the values
modified in the critical section are also visible to p1,
and this is the purpose of the release fence prior to the store
to flag0 and the acquire fence after
the while loop. When p1 reads the
value false from
flag0 in order to exit the while loop, it
must be reading the value store by p0 after the release
fence at the end of the critical section. The acquire fence after the
load guarantees that all the values written before the release fence
prior to the store are visible, which is exactly what we need
here.
If p0 reads true for flag1,
it will enter the while loop rather than the critical
section. Both threads are now looping, so we need a way to ensure
that exactly one of them breaks out. This is the purpose of
the turn variable. Initially, turn is 0,
so p1 will enter the if and
set flag1 to false, whilst p1
will not enter the if. Because p1 set
flag1 to false, eventually p0
will read flag1 as false and exit the
outer while loop to enter the critical section. On the
other hand, p1 is now stuck in the
inner while loop because turn is
0. When p0 exits the critical section it
sets turn to 1. This will eventually be seen
by p1, allowing it to exit the inner while
loop. When the store to
flag0 becomes visible p1 can then exit the
outer while loop too.
If turn had been 1 initially (because p0
was the last thread to enter the critical section) then the inverse
logic would apply, and p0 would enter the inner loop,
allowing p1 to enter the critical section first.
Second time around
If p0 is called a second time whilst p1
is still in the inner loop then we have a similar situation to the
start of the function — p1 may exit the inner
loop and store true in flag1
whilst p0 stores true
in flag0. We therefore need the
second memory_order_seq_cst fence after the store to
the flag in the inner loop. This guarantees that
at least one of the threads will see the flag from the other thread
set when it executes the check in the outer loop. Without this fence
then both threads can read false, and both can enter
the critical section.
Alternatives
You could put the ordering constraints on the loads and stores
themselves rather than using fences. Indeed, the default memory
ordering for atomic operations in C++
is memory_order_seq_cst, so the algorithm would "just
work" with plain loads and stores to atomic variables. However, by
using memory_order_relaxed on the loads and stores we
can add fences to ensure we have exactly the ordering constraints
required.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, synchronization, Dekker, fences, cplusplus
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
Reference Wrappers Explained
Wednesday, 14 July 2010
The
upcoming C++0x
standard includes reference wrappers in the form of
the std::reference_wrapper<T> class template, and
the helper function templates std::ref()
and std::cref(). As I mentioned in my blog post
on Starting
Threads with Member Functions and Reference Arguments, these
wrappers can be used to pass references to objects across interfaces
that normally require copyable (or at least movable) objects
— in that blog post, std::ref was used for passing
references to objects over to the new thread, rather than copying the
objects. I was recently asked what the difference was
between std::ref and std::cref, and how they
worked, so I thought I'd elaborate.
Deducing the Referenced Type
std::ref is a function template, so automatically
deduces the type of the wrapped reference from the type of the
supplied argument. This type deduction includes
the const-ness of the supplied object:
int x=3; const int y=4; std::reference_wrapper<int> rx=std::ref(x); // std::reference_wrapper<int> ry=std::ref(y); // error std::reference_wrapper<const int> rcy=std::ref(y);
On the other hand, though std::cref also deduces the
type of the wrapped reference from the supplied argument,
it always wraps a const reference:
int x=3; const int y=4; // std::reference_wrapper<int> rx=std::cref(x); // error std::reference_wrapper<const int> rcx=std::cref(x); // std::reference_wrapper<int> ry=std::cref(y); // error std::reference_wrapper<const int> rcy=std::cref(y);
Since a no-const-reference can always be bound to
a const reference, you can thus
use std::ref in pretty much every case where you would
use std::cref, and your code would work the same. Which
begs the question: why would you ever choose to
use std::cref?
Using std::cref to prevent modification
The primary reason for choosing std::cref is because
you want to guarantee that the source object is not modified through
that reference. This can be important when writing multithreaded
code — if a thread should not be modifying some data then it
can be worth enforcing this by passing a const
reference rather than a mutable reference.
void foo(int&); // mutable reference int x=42; // Should not be modified by thread std::thread t(foo,std::cref(x)); // will fail to compile
This can be important where there are overloads of a function such
that one takes a const reference, and the other a
non-const reference: if we don't want the object
modified then it is important that the overload taking
a const reference is chosen.
struct Foo
{
void operator()(int&) const;
void operator()(int const&) const;
};
int x=42;
std::thread(Foo(),std::cref(x)); // force const int& overload
References to temporaries
std::cref has another property missing
from std::ref — it can bind to temporaries, since
temporary objects can bind to const references. I'm
not sure this is a good thing, especially when dealing with multiple
threads, as the referenced temporary is likely to have been
destroyed before the thread has even started. This is therefore
something to watch out for:
void bar(int const&); std::thread t(bar,std::cref(42)); // oops, ref to temporary
Documentation
Finally, std::cref serves a documentation purpose,
even where std::ref would suffice — it declares
in big bold letters that this reference cannot be used to modify the
referenced object, which thus makes it easier to reason about the
code.
Recommendation
I would recommend that you use std::cref in preference
to std::ref whenever you can — the benefits as
documentation of intent, and avoiding accidental modification
through the reference make it a clear winner in my opinion. Of
course, if you do want to modify the referenced
object, then you need to use std::ref, but such usage
now stands out, and makes it clear that this is the intent.
You do still need to be careful to ensure that you don't try and
wrap references to temporaries, particularly when
applying std::cref to the result of a function call,
but such uses should stand out — I expect most uses to be
wrapping a reference to a named variable rather than wrapping a
function result.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: reference wrappers, ref, cref, cplusplus
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
Last day for comments on the C++0x FCD
Thursday, 17 June 2010
The BSI deadline for comments on the C++0x FCD is tomorrow, Friday 18th June 2010. The ISO deadline is 26th July 2010, but we have to write up comments for submission in the form required for ISO, which takes time.
If you have a comment on the FCD, please see my earlier blog post for how to submit it to BSI. Help us make the C++0x standard as good as it can be.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++, C++0x, WG21, FCD
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
Enforcing Correct Mutex Usage with Synchronized Values
Friday, 28 May 2010
My latest article, Enforcing Correct Mutex Usage with Synchronized Values has been published on the Dr Dobb's website.
This article expands on the SynchronizedValue<T>
template I mentioned in my presentation
on Concurrency
in the Real World at ACCU 2010, and deals with the problem of
ensuring that the mutex associated with some data is locked whenever
the data is accessed.
The basic idea is that you
use SynchronizedValue<T> wherever you have
an object of type T that you wish to be protected with
its own mutex. The SynchronizedValue<T> then
behaves like a pointer-to-T for simple uses.
Read the article for the full details.
Posted by Anthony Williams
[/ news /] permanent link
Tags: mutex, cplusplus
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
just::thread C++0x Thread Library V1.4 (FCD Edition) Released
Thursday, 06 May 2010
I am pleased to announce that version 1.4 (the FCD edition)
of just::thread,
our C++0x Thread Library
has just been released.
With the release of the "FCD edition", just::thread
provides the first complete implementation of the multithreading
facilities from
the Final
Committee Draft (FCD) of the C++0x standard.
Changes include:
- New
promise::set_value_at_thread_exit,promise::set_exception_at_thread_exit, andpackaged_task::make_ready_at_thread_exitmember functions to defer unblocking waiting threads until the notifying thread exits -
New
notify_all_at_thread_exit functionfor notifying condition variables when the notifying thread exits -
The
wait_forandwait_untilmember functions offuture,shared_futureandatomic_futurereturn afuture_statusenum rather thanboolto indicate whether the future is ready, the wait timed out, or the future contains a deferred async function - The destructor of the last future associated with an async function waits for that function to complete.
- New
ATOMIC_VAR_INITmacro for initializing atomic objects - The callable object for a
packaged_taskis destroyed with thepackaged_taskrather than being kept alive until the future is destroyed
As usual, existing customers are entitled to a free upgrade to V1.4.0 from all earlier versions.
Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
"Concurrency in the Real World" slides now available
Monday, 19 April 2010
The slides for my presentation on "Concurrency in the Real World" at the ACCU 2010 conference last week are now available.
The room was full, and quite warm due to the air conditioning having been turned off, but everything went to plan, and there were some insightful questions from the audience. I've thoroughly enjoyed presenting at ACCU in previous years, and this was no exception.
I covered the main pitfalls people encounter when writing multithreaded code, along with some techniques that I've found help deal with those problems, including some example code from projects I've worked on. As you might expect, all my examples were in C++, though the basic ideas are cross-language. I finished up by talking about what we might hope to get out of multithreaded code, such as performance, additional features and responsiveness.
There's a discount on
my just::thread
library until Friday 23rd April 2010, so if you're doing concurrency
in C++ with Microsoft Visual Studio on Windows or g++ on linux get
yourself a copy whilst it's on offer and start taking advantage of
the new
C++0x thread
library.
Posted by Anthony Williams
[/ news /] permanent link
Tags: concurrency, multithreading, c++, ACCU
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
March 2010 C++ Standards Committee Mailing
Thursday, 08 April 2010
The March 2010 mailing for the C++ Standards Committee was published last week. This is the post-meeting mailing for the March 2010 committee meeting, and contains the C++0x Final Committee Draft, which I blogged about last week.
There are 6 concurrency-related papers (of which my name is on two), which I summarize below:
Concurrency-related papers
- N3057: Explicit Initializers for Atomics
This paper proposes new initializers for atomic variables, providing a means of writing code which can be compiled as either C or C++. e.g.
void foo() { atomic_int a=ATOMIC_VAR_INIT(42); // initialize a with 42 atomic_uint b; // uninitialized atomic_init(&b,123); // b now initialized to 123 }- N3058: Futures and Async Cleanup (Rev.)
This is a revision of N3041 to resolve many of the outstanding issues with futures and async. Mostly it's just wordsmithing to tidy up the specification, but there's a few key changes:
- Defined behaviour for
the
wait_for()andwait_until()member functions ofstd::future,std::shared_futureandstd::atomic_futurewhen used withstd::asyncand a launch policy ofstd::launch::sync. The return value is now a value of the newstd::future_statusenumeration, and can bestd::future_status::readyif the future becomes ready before the timeout,std::future_status::timeoutif the wait times out, orstd::future_status::deferredif the future comes from a call tostd::asyncwith a launch policy ofstd::launch::syncand the function associated with the future hasn't yet started execution on any thread. - The wording
for
std::asyncadopts the same wording asstd::threadto clarify the copy/move and perfect forwarding semantics of the call.
- Defined behaviour for
the
- N3069: Various threads issues in the library (LWG 1151)
This is a revision of N3040, and highlights which operations through iterators constitute accesses and data races, and explicitly allows for synchronization by writing and reading to/from a stream.
- N3070:
Handling Detached Threads and
thread_localVariables This is a hugely simplified replacement for my previous paper N3038. Rather than creating contexts for
thread_localvariables, this paper proposes new member functions forstd::promiseandstd::packaged_taskto allow the value to be set at the point of call, but threads waiting on associated futures to be woken only afterthread_localvariables have been destroyed at thread exit. This means that you can now safely wait on a future which is set in such a fashion when waiting for a task running on a background thread to complete, without having to join with the thread or worry about races arising from the destructors ofthread_localvariables. The paper also adds a similar mechanism for condition variables as a non-member function.- N3071:
Renaming
launch::anyand what asyncs really might be (Rev.) This is a revision of N3042 proposing renaming
std::launch::anytostd::launch::sync_or_async. This paper was not approved.- N3074: Updates to C++ Memory Model Based on Formalization
This is a revision of N3045. This paper proposes some changes to the wording of the memory model in order to ensure that it means what we intended it to mean.
Other Papers
There's several non-concurrency papers in the mailing as well as the standard set (working draft, agenda, issues lists, etc.). The most significant of these in my view are the following 3 papers. Check the mailing for the full set.
- N3050: Allowing Move Constructors to Throw (Rev. 1)
This paper adds the new
noexceptkeyword to C++. This is used in place of an exception specification. On its own it means that the function does not throw any exceptions, but it can also be used with a boolean constant expression wheretruemeans that the function doesn't throw, andfalsemeans that it might. e.g.void foo() noexcept; // will not throw void bar() noexcept(true); // will not throw void baz() noexcept(false); // may throw
If a
noexceptexception specification is violated thenstd::terminate()is called.The primary benefit from the boolean-constant-expression version is in templates, where the boolean expression can check various properties of the template parameter types. One of the things you can check is whether or not particular operations throw, e.g. by using the new
has_nothrow_move_constructortype trait to declare the move constructor for a class to benoexceptif its class members have non-throwing move constructors:template<typename T> class X { T data; public: X(X&& other) noexcept(std::has_nothrow_move_constructor<T>::value): data(std::move(other.data)) {} };- N3053: Defining Move Special Member Functions
This proposal ensures that user-defined classes have move constructors and move assignment operators generated for them by the compiler if that is safe. Explicitly declaring a copy or move constructor will prevent the implicit declaration of the other, and likewise for copy and move assignment. You can always request the default definition using the
= defaultsyntax.This means that lots of user code will now be able to readily take advantage of move semantics with a simple code change or even just a recompile. This can potentially be of major performance benefit.
- N3055: A Taxonomy of Expression Value Categories
This paper nails down the true distinctions between lvalues, rvalues and rvalue references. It provides a new set of names to identify the distinct categories of values in C++ — lvalues and rvalues we already have, but now there's xvalues, prvalues and glvalues too. This categorization allows for better specification of when things can bind to lvalue references or rvalue references, when the compiler can eliminate copies or moves.
Please comment on the FCD
The purpose of the C++0x Final Committee Draft is to get comments prior to publication to ensure the final C++0x standard is as defect free as possible. This opportunity is only available for a limited time, so please comment on the FCD.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
Sign up for a 50% discount just::thread FCD edition
Wednesday, 07 April 2010
I'm in the process of updating our C++0x thread library for VS2008, VC10, g++ 4.3 and g++ 4.4 to incorporate the changes to the C++0x thread library voted into the C++0x FCD. I'll be writing a blog post with more details in due course, but the big changes are:
- Functions for postponing notification of threads waiting on a
std::futureuntil the thread that set the value on thestd::promiseor ran thestd::packaged_taskhas exited. - A similar facility for notifying
a
std::condition_variableat thread exit. - Defined behaviour for
the
wait_for()andwait_until()member functions ofstd::futurewhen used withstd::asyncand a launch policy ofstd::launch::sync. - Changes to the initialization of atomic variables.
Existing customers will get the new version as a free upgrade, but
the rest of you can get a 50% discount if you subscribe to
my blog by email. Just fill in your name and email address
in the form below and be sure to click the confirmation link. You'll
then receive future blog posts by email, along with an announcement
and exclusive discount for the FCD edition
of just::thread
when it's released.
If you're reading this via RSS and your reader doesn't show you the form or doesn't allow you to submit your details, then please go to the web version of this blog entry.
If you've already subscribed by email then you don't need to subscribe again, you'll automatically receive the discount code.
Posted by Anthony Williams
[/ news /] permanent link
Tags: concurrency, threading, C++0x, just::thread
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
C++0x Final Committee Draft Published - Please Comment
Friday, 02 April 2010
Earlier this week, the Final Committee Draft (FCD) of the C++0x standard was published. This means that C++0x is now in the final stages of bug fixing and wordsmithing before publication. If all goes to plan, the draft will move to Final Draft International Standard (FDIS) early in 2011, and will be a new standard by the end of 2011.
The publication of the FCD means that the draft standard has now been officially put up for review by the national standards bodies of ISO's member countries. The British Standards Institution is one of several national bodies that is actively involved in the standardisation of the C++ language. The panel members of the C++ Committee of the BSI, IST 5/-/21, are currently compiling a list of comments on the FCD. We intend to submit these as the BSI's National Body comments, aimed at getting issues with the FCD addressed before it becomes the new international standard for C++.
We're welcoming additional comments, and would like to provide a channel for anyone who may be interested in the C++0x Standard, but not able to be fully involved in the standards process, to submit comments. Note that not all comments — regardless of whether they are submitted by panel members or non-members — will go forward.
Here is some guidance on what we are looking for:
- Suggestions for how to improve the clarity of the wording, even if that's just by adding a cross-reference to a relevant paragraph elsewhere;
- Comments that identify any under/over specification; and
- Comments highlighting inconsistencies or contradictions in the draft text.
Comments should be specific and preferably should include suggested updated wording (and if you need help formulating updated wording we can provide it, within reason) — the C++ standards committee is working to a very tight schedule in order to get C++0x out as soon as possible, and comments without wording (which therefore require more work from the committee) are more likely to be rejected.
The time for adding/removing features has now passed, so comments should focus on improving the draft as it stands rather than suggesting new features.
Owing to the time scale for submission to BSI and ISO, comments need to be submitted by Friday 18th June 2010.
If you have any comments, feel free to post them in the comment section of this blog entry, or email them to me. I will forward all appropriate suggestions to the rest of the BSI panel (whether or not I agree with them).
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++, C++0x, WG21, FCD
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.
just::thread C++0x Thread Library V1.3.2 Released
Thursday, 25 March 2010
I am pleased to announce that version 1.3.2 of just::thread, our C++0x Thread Library has just been released.
This release is the first to feature support for the Microsoft Visual Studio 2010 RC for both 32-bit and 64-bit Windows.
There are also a few minor fixes to the future classes, and a new implementation of mutexes and condition variables on linux with lower overhead.
Purchase your copy and get started with the C++0x thread library NOW.As usual, existing customers are entitled to a free upgrade to V1.3.2 from all earlier versions.
Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Stumble It!
| Submit to Reddit
| Submit to DZone ![]()
If you liked this post, why not subscribe to the RSS feed
or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.