Multithreading and Concurrency

My book, C++ Concurrency in Action contains a detailed description of the C++11 threading facilities, and techniques for designing concurrent code.

The just::thread implementation of the new C++0x thread library is available for Microsoft Visual Studio 2005, Microsoft Visual Studio 2008, Microsoft Visual Studio 2010, g++ 4.5.2 and g++ 4.6.1 on Windows, g++ 4.3, 4.4, 4.5 and 4.6 on Linux, and g++ 4.3, 4.4 and 4.5 on MacOSX. Order your copy today.

Locks, Mutexes, and Semaphores: Types of Synchronization Objects

Tuesday, 21 October 2014

I recently got an email asking about locks and different types of synchronization objects, so I'm posting this entry in case it is of use to others.

Locks

A lock is an abstract concept. The basic premise is that a lock protects access to some kind of shared resource. If you own a lock then you can access the protected shared resource. If you do not own the lock then you cannot access the shared resource.

To own a lock, you first need some kind of lockable object. You then acquire the lock from that object. The precise terminology may vary. For example, if you have a lockable object XYZ you may:

  • acquire the lock on XYZ,
  • take the lock on XYZ,
  • lock XYZ,
  • take ownership of XYZ,
  • or some similar term specific to the type of XYZ

The concept of a lock also implies some kind of exclusion: sometimes you might be unable to take ownership of a lock, and the operation to do so will either fail, or block. In the former case, the operation will return some error code or exception to indicate that the attempt to take ownership failed. In the latter case, the operation will not return until it has taken ownership, which typically requires that another thread in the system does something to permit that to happen.

The most common form of exclusion is a simple numerical count: the lockable object has a maximum number of owners. If that number has been reached, then any further attempt to acquire a lock on it will be unable to succeed. This therefore requires that we have some mechanism of relinquishing ownership when we are done. This is commonly called unlocking, but again the terminology may vary. For example, you may:

  • release the lock on XYZ,
  • drop the lock on XYZ,
  • unlock XYZ,
  • relinquish ownership of XYZ,
  • or some similar term specific to the type of XYZ

When you relinquish ownership in the appropriate fashion then a blocked operation that is trying to acquire the lock may not proceed, if the required conditions have been met.

For example if a lockable object only allows 3 owners then a 4th attempt to acquire the lock will block. When one of the first 3 owners releases the lock then that 4th attempt to acquire the lock will succeed.

Ownership

What it means to "own" a lock depends on the precise type of the lockable object. For some lockable objects there is a very tight definition of ownership: this specific thread owns the lock, through the use of that specific object, within this particular scope.

In other cases, the definition is more fluid, and the ownership of the lock is more conceptual. In these cases, ownership can be relinquished by a different thread or object than the thread or object that acquired the lock.

Mutexes

Mutex is short for MUTual EXclusion. Unless the word is qualified with additional terms such as shared mutex, recursive mutex or read/write mutex then it refers to a type of lockable object that can be owned by exactly one thread at a time. Only the thread that acquired the lock can release the lock on a mutex. When the mutex is locked, any attempt to acquire the lock will fail or block, even if that attempt is done by the same thread.

Recursive Mutexes

A recursive mutex is similar to a plain mutex, but one thread may own multiple locks on it at the same time. If a lock on a recursive mutex has been acquired by thread A, then thread A can acquire further locks on the recursive mutex without releasing the locks already held. However, thread B cannot acquire any locks on the recursive mutex until all the locks held by thread A have been released.

In most cases, a recursive mutex is undesirable, since the it makes it harder to reason correctly about the code. With a plain mutex, if you ensure that the invariants on the protected resource are valid before you release ownership then you know that when you acquire ownership those invariants will be valid.

With a recursive mutex this is not the case, since being able to acquire the lock does not mean that the lock was not already held, by the current thread, and therefore does not imply that the invariants are valid.

Reader/Writer Mutexes

Sometimes called shared mutexes, multiple-reader/single-writer mutexes or just read/write mutexes, these offer two distinct types of ownership:

  • shared ownership, also called read ownership, or a read lock, and
  • exclusive ownership, also called write ownership, or a write lock.

Exclusive ownership works just like ownership of a plain mutex: only one thread may hold an exclusive lock on the mutex, only that thread can release the lock. No other thread may hold any type of lock on the mutex whilst that thread holds its lock.

Shared ownership is more lax. Any number of threads may take shared ownership of a mutex at the same time. No thread may take an exclusive lock on the mutex while any thread holds a shared lock.

These mutexes are typically used for protecting shared data that is seldom updated, but cannot be safely updated if any thread is reading it. The reading threads thus take shared ownership while they are reading the data. When the data needs to be modified, the modifying thread first takes exclusive ownership of the mutex, thus ensuring that no other thread is reading it, then releases the exclusive lock after the modification has been done.

Spinlocks

A spinlock is a special type of mutex that does not use OS synchronization functions when a lock operation has to wait. Instead, it just keeps trying to update the mutex data structure to take the lock in a loop.

If the lock is not held very often, and/or is only held for very short periods, then this can be more efficient than calling heavyweight thread synchronization functions. However, if the processor has to loop too many times then it is just wasting time doing nothing, and the system would do better if the OS scheduled another thread with active work to do instead of the thread failing to acquire the spinlock.

Semaphores

A semaphore is a very relaxed type of lockable object. A given semaphore has a predefined maximum count, and a current count. You take ownership of a semaphore with a wait operation, also referred to as decrementing the semaphore, or even just abstractly called P. You release ownership with a signal operation, also referred to as incrementing the semaphore, a post operation, or abstractly called V. The single-letter operation names are from Dijkstra's original paper on semaphores.

Every time you wait on a semaphore, you decrease the current count. If the count was greater than zero then the decrement just happens, and the wait call returns. If the count was already zero then it cannot be decremented, so the wait call will block until another thread increases the count by signalling the semaphore.

Every time you signal a semaphore, you increase the current count. If the count was zero before you called signal, and there was a thread blocked in wait then that thread will be woken. If multiple threads were waiting, only one will be woken. If the count was already at its maximum value then the signal is typically ignored, although some semaphores may report an error.

Whereas mutex ownership is tied very tightly to a thread, and only the thread that acquired the lock on a mutex can release it, semaphore ownership is far more relaxed and ephemeral. Any thread can signal a semaphore, at any time, whether or not that thread has previously waited for the semaphore.

An analogy

A semaphore is like a public lending library with no late fees. They might have 5 copies of C++ Concurrency in Action available to borrow. The first five people that come to the library looking for a copy will get one, but the sixth person will either have to wait, or go away and come back later.

The library doesn't care who returns the books, since there are no late fees, but when they do get a copy returned, then it will be given to one of the people waiting for it. If no-one is waiting, the book will go on the shelf until someone does want a copy.

Binary semaphores and Mutexes

A binary semaphore is a semaphore with a maximum count of 1. You can use a binary semaphore as a mutex by requiring that a thread only signals the semaphore (to unlock the mutex) if it was the thread that last successfully waited on it (when it locked the mutex). However, this is only a convention; the semaphore itself doesn't care, and won't complain if the "wrong" thread signals the semaphore.

Critical Sections

In synchronization terms, a critical section is that block of code during which a lock is owned. It starts at the point that the lock is acquired, and ends at the point that the lock is released.

Windows CRITICAL_SECTIONs

Windows programmers may well be familiar with CRITICAL_SECTION objects. A CRITICAL_SECTION is a specific type of mutex, not a use of the general term critical section.

Mutexes in C++

The C++14 standard has five mutex types:

The variants with "timed" in the name are the same as those without, except that the lock operations can have time-outs specified, to limit the maximum wait time. If no time-out is specified (or possible) then the lock operations will block until the lock can be acquired — potentially forever if the thread that holds the lock never releases it.

std::mutex and std::timed_mutex are just plain single-owner mutexes.

std::recursive_mutex and std::recursive_timed_mutex are recursive mutexes, so multiple locks may be held by a single thread.

std::shared_timed_mutex is a read/write mutex.

C++ lock objects

To go with the various mutex types, the C++ Standard defines a triplet of class templates for objects that hold a lock. These are:

For basic operations, they all acquire the lock in the constructor, and release it in the destructor, though they can be used in more complex ways if desired.

std::lock_guard<> is the simplest type, and just holds a lock across a critical section in a single block:

std::mutex m;
void f(){
    std::lock_guard<std::mutex> guard(m);
    // do stuff
}

std::unique_lock<> is similar, except it can be returned from a function without releasing the lock, and can have the lock released before the destructor:

std::mutex m;
std::unique_lock<std::mutex> f(){
    std::unique_lock<std::mutex> guard(m);
    // do stuff
    return std::move(guard);
}

void g(){
    std::unique_lock<std::mutex> guard(f());
    // do more stuff
    guard.unlock();
}

See my previous blog post for more about std::unique_lock<> and std::lock_guard<>.

std::shared_lock<> is almost identical to std::unique_lock<> except that it acquires a shared lock on the mutex. If you are using a std::shared_timed_mutex then you can use std::lock_guard<std::shared_timed_mutex> or std::unique_lock<std::shared_timed_mutex> for the exclusive lock, and std::shared_lock<std::shared_timed_mutex> for the shared lock.

std::shared_timed_mutex m;
void reader(){
    std::shared_lock<std::shared_timed_mutex> guard(m);
    // do read-only stuff
}
void writer(){
    std::lock_guard<std::shared_timed_mutex> guard(m);
    // update shared data
}

Semaphores in C++

The C++ standard does not define a semaphore type. You can write your own with an atomic counter, a mutex and a condition variable if you need, but most uses of semaphores are better replaced with mutexes and/or condition variables anyway.

Unfortunately, for those cases where semaphores really are what you want, using a mutex and a condition variable adds overhead, and there is nothing in the C++ standard to help. Olivier Giroux and Carter Edwards' proposal for a std::synchronic class template (N4195) might allow for an efficient implementation of a semaphore, but this is still just a proposal.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

C++ Concurrency in Action and Just::Thread Discounts

Wednesday, 25 April 2012

C++ Concurrency in Action cover

My book C++ Concurrency in Action was finally published on 29th February 2012, after 4 years of work. It's hard to believe that I can actually hold a copy in my hand; it's just been files on my computer for so long.

My book is a tutorial and reference to the thread library from the new C++11 standard. It also provides various guidelines for writing and testing multithreaded code, as well as sample implementations of thread-safe data structures and parallel algorithms

If you haven't already got a copy, you can order one direct from Manning, or from amazon.com, or amazon.co.uk. Alternatively, copies should be available at the ACCU 2012 conference in Oxford this week.

Discount on Just::Thread

If you have purchased the book then send a copy of your receipt or other proof of purchase to info@justsoftwaresolutions.co.uk for a 50% discount on a single user license of Just::Thread, our implementation of the C++11 thread library described in the book for Microsoft Visual Studio on Windows, and g++ on Windows, Linux and MacOSX.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Thread-Safe Copy and Move Constructors

Wednesday, 17 August 2011

This is a guest post by Michael Spertus. Michael is a Distinguished Engineer at Symantec. He is also a C++ Standards Committee member and teaches the graduate C++ sequence at the University of Chicago. He can be contacted at mike_spertus@symantec.com.


This guest column discusses writing thread-safe constructors. As we will see, this is more difficult than it seems. Fortunately, we will also see that C++11 offers a very pretty solution to this problem that nicely illustrates the synergy of the new features introduced in C++11.

The problem

If you have a class that supports locking of objects to serialize access to a given object, you probably want the class' copy constructor and move constructor (if it has one) to lock the source object to get a consistent snapshot of the source object so the destination object isn't messed up if the source changes in the middle of the copy or move.

This isn't nearly as easy as it sounds. In the following class, a mutex is used to try to enforce the invariant that i_squared should always be the square of i.

class A {
public:  A(_i = 0) { set(_i); }
  set(int _i) {
    std::lock_guard<std::mutex> lock(mtx);
    i = _i;
    i_squared = i*i;
  }
  ...
private:
  std::mutex mtx;
  int i;
  int i_squared;
};

Unfortunately, the default copy constructor doesn't acquire the mutex, so in code like the following, f can copy a "half set" version of a if another thread modifies a at the same time.

void f(A &a) 
{
  A a2 = a;
  ...
}

First attempt

A naive attempt is to acquire the lock in the constructor body just like in a thread-safe method.

class A {
public:
  A(const A &a) : i(a.i), i_squared(a.i_squared) {
    std::lock_guard<std::mutex> lock(a.mtx); // Too late!
  }
  ...
};

Unfortunately, this fares no better as i and i_squared are copied before we acquire the lock.

Second attempt

One approach would be to simply not lock in the copy constructor at all and just manually lock objects you want to copy:

void f(A &a)
{
  std::lock_guard<std::mutex>
  lock(a.mtx);
  A a2 = a;
  ...
}

This approach deserves careful consideration. For classes which are not usually shared between threads or which need locking granularity at a different level than their internal operations, managing locks within the class can be an antipattern. This concern was a primary reason why C++11 does not have an equivalent to the SynchronizedCollection wrapper found in Java and C#. For example, synchronized collections make it easy to inadvertently loop through a collection believing your code is thread-safe even though the collection could change between individual operations on the collection during the loop. Of course, if we decide not to have A's copy constructor lock, then A::set() should not lock either.

Still, it remains a very common and useful pattern for classes designed for shared use to have all their internal operations acquire the lock (i.e., monitors/synchronized classes). If A is a synchronized class that locks its methods internally, it would be very confusing and prone to intermittent errors to still have to manually acquire the lock whenever an object is copied or moved. Also, generic code, which doesn't know about A::mtx is unlikely to work properly.

Third attempt

One thing we can do is dispense with member initialization lists in constructors altogether

class A {
public:
  A(const A &a) {
    std::lock_guard<std::mutex> lock(a.mtx);
    i = a.i;
    i_squared = a.i_squared;
  }
  ...
};

This solution is awkward at best if any bases or members don't have default constructors, have reference type, or are const. It also seems unfair to have to pay an efficiency penalty (for constructing and assigning separately) just because there is no place to put the lock. In practice, I also suspect intermittent errors will creep into large code bases as programmers carelessly add a base or member initializer to the constructor. Finally, it just isn't very satisfying to have to just discard core parts of constructor syntax just because your class is synchronized.

Fourth attempt

Anthony Williams has suggested implementing serialized classes using a wrapper class like:

struct PrivateBaseForA
{
     int i;
     int i_squared;
};

class A: private PrivateBaseForA
{
   mutable std::mutex mtx;
public:
   A(int _i = 0) { set(_i); }
   void set(int _i) {
     std::lock_guard<std::mutex> lock(mtx);
     i = _i;
     i_squared = _i*_i;
   }
   A(const A& other):
     PrivateBaseForA((std::lock_guard<std::mutex>(other.mtx),other))
   {}
};

Anthony makes slick use of double-parens to use the comma operator. If you wanted to avoid this, you could have PrivateBaseForA's constructor take a lock.

Again, this is not yet a very satisfying solution because writing a wrapper class for every synchronized class just to get a properly locked copy constructor is clumsy and intrusive.

Finally, the C++11 approach

Fortunately, C++11 offers a nice solution that really illustrates how beautifully and powerfully the features of C++11 work together. C++11 supports forwarding one constructor to another, which gives us an opportunity to grab the mutex before any copying or moving takes place:

class A {
private:
  A(const A &a, const std::lock_guard<std::mutex> &)
   : i(a.i), i_squared(a.i_squared) {}
public:
  A(const A &a) : A(a, std::lock_guard<std::mutex>(a.mtx)) {}
  ...
};

This solution locks the entire constructor, protects against races resulting from forgetting the manual lock, works with generic code, and doesn't require the creation of artificial wrapper classes.

While I think this is clearly the preferred solution, it is not perfect. To begin with, it is more obscure and guru-like than I would wish for such a common situation. Secondly, constructor forwarding is not yet implemented by any major C++ compiler (although the recently approved standard should change that soon). Finally, we lose the benefit of compiler generated copy constructors. If we added an i_cubed field to A that also needed to be kept consistent with i, we might forget to update the private constructor. Perhaps this will be a further opportunity for the next C++ standard (C++1y?). In the meantime, C++11 provides a powerful new solution to the everyday problem of writing thread-safe copy and move constructors.

One final note is to mention that although this article focused on copy constructors, everything applies equally to move constructors. Indeed, any constructor that needs to acquire any lock whatsoever (e.g., a database lock) for its entire duration can apply these techniques.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Simplify Code by Encapsulating Locks

Wednesday, 15 June 2011

Over on the Future Chips blog, Aater Suleman argues that while(1) can make parallel code better. Whilst I agree that the code using while(1) is simpler than the original in terms of analysing the lock patterns, it achieves this by testing the logical condition inside the while and using break. This is additional, unnecessary, complexity.

What is wanted instead is a way of encapsulating the locking, so that the loop logic is simple, and yet the lock logic is also clear.

Here is the original code from the blog post:

lock_acquire(lock);
while(check_condition()){
  lock_release(lock);
  //do any actual work in the iteration - Thanks to Caleb for this comment
  lock_acquire(lock);
}

lock_release(lock);

The implication here is that check_condition() must be called with the lock held, but the lock need not be held for the actual iteration work. The code thus acquires and releases the mutex in two places, which is unnecessary duplication, and a potential source of errors — if the loop exits early then the lock may be released twice, for example.

Rather than moving the condition check into the loop to avoid this duplication, a better solution is to move the lock acquisition and release into the condition check:

bool atomic_check_condition()
{
  lock_acquire(lock);
  bool result=check_condition();
  lock_release(lock);
  return result;
}

while(atomic_check_condition()){
  //do any actual work in the iteration - Thanks to Caleb for this comment
}

This gives us the best of both worlds: the lock is now held only across the check_condition() call, but the logic of the while loop is still clear.

If you're programming in C++, then the C++0x library allows us to make atomic_check_condition() even simpler by using lock_guard as in the code below, but extracting the function is always an improvement.

bool atomic_check_condition()
{
  std::lock_guard<mutex_type> guard(lock);
  return check_condition();
}

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Definitions of Non-blocking, Lock-free and Wait-free

Tuesday, 07 September 2010

There have repeatedly been posts on comp.programming.threads asking for a definition of these terms. To write good multithreaded code you really need to understand what these mean, and how they affect the behaviour and performance of algorithms with these properties. I thought it would be therefore be useful to provide some definitions.

Definition of Blocking

A function is said to be blocking if it calls an operating system function that waits for an event to occur or a time period to elapse. Whilst a blocking call is waiting the operating system can often remove that thread from the scheduler, so it takes no CPU time until the event has occurred or the time has elapsed. Once the event has occurred then the thread is placed back in the scheduler and can run when allocated a time slice. A thread that is running a blocking call is said to be blocked.

Mutex lock functions such as std::mutex::lock(), and EnterCriticalSection() are blocking, as are wait functions such as std::future::wait() and std::condition_variable::wait(). However, blocking functions are not limited to synchronization facilities: the most common blocking functions are I/O facilities such as fread() or WriteFile(). Timing facilities such as Sleep(), or std::this_thread::sleep_until() are also often blocking if the delay period is long enough.

Definition of Non-blocking

Non-blocking functions are just those that aren't blocking. Non-blocking data structures are those on which all operations are non-blocking. All lock-free data structures are inherently non-blocking.

Spin-locks are an example of non-blocking synchronization: if one thread has a lock then waiting threads are not suspended, but must instead loop until the thread that holds the lock has released it. Spin locks and other algorithms with busy-wait loops are not lock-free, because if one thread (the one holding the lock) is suspended then no thread can make progress.

Defintion of lock-free

A lock-free data structure is one that doesn't use any mutex locks. The implication is that multiple threads can access the data structure concurrently without race conditions or data corruption, even though there are no locks — people would give you funny looks if you suggested that std::list was a lock-free data structure, even though it is unlikely that there are any locks used in the implementation.

Just because more than one thread can safely access a lock-free data structure concurrently doesn't mean that there are no restrictions on such accesses. For example, a lock-free queue might allow one thread to add values to the back whilst another removes them from the front, whereas multiple threads adding new values concurrently would potentially corrupt the data structure. The data structure description will identify which combinations of operations can safely be called concurrently.

For a data structure to qualify as lock-free, if any thread performing an operation on the data structure is suspended at any point during that operation then the other threads accessing the data structure must still be able to complete their tasks. This is the fundamental restriction which distinguishes it from non-blocking data structures that use spin-locks or other busy-wait mechanisms.

Just because a data structure is lock-free it doesn't mean that threads don't have to wait for each other. If an operation takes more than one step then a thread may be pre-empted by the OS part-way through an operation. When it resumes the state may have changed, and the thread may have to restart the operation.

In some cases, a the partially-completed operation would prevent other threads performing their desired operations on the data structure until the operation is complete. In order for the algorithm to be lock-free, these threads must then either abort or complete the partially-completed operation of the suspended thread. When the suspended thread is woken by the scheduler it can then either retry or accept the completion of its operation as appropriate. In lock-free algorithms, a thread may find that it has to retry its operation an unbounded number of times when there is high contention.

If you use a lock-free data structure where multiple threads modify the same pieces of data and thus cause each other to retry then high rates of access from multiple threads can seriously cripple the performance, as the threads hinder each other's progress. This is why wait-free data structures are so important: they don't suffer from the same set-backs.

Definition of wait-free

A wait-free data structure is a lock-free data structure with the additional property that every thread accessing the data structure can make complete its operation within a bounded number of steps, regardless of the behaviour of other threads. Algorithms that can involve an unbounded number of retries due to clashes with other threads are thus not wait-free.

This property means that high-priority threads accessing the data structure never have to wait for low-priority threads to complete their operations on the data structure, and every thread will always be able to make progress when it is scheduled to run by the OS. For real-time or semi-real-time systems this can be an essential property, as the indefinite wait-periods of blocking or non-wait-free lock-free data structures do not allow their use within time-limited operations.

The downside of wait-free data structures is that they are more complex than their non-wait-free counterparts. This imposes an overhead on each operation, potentially making the average time taken to perform an operation considerably longer than the same operation on an equivalent non-wait-free data structure.

Choices

When choosing a data structure for a given task you need to think about the costs and benefits of each of the options.

A lock-based data structure is probably the easiest to use, reason about and write, but has the potential for limited concurrency. They may also be the fastest in low-load scenarios.

A lock-free (but not wait-free) data structure has the potential to allow more concurrent accesses, but with the possibility of busy-waits under high loads. Lock-free data structures are considerably harder to write, and the additional concurrency can make reasoning about the program behaviour harder. They may be faster than lock-based data structures, but not necessarily.

Finally, a wait-free data structure has the maximum potential for true concurrent access, without the possibility of busy waits. However, these are very much harder to write than other lock-free data structures, and typically impose an additional performance cost on every access.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Implementing Dekker's algorithm with Fences

Tuesday, 27 July 2010

Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.

The code

For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}

The analysis

If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.

Suppose thread 0 and thread 1 enter p0 and p1 respectively at the same time. They both set their respective flags to true, execute the fence and then read the other flag at the start of the while loop.

If both threads read false then both will enter the critical section, and the algorithm doesn't work. It is the job of the fences to ensure that this doesn't happen.

The fences are marked with memory_order_seq_cst, so either the fence in p0 is before the fence in p1 in the global ordering of memory_order_seq_cst operations, or vice-versa. Without loss of generality, we can assume that the fence in p0 comes before the fence in p1, since the code is symmetric. The store to flag0 is sequenced before the fence in p0, and the fence in p1 is sequenced before the read from flag0. Therefore the read from flag0 must see the value stored (true), so p1 will enter the while loop.

On the other side, there is no such guarantee for the read from flag1 in p0, so p0 may or may not enter the while loop. If p0 reads the value of false for flag1, it will not enter the while loop, and will instead enter the critical section, but that is OK since p1 has entered the while loop.

Though flag0 is not set to false until p0 has finished the critical section, we need to ensure that p1 does not see this until the values modified in the critical section are also visible to p1, and this is the purpose of the release fence prior to the store to flag0 and the acquire fence after the while loop. When p1 reads the value false from flag0 in order to exit the while loop, it must be reading the value store by p0 after the release fence at the end of the critical section. The acquire fence after the load guarantees that all the values written before the release fence prior to the store are visible, which is exactly what we need here.

If p0 reads true for flag1, it will enter the while loop rather than the critical section. Both threads are now looping, so we need a way to ensure that exactly one of them breaks out. This is the purpose of the turn variable. Initially, turn is 0, so p1 will enter the if and set flag1 to false, whilst p1 will not enter the if. Because p1 set flag1 to false, eventually p0 will read flag1 as false and exit the outer while loop to enter the critical section. On the other hand, p1 is now stuck in the inner while loop because turn is 0. When p0 exits the critical section it sets turn to 1. This will eventually be seen by p1, allowing it to exit the inner while loop. When the store to flag0 becomes visible p1 can then exit the outer while loop too.

If turn had been 1 initially (because p0 was the last thread to enter the critical section) then the inverse logic would apply, and p0 would enter the inner loop, allowing p1 to enter the critical section first.

Second time around

If p0 is called a second time whilst p1 is still in the inner loop then we have a similar situation to the start of the function — p1 may exit the inner loop and store true in flag1 whilst p0 stores true in flag0. We therefore need the second memory_order_seq_cst fence after the store to the flag in the inner loop. This guarantees that at least one of the threads will see the flag from the other thread set when it executes the check in the outer loop. Without this fence then both threads can read false, and both can enter the critical section.

Alternatives

You could put the ordering constraints on the loads and stores themselves rather than using fences. Indeed, the default memory ordering for atomic operations in C++ is memory_order_seq_cst, so the algorithm would "just work" with plain loads and stores to atomic variables. However, by using memory_order_relaxed on the loads and stores we can add fences to ensure we have exactly the ordering constraints required.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls

Thursday, 11 February 2010

This is the eighth in a series of blog posts introducing the new C++0x thread library. See the end of this article for a full set of links to the rest of the series.

In this installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.

Futures and asynchronous function calls

The most basic use of a future is to hold the result of a call to the new std::async function for running some code asynchronously:

#include <future>
#include <iostream>

int calculate_the_answer_to_LtUaE();
void do_stuff();

int main()
{
    std::future<int> the_answer=std::async(calculate_the_answer_to_LtUaE);
    do_stuff();
    std::cout<<"The answer to life, the universe and everything is "
             <<the_answer.get()<<std::endl;
}

The call to std::async takes care of creating a thread, and invoking calculate_the_answer_to_LtUaE on that thread. The main thread can then get on with calling do_stuff() whilst the immensely time consuming process of calculating the ultimate answer is done in the background. Finally, the call to the get() member function of the std::future<int> object then waits for the function to complete and ensures that the necessary synchronization is applied to transfer the value over so the main thread can print "42".

Sometimes asynchronous functions aren't really asynchronous

Though I said that std::async takes care of creating a thread, that's not necessarily true. As well as the function being called, std::async takes a launch policy which specifies whether to start a new thread or create a "deferred function" which is only run when you wait for it. The default launch policy for std::async is std::launch::any, which means that the implementation gets to choose for you. If you really want to ensure that your function is run on its own thread then you need to specify the std::launch::async policy:

  std::future<int> the_answer=std::async(std::launch::async,calculate_the_answer_to_LtUaE);

Likewise, if you really want the function to be executed in the get() call then you can specify the std::launch::sync policy:

  std::future<int> the_answer=std::async(std::launch::sync,calculate_the_answer_to_LtUaE);

In most cases it makes sense to let the library choose. That way you'll avoid creating too many threads and overloading the machine, whilst taking advantage of the available hardware threads. If you need fine control, you're probably better off managing your own threads.

Divide and Conquer

std::async can be used to easily parallelize simple algorithms. For example, you can write a parallel version of for_each as follows:

template<typename Iterator,typename Func>
void parallel_for_each(Iterator first,Iterator last,Func f)
{
    ptrdiff_t const range_length=last-first;
    if(!range_length)
        return;
    if(range_length==1)
    {
        f(*first);
        return;
    }

    Iterator const mid=first+(range_length/2);

    std::future<void> bgtask=std::async(&parallel_for_each<Iterator,Func>,
                                        first,mid,f);
    try
    {
        parallel_for_each(mid,last,f);
    }
    catch(...)
    {
        bgtask.wait();
        throw;
    }
    bgtask.get();   
}

This simple bit of code recursively divides up the range into smaller and smaller pieces. Obviously an empty range doesn't require anything to happen, and a single-point range just requires calling f on the one and only value. For bigger ranges then an asynchronous task is spawned to handle the first half, and then the second half is handled by a recursive call.

The try - catch block just ensures that the asynchronous task is finished before we leave the function even if an exception in order to avoid the background tasks potentially accessing the range after it has been destroyed. Finally, the get() call waits for the background task, and propagates any exception thrown from the background task. That way if an exception is thrown during any of the processing then the calling code will see an exception. Of course if more than one exception is thrown then some will get swallowed, but C++ can only handle one exception at a time, so that's the best that can be done without using a custom composite_exception class to collect them all.

Many algorithms can be readily parallelized this way, though you may want to have more than one element as the minimum range in order to avoid the overhead of spawning the asynchronous tasks.

Promises

An alternative to using std::async to spawn the task and return the future is to manage the threads yourself and use the std::promise class template to provide the future. Promises provide a basic mechanism for transferring values between threads: each std::promise object is associated with a single std::future object. A thread with access to the std::future object can use wait for the result to be set, whilst another thread that has access to the corresponding std::promise object can call set_value() to store the value and make the future ready. This works well if the thread has more than one task to do, as information can be made ready to other threads as it becomes available rather than all of them having to wait until the thread doing the work has completed. It also allows for situations where multiple threads could produce the answer: from the point of view of the waiting thread it doesn't matter where the answer came from, just that it is there so it makes sense to have a single future to represent that availability.

For example, asynchronous I/O could be modelled on a promise/future basis: when you submit an I/O request then the async I/O handler creates a promise/future pair. The future is returned to the caller, which can then wait on the future when it needs the data, and the promise is stored alongside the details of the request. When the request has been fulfilled then the I/O thread can set the value on the promise to pass the value back to the waiting thread before moving on to process additional requests. The following code shows a sample implementation of this pattern.

class aio
{
    class io_request
    {
        std::streambuf* is;
        unsigned read_count;
        std::promise<std::vector<char> > p;
    public:
        explicit io_request(std::streambuf& is_,unsigned count_):
            is(&is_),read_count(count_)
        {}
    
        io_request(io_request&& other):
            is(other.is),
            read_count(other.read_count),
            p(std::move(other.p))
        {}

        io_request():
            is(0),read_count(0)
        {}

        std::future<std::vector<char> > get_future()
        {
            return p.get_future();
        }

        void process()
        {
            try
            {
                std::vector<char> buffer(read_count);

                unsigned amount_read=0;
                while((amount_read != read_count) && 
                      (is->sgetc()!=std::char_traits<char>::eof()))
                {
                    amount_read+=is->sgetn(&buffer[amount_read],read_count-amount_read);
                }

                buffer.resize(amount_read);
                
                p.set_value(std::move(buffer));
            }
            catch(...)
            {
                p.set_exception(std::current_exception());
            }
        }
    };

    thread_safe_queue<io_request> request_queue;
    std::atomic_bool done;

    void io_thread()
    {
        while(!done)
        {
            io_request req=request_queue.pop();
            req.process();
        }
    }

    std::thread iot;
    
public:
    aio():
        done(false),
        iot(&aio::io_thread,this)
    {}

    std::future<std::vector<char> > queue_read(std::streambuf& is,unsigned count)
    {
        io_request req(is,count);
        std::future<std::vector<char> > f(req.get_future());
        request_queue.push(std::move(req));
        return f;
    }
    
    ~aio()
    {
        done=true;
        request_queue.push(io_request());
        iot.join();
    }
};

void do_stuff()
{}

void process_data(std::vector<char> v)
{
    for(unsigned i=0;i<v.size();++i)
    {
        std::cout<<v[i];
    }
    std::cout<<std::endl;
} 

int main()
{
    aio async_io;

    std::filebuf f;
    f.open("my_file.dat",std::ios::in | std::ios::binary);

    std::future<std::vector<char> > fv=async_io.queue_read(f,1048576);
    
    do_stuff();
    process_data(fv.get());
    
    return 0;
}

Next Time

The sample code above also demonstrates passing exceptions between threads using the set_exception() member function of std::promise. I'll go into more detail about exceptions in multithreaded next time.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Multithreading in C++0x part 7: Locking multiple mutexes without deadlock

Friday, 21 August 2009

This is the seventh in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

After last time's detour into lazy initialization, our regular programming resumes with the promised article on std::lock().

Acquiring locks on multiple mutexes

In most cases, your code should only ever hold a lock on one mutex at a time. Occasionally, you might nest your locks, e.g. by calling into a subsystem that protects its internal data with a mutex whilst holding a lock on another mutex, but it's generally better to avoid holding locks on multiple mutexes at once if at all possible.

However, sometimes it is necessary to hold a lock on more than one mutex, because you need to perform an operation on two distinct items of data, each of which is protected by its own mutex. In order to perform the operation correctly, you need to hold locks on both the mutexes, otherwise another thread could change one of the values before your operation was complete. For example, consider a bank transfer between two accounts: we want the transfer to be a single action to the outside world, so we need to acquire the lock on the mutex protecting each account. You could naively implement this like so:

class account
{
    std::mutex m;
    currency_value balance;
public:

    friend void transfer(account& from,account& to,
                         currency_value amount)
    {
        std::lock_guard<std::mutex> lock_from(from.m);
        std::lock_guard<std::mutex> lock_to(to.m);
        from.balance -= amount;
        to.balance += amount;
    }
};

Though it looks right at first glance (both locks are acquired before data is accessed), there is a potential for deadlock in this code. Consider two threads calling transfer() at the same time on the same accounts, but one thread is transferring money from account A to account B, whereas the other is transferring money from account B to account A. If the calls to transfer() are running concurrently, then both threads will try and lock the mutex on their from account. Assuming there's nothing else happening in the system, this will succeed, as for thread 1 the from account is account A, whereas for thread 2 the from account is account B. Now each thread tries to acquire the lock on the mutex for the corresponding to. Thread 1 will try and lock the mutex for account B (thread 1 is transferring from A to B). Unfortunately, it will block because thread 2 holds that lock. Meanwhile thread B will try and lock the mutex for account A (thread 2 is transferring from B to A). Thread 1 holds this mutex, so thread 2 must also block — deadlock.

Avoiding deadlock with std::lock()

In order to avoid this problem, you need to somehow tell the system to lock the two mutexes together, so that one of the threads acquires both locks and deadlock is avoided. This is what the std::lock() function is for — you supply a number of mutexes (it's a variadic template) and the thread library ensures that when the function returns they are all locked. Thus we can avoid the race condition in transfer() by writing it as follows:

void transfer(account& from,account& to,
              currency_value amount)
{
    std::lock(from.m,to.m);
    std::lock_guard<std::mutex> lock_from(from.m,std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m,std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

Here we use std::lock() to lock both mutexes safely, then adopt the ownership into the std::lock_guard instances to ensure the locks are released safely at the end of the function.

Other mutex types

As mentioned already, std::lock() is a function template rather than a plain function. Not only does this mean it can merrily accept any number of mutex arguments, but it also means that it can accept any type of mutex arguments. The arguments don't even all have to be the same type. You can pass anything which implements lock(), try_lock() and unlock() member functions with appropriate semantics. As you may remember from part 5 of this series, std::unique_lock<> provides these member functions, so you can pass an instance of std::unique_lock<> to std::lock(). Indeed, you could also write transfer() using std::unique_lock<> like this:

void transfer(account& from,account& to,
              currency_value amount)
{
    std::unique_lock<std::mutex> lock_from(from.m,std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.m,std::defer_lock);
    std::lock(lock_from,lock_to);
    from.balance -= amount;
    to.balance += amount;
}

In this case, we construct the std::unique_lock<> instances without actually locking the mutexes, and then lock them both together with std::lock() afterwards. You still get all the benefits of the deadlock avoidance, and the same level of exception safety — which approach to use is up to you, and depends on what else is happening in the code.

Exception safety

Since std::lock() has to work with any mutex type you might throw at it, including user-defined ones, it has to be able to cope with exceptions. In particular, it has to provide sensible behaviour if a call to lock() or try_lock() on one of the supplied mutexes throws an exception. The way it does this is quite simple: if a call to lock() or try_lock() on one of the supplied mutexes throws an exception then unlock() is called on each of the mutexes for which this call to std::lock() currently owns a lock. So, if you are locking 4 mutexes and the call has successfully acquired 2 of them, but a call to try_lock() on the third throws an exception then the locks on the first two will be released by calls to unlock().

The upshot of this is that if std::lock() completes successfully then the calling thread owns the lock on all the supplied mutexes, but if the call exits with an exception then from the point of view of lock ownership it will be as-if the call was never made — any additional locks acquired have been released again.

No silver bullet

There are many ways to write code that deadlocks: std::lock() only addresses the particular case of acquiring multiple mutexes together. However, if you need to do this then std::lock() ensures that you don't need to worry about lock ordering issues.

If your code acquires locks on multiple mutexes, but std::lock() isn't applicable for your case, then you need to take care of lock ordering issues another way. One possibility is to enforce an ordering by using a hierarchical mutex. If you've got a deadlock in your code, then things like the deadlock detection mode of my just::thread library can help you pinpoint the cause.

Next time

In the next installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics

Thursday, 13 August 2009

This is the sixth in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

I had intended to write about the use of the new std::lock() for avoiding deadlock in this article. However, there was a post on comp.std.c++ this morning about lazy initialization, and I thought I'd write about that instead. std::lock() can wait until next time.

Lazy Initialization

The classic lazy-initialization case is where you have an object that is expensive to construct but isn't always needed. In this case you can choose to only initialize it on first use:

class lazy_init
{
    mutable std::unique_ptr<expensive_data> data;
public:
    expensive_data const& get_data() const
    {
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return *data;
    }
};

However, we can't use this idiom in multi-threaded code, since there would be a data race on the accesses to data. Enter std::call_once() — by using an instance of std::once_flag to protect the initialization we can make the data race go away:

class lazy_init
{
    mutable std::once_flag flag;
    mutable std::unique_ptr<expensive_data> data;

    void do_init() const
    {
        data.reset(new expensive_data);
    }
public:
    expensive_data const& get_data() const
    {
        std::call_once(flag,&lazy_init::do_init,this);
        return *data;
    }
};

Concurrent calls to get_data() are now safe: if the data has already been initialized they can just proceed concurrently. If not, then all threads calling concurrently except one will wait until the remaining thread has completed the initialization.

Reinitialization

This is all very well if you only want to initialize the data once. However, what if you need to update the data — perhaps it's a cache of some rarely-changing data that's expensive to come by. std::call_once() doesn't support multiple calls (hence the name). You could of course protect the data with a mutex, as shown below:

class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;

public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::lock_guard<std::mutex> lk(m);
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return data;
    }
    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        data.reset();
    }
};

Note that in this case we return a std::shared_ptr<const expensive_data> rather than a reference to avoid a race condition on the data itself — this ensures that the copy held by the calling code will be valid (if out of date) even if another thread calls invalidate_cache() before the data can be used.

This "works" in the sense that it avoids data races, but if the updates are rare and the reads are frequent then this may cause unnecessary serialization when multiple threads call get_data() concurrently. What other options do we have?

Double-checked locking returns

Much has been written about how double-checked locking is broken when using multiple threads. However, the chief cause of the problem is that the sample code uses plain non-atomic operations to check the flag outside the mutex, so is subject to a data race. You can overcome this by careful use of the C++0x atomics, as shown in the example below:

class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;

public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::shared_ptr<const expensive_data> result=
            std::atomic_load_explicit(&data,std::memory_order_acquire);
        if(!result)
        {
            std::lock_guard<std::mutex> lk(m);
            result=data;
            if(!result)
            {
                result.reset(new expensive_data);
                std::atomic_store_explicit(&data,result,std::memory_order_release);
            }
        }
        return result;
    }
    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        std::shared_ptr<const expensive_data> dummy;
        std::atomic_store_explicit(&data,dummy,std::memory_order_relaxed);
    }
};

Note that in this case, all writes to data use atomic operations, even those within the mutex lock. This is necessary in order to ensure that the atomic load operation at the start of get_data() actually has a coherent value to read — there's no point doing an atomic load if the stores are not atomic, otherwise you might atomically load some half-written data. Also, the atomic load and store operations ensure that the reference count on the std::shared_ptr object is correctly updated, so that the expensive_data object is correctly destroyed when the last std::shared_ptr object referencing it is destroyed.

If our atomic load actually returned a non-NULL value then we can use that, just as we did before. However, if it returned NULL then we need to lock the mutex and try again. This time we can use a plain read of data, since the mutex is locked. If we still get NULL then we need to do the initialization. However, we can't just call data.reset() like before, since that is not atomic. Instead we must create a local std::shared_ptr instance with the value we want, and store the value with an atomic store operation. We can use result for the local value, since we want the value in that variable anyway.

In invalidate_cache() we must also store the value using std::atomic_store_explicit(), in order to ensure that the NULL value is correctly read in get_data(). Note also that we must also lock the mutex here, in order to avoid a data race with the initialization code inside the mutex lock in get_data().

Memory ordering

By using std::atomic_load_explicit() and std::atomic_store_explicit() we can specify the memory ordering requirements of the operations. We could have just used std::atomic_load() and std::atomic_store(), but those would have implied std::memory_order_seq_cst, which is overkill in this scenario. What we need is to ensure that if a non-NULL value is read in get_data() then the actual creation of the associated object happens-before the read. The store in get_data() must therefore use std::memory_order_release, whilst the load uses std::memory_order_acquire.

On the other hand, the store in invalidate_cache() can merrily use std::memory_order_relaxed, since there is no data associated with the store: if the load in get_data() reads NULL then the mutex will be locked, which will handle any necessary synchronization.

Whenever you use atomic operations, you have to make sure that the memory ordering is correct, and that there are no races. Even in such a simple case such as this it is not trivial, and I would not recommend it unless profiling has shown that this is really a problem.

Update: As if to highlight my previous point about the trickiness of atomic operations, Dmitriy correctly points out in the comments that the use of std::shared_ptr to access the expensive_data implies a reference count, which is a real performance suck in multithreaded code. Whatever the memory ordering constraints we put on it, every thread doing the reading has to update the reference count. This is thus a source of contention, and can seriously limit scalability even if it doesn't force full serialization. The same issues apply to multiple-reader single-writer mutexes — Joe Duffy has written about them over on his blog. Time it on your platform with just a mutex (i.e. no double-checked locking), and with the atomic operations, and use whichever is faster. Alternatively, use a memory reclamation scheme specially tailored for your usage.

Next time

In the next part of this series I'll cover the use of std::lock() that I was intending to cover in this installment.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Note: since std::shared_ptr is part of the library supplied with the compiler, just::thread cannot provide the atomic functions for std::shared_ptr used in this article.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Multithreading in C++0x part 5: Flexible locking with std::unique_lock<>

Wednesday, 15 July 2009

This is the fifth in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

In the previous installment we looked at the use of std::lock_guard<> to simplify the locking and unlocking of a mutex and provide exception safety. This time we're going to look at the std::lock_guard<>'s companion class template std::unique_lock<>. At the most basic level you use it like std::lock_guard<> — pass a mutex to the constructor to acquire a lock, and the mutex is unlocked in the destructor — but if that's all you're doing then you really ought to use std::lock_guard<> instead. The benefit to using std::unique_lock<> comes from two things:

  1. you can transfer ownership of the lock between instances, and
  2. the std::unique_lock<> object does not have to own the lock on the mutex it is associated with.

Let's take a look at each of these in turn, starting with transferring ownership.

Transferring ownership of a mutex lock between std::unique_lock<> instances

There are several consequences to being able to transfer ownership of a mutex lock between std::unique_lock<> instances: you can return a lock from a function, you can store locks in standard containers, and so forth.

For example, you can write a simple function that acquires a lock on an internal mutex:

std::unique_lock<std::mutex> acquire_lock()
{
    static std::mutex m;
    return std::unique_lock<std::mutex>(m);
}

The ability to transfer lock ownership between instances also provides an easy way to write classes that are themselves movable, but hold a lock internally, such as the following:

class data_to_protect
{
public:
    void some_operation();
    void other_operation();
};

class data_handle
{
private:
    data_to_protect* ptr;
    std::unique_lock<std::mutex> lk;

    friend data_handle lock_data();

    data_handle(data_to_protect* ptr_,std::unique_lock<std::mutex> lk_):
        ptr(ptr_),lk(lk_)
    {}
public:
    data_handle(data_handle && other):
        ptr(other.ptr),lk(std::move(other.lk))
    {}
    data_handle& operator=(data_handle && other)
    {
        if(&other != this)
        {
            ptr=other.ptr;
            lk=std::move(other.lk);
            other.ptr=0;
        }
        return *this;
    }
    void do_op()
    {
        ptr->some_operation();
    }
    void do_other_op()
    {
        ptr->other_operation();
    }
};

data_handle lock_data()
{
    static std::mutex m;
    static data_to_protect the_data;
    std::unique_lock<std::mutex> lk(m);
    return data_handle(&the_data,std::move(lk));
}

int main()
{
    data_handle dh=lock_data(); // lock acquired
    dh.do_op();                 // lock still held
    dh.do_other_op();           // lock still held
    data_handle dh2;
    dh2=std::move(dh);          // transfer lock to other handle
    dh2.do_op();                // lock still held
}                               // lock released

In this case, the function lock_data() acquires a lock on the mutex used to protect the data, and then transfers that along with a pointer to the data into the data_handle. This lock is then held by the data_handle until the handle is destroyed, allowing multiple operations to be done on the data without the lock being released. Because the std::unique_lock<> is movable, it is easy to make data_handle movable too, which is necessary to return it from lock_data.

Though the ability to transfer ownership between instances is useful, it is by no means as useful as the simple ability to be able to manage the ownership of the lock separately from the lifetime of the std::unique_lock<> instance.

Explicit locking and unlocking a mutex with a std::unique_lock<>

As we saw in part 4 of this series, std::lock_guard<> is very strict on lock ownership — it owns the lock from construction to destruction, with no room for manoeuvre. std::unique_lock<> is rather lax in comparison. As well as acquiring a lock in the constructor as for std::lock_guard<>, you can:

As you can see, std::unique_lock<> is quite flexible: it gives you complete control over the underlying mutex, and actually meets all the requirements for a Lockable object itself. You can thus have a std::unique_lock<std::unique_lock<std::mutex>> if you really want to! However, even with all this flexibility it still gives you exception safety: if the lock is held when the object is destroyed, it is released in the destructor.

std::unique_lock<> and condition variables

One place where the flexibility of std::unique_lock<> is used is with std::condition_variable. std::condition_variable provides an implementation of a condition variable, which allows a thread to wait until it has been notified that a certain condition is true. When waiting you must pass in a std::unique_lock<> instance that owns a lock on the mutex protecting the data related to the condition. The condition variable uses the flexibility of std::unique_lock<> to unlock the mutex whilst it is waiting, and then lock it again before returning to the caller. This enables other threads to access the protected data whilst the thread is blocked. I will expand upon this in a later part of the series.

Other uses for flexible locking

The key benefit of the flexible locking is that the lifetime of the lock object is independent from the time over which the lock is held. This means that you can unlock the mutex before the end of a function is reached if certain conditions are met, or unlock it whilst a time-consuming operation is performed (such as waiting on a condition variable as described above) and then lock the mutex again once the time-consuming operation is complete. Both these choices are embodiments of the common advice to hold a lock for the minimum length of time possible without sacrificing exception safety when the lock is held, and without having to write convoluted code to get the lifetime of the lock object to match the time for which the lock is required.

For example, in the following code snippet the mutex is unlocked across the time-consuming load_strings() operation, even though it must be held either side to access the strings_to_process variable:

std::mutex m;
std::vector<std::string> strings_to_process;

void update_strings()
{
    std::unique_lock<std::mutex> lk(m);
    if(strings_to_process.empty())
    {
        lk.unlock();
        std::vector<std::string> local_strings=load_strings();
        lk.lock();
        strings_to_process.insert(strings_to_process.end(),
                                  local_strings.begin(),local_strings.end());
    }
}

Next time

Next time we'll look at the use of the new std::lock() and std::try_lock()function templates to avoid deadlock when acquiring locks on multiple mutexes.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

Older entries