Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls

Thursday, 11 February 2010

This is the eighth in a series of blog posts introducing the new C++0x thread library. See the end of this article for a full set of links to the rest of the series.

In this installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.

Futures and asynchronous function calls

The most basic use of a future is to hold the result of a call to the new std::async function for running some code asynchronously:

#include <future>
#include <iostream>

int calculate_the_answer_to_LtUaE();
void do_stuff();

int main()
{
    std::future<int> the_answer=std::async(calculate_the_answer_to_LtUaE);
    do_stuff();
    std::cout<<"The answer to life, the universe and everything is "
             <<the_answer.get()<<std::endl;
}

The call to std::async takes care of creating a thread, and invoking calculate_the_answer_to_LtUaE on that thread. The main thread can then get on with calling do_stuff() whilst the immensely time consuming process of calculating the ultimate answer is done in the background. Finally, the call to the get() member function of the std::future<int> object then waits for the function to complete and ensures that the necessary synchronization is applied to transfer the value over so the main thread can print "42".

Sometimes asynchronous functions aren't really asynchronous

Though I said that std::async takes care of creating a thread, that's not necessarily true. As well as the function being called, std::async takes a launch policy which specifies whether to start a new thread or create a "deferred function" which is only run when you wait for it. The default launch policy for std::async is std::launch::any, which means that the implementation gets to choose for you. If you really want to ensure that your function is run on its own thread then you need to specify the std::launch::async policy:

  std::future<int> the_answer=std::async(std::launch::async,calculate_the_answer_to_LtUaE);

Likewise, if you really want the function to be executed in the get() call then you can specify the std::launch::sync policy:

  std::future<int> the_answer=std::async(std::launch::sync,calculate_the_answer_to_LtUaE);

In most cases it makes sense to let the library choose. That way you'll avoid creating too many threads and overloading the machine, whilst taking advantage of the available hardware threads. If you need fine control, you're probably better off managing your own threads.

Divide and Conquer

std::async can be used to easily parallelize simple algorithms. For example, you can write a parallel version of for_each as follows:

template<typename Iterator,typename Func>
void parallel_for_each(Iterator first,Iterator last,Func f)
{
    ptrdiff_t const range_length=last-first;
    if(!range_length)
        return;
    if(range_length==1)
    {
        f(*first);
        return;
    }

    Iterator const mid=first+(range_length/2);

    std::future<void> bgtask=std::async(&parallel_for_each<Iterator,Func>,
                                        first,mid,f);
    try
    {
        parallel_for_each(mid,last,f);
    }
    catch(...)
    {
        bgtask.wait();
        throw;
    }
    bgtask.get();   
}

This simple bit of code recursively divides up the range into smaller and smaller pieces. Obviously an empty range doesn't require anything to happen, and a single-point range just requires calling f on the one and only value. For bigger ranges then an asynchronous task is spawned to handle the first half, and then the second half is handled by a recursive call.

The try - catch block just ensures that the asynchronous task is finished before we leave the function even if an exception in order to avoid the background tasks potentially accessing the range after it has been destroyed. Finally, the get() call waits for the background task, and propagates any exception thrown from the background task. That way if an exception is thrown during any of the processing then the calling code will see an exception. Of course if more than one exception is thrown then some will get swallowed, but C++ can only handle one exception at a time, so that's the best that can be done without using a custom composite_exception class to collect them all.

Many algorithms can be readily parallelized this way, though you may want to have more than one element as the minimum range in order to avoid the overhead of spawning the asynchronous tasks.

Promises

An alternative to using std::async to spawn the task and return the future is to manage the threads yourself and use the std::promise class template to provide the future. Promises provide a basic mechanism for transferring values between threads: each std::promise object is associated with a single std::future object. A thread with access to the std::future object can use wait for the result to be set, whilst another thread that has access to the corresponding std::promise object can call set_value() to store the value and make the future ready. This works well if the thread has more than one task to do, as information can be made ready to other threads as it becomes available rather than all of them having to wait until the thread doing the work has completed. It also allows for situations where multiple threads could produce the answer: from the point of view of the waiting thread it doesn't matter where the answer came from, just that it is there so it makes sense to have a single future to represent that availability.

For example, asynchronous I/O could be modelled on a promise/future basis: when you submit an I/O request then the async I/O handler creates a promise/future pair. The future is returned to the caller, which can then wait on the future when it needs the data, and the promise is stored alongside the details of the request. When the request has been fulfilled then the I/O thread can set the value on the promise to pass the value back to the waiting thread before moving on to process additional requests. The following code shows a sample implementation of this pattern.

class aio
{
    class io_request
    {
        std::streambuf* is;
        unsigned read_count;
        std::promise<std::vector<char> > p;
    public:
        explicit io_request(std::streambuf& is_,unsigned count_):
            is(&is_),read_count(count_)
        {}
    
        io_request(io_request&& other):
            is(other.is),
            read_count(other.read_count),
            p(std::move(other.p))
        {}

        io_request():
            is(0),read_count(0)
        {}

        std::future<std::vector<char> > get_future()
        {
            return p.get_future();
        }

        void process()
        {
            try
            {
                std::vector<char> buffer(read_count);

                unsigned amount_read=0;
                while((amount_read != read_count) && 
                      (is->sgetc()!=std::char_traits<char>::eof()))
                {
                    amount_read+=is->sgetn(&buffer[amount_read],read_count-amount_read);
                }

                buffer.resize(amount_read);
                
                p.set_value(std::move(buffer));
            }
            catch(...)
            {
                p.set_exception(std::current_exception());
            }
        }
    };

    thread_safe_queue<io_request> request_queue;
    std::atomic_bool done;

    void io_thread()
    {
        while(!done)
        {
            io_request req=request_queue.pop();
            req.process();
        }
    }

    std::thread iot;
    
public:
    aio():
        done(false),
        iot(&aio::io_thread,this)
    {}

    std::future<std::vector<char> > queue_read(std::streambuf& is,unsigned count)
    {
        io_request req(is,count);
        std::future<std::vector<char> > f(req.get_future());
        request_queue.push(std::move(req));
        return f;
    }
    
    ~aio()
    {
        done=true;
        request_queue.push(io_request());
        iot.join();
    }
};

void do_stuff()
{}

void process_data(std::vector<char> v)
{
    for(unsigned i=0;i<v.size();++i)
    {
        std::cout<<v[i];
    }
    std::cout<<std::endl;
} 

int main()
{
    aio async_io;

    std::filebuf f;
    f.open("my_file.dat",std::ios::in | std::ios::binary);

    std::future<std::vector<char> > fv=async_io.queue_read(f,1048576);
    
    do_stuff();
    process_data(fv.get());
    
    return 0;
}

Next Time

The sample code above also demonstrates passing exceptions between threads using the set_exception() member function of std::promise. I'll go into more detail about exceptions in multithreaded next time.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the right.

6 Comments

This is a great series. Thank you. I have one suggestion: could you place the code comments in the code itself? That would be very handy.

by saurabh at 15:03:11 on Saturday, 13 February 2010

I think this article is my favorite in your series ;) Thanks! Future concept is _very_ useful.

One question though regarding your futures implementation recently added into boost. How would I find out the return result of the future in a generic way? Why not having 'result_type' typedef? There is a 'reference' typedef both in unique_future and shared_future classes but it's not public...

by Pavel Shevaev at 14:28:36 on Wednesday, 17 February 2010

You don't need 'explicit' in ctors that have more than one arg, do you? I thought that was just to prevent conversions.

by Holly Hopdrive at 13:48:26 on Sunday, 21 March 2010
This is good stuff of reading. Pegasys is a leading software development company offering Offshore Software Development Services & solutions with their vast experience and expertise in Application Development, Web Development, E-Strategy Consulting, Ecommerce solutions, Web Application development, Multimedia and Design Solutions, Wireless Technologies and so on. Companies are becoming software dependent, developing & capitalizing on strong software capabilities. Thanking you. <a href="http://www.pegasyssoft.com/services-solutions/custom-sw-development.html">Custom Software Development</a>
by Custom Software Development at 10:41:41 on Tuesday, 23 March 2010

Great tutorial !

Just to be sure : It is the library that decides if a function will be "asynchronized" or not, that means at compilation time, is that correct ? So in the Divide and Conquer example, will there be one thread for each element of the container (as the compiler can't infer the number of cores on the machine) ? or will the number of threads will be decided at runtime depending on the availability of the cores of the machine (and multiple elements of the container are treated on a same shared thread) ?

by Sant Kadog at 00:06:52 on Thursday, 17 May 2012

RE: Sometimes asynchronous functions aren't really asynchronous

You are confused. Asynchronous is just that. Not to be confused with in parallel. Asynchronous means it gets done when it gets done. Period.

While your examples may be great for learning, they are a terible practice to get into.

You should consider that creating a new thread consumes some stack, on windows 32 the default is about 5 megs. The stack of course must undergo a stack wipe before it can be used. This makes a simple operation take longer to perform than just doing it all synchronously. You should avoid creating more threads than you have the hardware to run.

Cheers,

Dan -

by Dan at 17:58:15 on Monday, 16 July 2012

Add your comment

Your name:

Your URL:

Email address:

Person or spambot?

Your comment: