Just Software Solutions

The Intel x86 Memory Ordering Guarantees and the C++ Memory Model

Tuesday, 26 August 2008

The July 2008 version of the Intel 64 and IA-32 Architecture documents includes the information from the memory ordering white paper I mentioned before. This makes it clear that on x86/x64 systems the preferred implementation of the C++0x atomic operations is as follows (which has been confirmed in discussions with Intel engineers):

| Memory Ordering | Store | Load |
|---|---|---|
| std::memory_order_relaxed | MOV [mem],reg | MOV reg,[mem] |
| std::memory_order_acquire | n/a | MOV reg,[mem] |
| std::memory_order_release | MOV [mem],reg | n/a |
| std::memory_order_seq_cst | XCHG [mem],reg | MOV reg,[mem] |

As you can see, plain MOV is enough for even sequentially-consistent loads if a LOCKed instruction such as XCHG is used for the sequentially-consistent stores.
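A store-buffering litmus test is one way to see what the sequentially-consistent row of the table buys you. The following C++ sketch is illustrative only: the struct and function names are made up for this example, and the instructions named in the comments are what the table leads one to expect from an x86 compiler, not something the code itself checks.

```cpp
#include <atomic>
#include <thread>

// Hypothetical result holder for one run of the litmus test.
struct Result {
    int r1;
    int r2;
};

// Each thread stores to its own variable and then loads the other one.
// With memory_order_seq_cst on every operation, the C++ memory model
// forbids the outcome r1 == 0 && r2 == 0.
Result run_store_buffering_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // expected on x86: XCHG [x],reg
        r1 = y.load(std::memory_order_seq_cst);  // expected on x86: MOV reg,[y]
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);   // expected on x86: XCHG [y],reg
        r2 = x.load(std::memory_order_seq_cst);  // expected on x86: MOV reg,[x]
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Returns true if the forbidden outcome (both loads read 0) never appears.
bool forbidden_outcome_never_seen(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        Result r = run_store_buffering_once();
        if (r.r1 == 0 && r.r2 == 0) {
            return false;
        }
    }
    return true;
}
```

If the stores were weakened to std::memory_order_release (a plain MOV), each store could still be sitting in its processor's store buffer when the following load executes, so both loads reading 0 would become a permitted outcome.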

One thing to watch out for is the Non-Temporal SSE instructions (MOVNTI, MOVNTQ, etc.), which by their very nature (i.e. non-temporal) don't follow the normal cache-coherency rules. Therefore non-temporal stores must be followed by an SFENCE instruction in order for their results to be seen by other processors in a timely fashion.
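As a rough sketch of that rule, the C++ code below fills a buffer with non-temporal stores and issues an SFENCE before setting a flag that another thread might poll. The helper name fill_and_publish and its parameters are hypothetical. The intrinsics _mm_stream_si32 (MOVNTI) and _mm_sfence (SFENCE) are the usual compiler intrinsics for these instructions, and the plain-store fallback is only there so the sketch still compiles on targets without SSE2.

```cpp
#include <atomic>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_stream_si32 (MOVNTI)
#include <xmmintrin.h>  // _mm_sfence (SFENCE)
#define SKETCH_HAVE_NT_STORES 1
#else
#define SKETCH_HAVE_NT_STORES 0
#endif

// Hypothetical helper: write `value` into buffer[0..count) and then
// signal completion through `ready`.
void fill_and_publish(int* buffer, int count, int value,
                      std::atomic<bool>& ready) {
#if SKETCH_HAVE_NT_STORES
    for (int i = 0; i < count; ++i) {
        _mm_stream_si32(&buffer[i], value);  // non-temporal store (MOVNTI)
    }
    // Fence the non-temporal stores before the flag is written, so that a
    // thread that sees the flag also sees the buffer contents.
    _mm_sfence();
#else
    // Assumption: no SSE2 available, so fall back to ordinary stores.
    for (int i = 0; i < count; ++i) {
        buffer[i] = value;
    }
#endif
    ready.store(true, std::memory_order_release);  // plain MOV on x86
}
```

A consumer would pair this with ready.load(std::memory_order_acquire) before reading the buffer. Without the SFENCE, the plain MOV used for the release store would not by itself order the earlier non-temporal stores.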

Additionally, if you're writing drivers which deal with memory pages marked WC (Write-Combining) then additional fence instructions will be required to ensure visibility between processors. However, if you're programming with WC pages then you are presumably already familiar with these requirements, so this shouldn't be a problem.

Posted by Anthony Williams

3 Comments

Anthony,

I came across an exceptional case, "loads can actually be reordered with other loads if store-to-load forwarding is involved", at the following URL:

http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/62973/

Will this affect your memory_order implementation?

by James Gan at 01:09:10 on Tuesday, 24 February 2009

Hi James,

Short answer: no.

Reading a value written by your own thread doesn't provide any additional ordering, so in the code from the first post on that forum page, the read of guard0 into dummy is essentially a no-op. If the value read from guard0 were tested and turned out NOT to be what was written, then you would know that another thread had modified the value. In that code, guard0 is not written by another thread, so this cannot happen.

by Anthony Williams at 10:26:46 on Tuesday, 24 February 2009

Do you happen to have a link to a proof that a plain mov is all that's needed for a sequentially-consistent load? I see from the memory model documents that the xchg'es all happen in a total order, and that movs won't be reordered across xchg, but I'm having trouble getting from there to sequential consistency.

by Jeffrey Yasskin at 06:23:51 on Wednesday, 19 May 2010

Design and Content Copyright © 2005-2017 Just Software Solutions Ltd. All rights reserved.