The Intel x86 Memory Ordering Guarantees and the C++ Memory Model
Tuesday, 26 August 2008
The July 2008 version of the Intel 64 and IA-32 Architecture documents includes the information from the memory ordering white paper I mentioned before. This makes it clear that on x86/x64 systems the preferred implementation of the C++0x atomic operations is as follows (which has been confirmed in discussions with Intel engineers):
| Memory Ordering | Store | Load |
|---|---|---|
| std::memory_order_relaxed | MOV [mem],reg | MOV reg,[mem] |
| std::memory_order_acquire | n/a | MOV reg,[mem] |
| std::memory_order_release | MOV [mem],reg | n/a |
| std::memory_order_seq_cst | XCHG [mem],reg | MOV reg,[mem] |
As you can see, a plain MOV is enough even for sequentially-consistent loads, provided a LOCKed instruction such as XCHG is used for the sequentially-consistent stores (XCHG with a memory operand is implicitly LOCKed, so no explicit LOCK prefix is needed).
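To make the table concrete, here is a minimal sketch of how the orderings might be used from C++. The `Mailbox` type and the `publish`, `try_consume` and `raise_flag` functions are hypothetical names invented for this illustration. The instruction noted in each comment is the mapping from the table above; a particular compiler may choose a different but equivalent sequence (for example, MOV followed by MFENCE for a sequentially-consistent store).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical type for illustration: one data slot plus two flags.
struct Mailbox {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // published with release/acquire
    std::atomic<int> flag{0};         // accessed with seq_cst
};

// Writer side: fill in the data, then publish it.
void publish(Mailbox& box, int value) {
    box.payload = value;
    // Release store: per the table, a plain MOV [mem],reg on x86.
    box.ready.store(true, std::memory_order_release);
}

// Reader side: returns true and copies the data out once it is published.
bool try_consume(Mailbox& box, int& out) {
    // Acquire load: per the table, a plain MOV reg,[mem] on x86.
    if (!box.ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = box.payload;  // safe: ordered after the acquire load
    return true;
}

// Sequentially-consistent store: per the table, XCHG [mem],reg on x86.
void raise_flag(Mailbox& box) {
    box.flag.store(1, std::memory_order_seq_cst);
}

// Sequentially-consistent load: per the table, still a plain MOV reg,[mem].
int read_flag(const Mailbox& box) {
    return box.flag.load(std::memory_order_seq_cst);
}
```

In real use `publish` and `try_consume` would run on different threads. The point is that on x86 only the sequentially-consistent store pays for a LOCKed instruction; every other operation shown here is an ordinary MOV.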
One thing to watch out for is the non-temporal SSE store instructions (MOVNTI, MOVNTQ, etc.), which by their very nature (i.e. non-temporal) are weakly ordered and don't follow the normal memory-ordering rules. Therefore non-temporal stores must be followed by an SFENCE instruction in order for their results to be ordered before later stores and seen by other processors in a timely fashion.
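As a rough sketch of the non-temporal case, the function below fills a buffer with MOVNTI stores via the `_mm_stream_si32` intrinsic, issues an SFENCE via `_mm_sfence`, and only then sets a flag with a release store. The function name `fill_and_publish` and the `ready` flag are hypothetical, and the fallback branch for builds without SSE2 is an assumption added so the example compiles everywhere; it uses ordinary stores, which need no fence.

```cpp
#include <atomic>
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_stream_si32 (MOVNTI); also provides _mm_sfence
#define EXAMPLE_HAVE_NT_STORES 1
#endif

// Hypothetical helper: write `value` into buffer[0..count) and then publish.
void fill_and_publish(int* buffer, int count, int value,
                      std::atomic<bool>& ready) {
#ifdef EXAMPLE_HAVE_NT_STORES
    for (int i = 0; i < count; ++i) {
        _mm_stream_si32(buffer + i, value);  // non-temporal store (MOVNTI)
    }
    // SFENCE: make the weakly-ordered stores above globally visible
    // before any later store, including the flag store below.
    _mm_sfence();
#else
    // Assumed fallback for non-SSE2 builds: ordinary stores, no fence needed.
    for (int i = 0; i < count; ++i) {
        buffer[i] = value;
    }
#endif
    // Release store: a plain MOV on x86, which on its own would not
    // order the earlier non-temporal stores.
    ready.store(true, std::memory_order_release);
}
```

Without the SFENCE, another processor could observe `ready` as true before the non-temporal stores to `buffer` have become visible.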
Additionally, if you're writing drivers which deal with memory pages marked WC (Write-Combining) then additional fence instructions will be required to ensure visibility between processors. However, unless you're programming with WC pages this shouldn't be a problem.
Posted by Anthony Williams
Tags: intel, x86, c++, threading, memory ordering, memory model