A memory barrier is an instruction given to the compiler or the CPU to ensure that all memory operations issued before the barrier complete before it, and all memory operations issued after the barrier happen after it.
Modern compilers and CPUs may reorder instructions for the sake of optimization, as long as the reordering does not change the observable behavior of a single-threaded program. In multi-threaded programs, however, this reordering can cause race conditions and other unexpected behavior. Memory barriers are used to prevent exactly that.
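For instance, here is a minimal sketch of a pair of C++11 fences (the names producer, consumer, data, and ready are illustrative, not from any real API): the release fence keeps the write to data from sinking below the flag store, and the acquire fence keeps reads from hoisting above the flag load.

#include <atomic>

int data = 0;
std::atomic<bool> ready{false};

void producer()
{
    data = 42;                                            // (1) plain write
    std::atomic_thread_fence(std::memory_order_release);  // barrier: (1) cannot move below this line
    ready.store(true, std::memory_order_relaxed);         // (2) publish the flag
}

void consumer()
{
    while (!ready.load(std::memory_order_relaxed))
        ;                                                 // spin until the flag is published
    std::atomic_thread_fence(std::memory_order_acquire);  // barrier: reads below cannot move above this line
    // here, data is guaranteed to be 42
}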
Let’s first understand compiler reordering and CPU out-of-order execution.
But doesn’t the compiler translate code line by line? And if so, why do we need a memory barrier?
Actually, the compiler does not always preserve the source order. For the sake of optimization, it sometimes reorders instructions that appear to be independent of each other.
Take a look at this code for example:
#include <atomic>

volatile int threadToBeExecuted;
int commonVariable;
int computeCommonVariable();

void spinCall(int currentThread)
{
    // Spin until it is this thread's turn.
    while (currentThread != threadToBeExecuted);
    // Do this thread's work, then pass the turn to the next thread.
    commonVariable = computeCommonVariable();
    threadToBeExecuted = currentThread + 1;
}
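For context, here is one way such a function might be driven by two threads (this driver is my own sketch, not part of the original listing; spinCall(0) goes first because globals are zero-initialized):

#include <thread>

int main()
{
    std::thread t0(spinCall, 0);
    std::thread t1(spinCall, 1);  // spins until thread 0 sets threadToBeExecuted to 1
    t0.join();
    t1.join();
}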
Now, this looks like a simple spinlock implementation, but watch what happens if we compile it with optimizations enabled:
g++ -S -O2 spin_lock.cpp -o spin_lock_optimized.s
The assembly code generated looks something like this:
.L2:
        movl    threadToBeExecuted(%rip), %eax
        cmpl    %ebx, %eax
        jne     .L2
        call    _Z21computeCommonVariablev@PLT
        addl    $1, %ebx
        movl    %ebx, threadToBeExecuted(%rip)   # threadToBeExecuted is incremented first...
        popq    %rbx
        .cfi_def_cfa_offset 8
        movl    %eax, commonVariable(%rip)       # ...and commonVariable is only written afterwards
        ret
        .cfi_endproc
In this output you can see that some instructions have been moved around. At first glance this looks like a harmless optimization, but the instructions that seemed independent of each other were not actually independent: commonVariable is now updated after threadToBeExecuted is incremented. This is a problem, because as soon as threadToBeExecuted is incremented, the second thread can exit its spin loop and access commonVariable before the first thread has finished writing it.
The assembly should actually have looked like this:
.L2:
        movl    threadToBeExecuted(%rip), %eax
        cmpl    %eax, -4(%rbp)
        setne   %al
        testb   %al, %al
        jne     .L2
        call    _Z21computeCommonVariablev@PLT
        movl    %eax, commonVariable(%rip)       # commonVariable is written first...
        movl    -4(%rbp), %eax
        addl    $1, %eax
        movl    %eax, threadToBeExecuted(%rip)   # ...and only then is threadToBeExecuted incremented
        nop
        leave
        .cfi_def_cfa 7, 8
        ret
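As a preview, one way to prevent this reordering is a memory barrier. Here is a minimal sketch using C++11 release/acquire atomics (my own variation on the earlier listing, not the original code); these orderings constrain both the compiler and the CPU:

#include <atomic>

std::atomic<int> threadToBeExecuted{0};
int commonVariable;
int computeCommonVariable();

void spinCall(int currentThread)
{
    // Acquire load: operations after the loop cannot be hoisted above it.
    while (currentThread != threadToBeExecuted.load(std::memory_order_acquire))
        ;
    commonVariable = computeCommonVariable();
    // Release store: the write to commonVariable above cannot sink below this line.
    threadToBeExecuted.store(currentThread + 1, std::memory_order_release);
}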
Similar to these compiler optimizations, the CPU itself performs optimizations that can cause unexpected behavior in a multi-threaded environment.
For example: