Do we really need the dmb st in sprt_queue_push and sprt_queue_pop? If the queue is only used on a single CPU, the barrier is unnecessary, because every CPU observes its own stores in program order. If the queue is shared across multiple CPUs/threads, you will usually protect it with a spin lock or another synchronization mechanism (if you don't, the code is broken in other ways). In that case the ordering of stores within the critical section does not matter: when the spin lock is released with STLR, all stores before the STLR are *visible* to all CPUs before the store performed by the STLR itself (i.e. before any other CPU can acquire the spin lock). Given that, the dmb st does nothing useful in my view. Am I missing something here?
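For reference, this is roughly the usage pattern I have in mind, as a minimal C11 sketch. It is not the actual sprt_queue code: the names demo_queue, demo_lock, demo_unlock, demo_queue_push and demo_queue_pop are all hypothetical, and the queue layout is made up for illustration. I am also assuming that a compiler targeting AArch64 implements the release store in demo_unlock with a store-release instruction such as STLR. The point is that the stores inside push and pop are plain stores with no barrier between them, and the ordering comes only from the acquire/release pair on the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_CAPACITY 4u

/* Hypothetical queue for illustration; NOT the real sprt_queue layout. */
struct demo_queue {
    atomic_flag lock;                    /* spin lock guarding the fields below */
    size_t head;                         /* index of the next element to pop */
    size_t count;                        /* number of elements currently stored */
    uint32_t slots[DEMO_QUEUE_CAPACITY]; /* ring buffer storage */
};

static void demo_queue_init(struct demo_queue *q)
{
    assert(q != NULL);
    atomic_flag_clear_explicit(&q->lock, memory_order_relaxed);
    q->head = 0;
    q->count = 0;
}

static void demo_lock(struct demo_queue *q)
{
    /* Acquire: accesses in the critical section cannot move before this. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

static void demo_unlock(struct demo_queue *q)
{
    /* Release: all earlier stores are ordered before the lock appears free.
     * Assumption: on AArch64 this is emitted as a store-release (STLR family). */
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Returns false if the queue is full. */
static bool demo_queue_push(struct demo_queue *q, uint32_t value)
{
    bool ok = false;

    demo_lock(q);
    if (q->count < DEMO_QUEUE_CAPACITY) {
        size_t tail = (q->head + q->count) % DEMO_QUEUE_CAPACITY;
        q->slots[tail] = value; /* plain store, no explicit barrier */
        q->count++;             /* plain store, no explicit barrier */
        ok = true;
    }
    demo_unlock(q);
    return ok;
}

/* Returns false if the queue is empty; *out is left untouched in that case. */
static bool demo_queue_pop(struct demo_queue *q, uint32_t *out)
{
    bool ok = false;

    demo_lock(q);
    if (q->count > 0) {
        *out = q->slots[q->head];
        q->head = (q->head + 1) % DEMO_QUEUE_CAPACITY;
        q->count--;
        ok = true;
    }
    demo_unlock(q);
    return ok;
}
```

The argument only holds when every access to the queue goes through the lock. If the queue is meant to be used lock-free between a producer and a consumer, then the ordering between the data store and the index update would matter and this sketch would not apply.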