
Barrier in sprt_queue.c
Closed, Resolved · Public

Description

Do we really require the dmb st in sprt_queue_push and sprt_queue_pop? If the queue is only used on a single CPU, every CPU guarantees that its own stores are visible to itself in program order. If the queue is used across multiple CPUs/threads, you would usually use a spin lock or some other synchronization mechanism (if you don't, the code is broken in other ways), in which case the ordering within the critical section does not matter: when the spin lock is released using STLR, all stores before the STLR are *visible* to all CPUs before the store performed by the STLR itself (i.e. before any other CPU can acquire the spin lock). Given that, the dmb st really does nothing in my view. Am I missing something here?
Same question with dmbish() in spm_response_add() and spm_response_get().
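
For context, the pattern being questioned looks roughly like this (a simplified sketch with assumed names, not the actual sprt_queue.c code):

#include <stdint.h>
#include <string.h>

#define DMB_ST()        __asm__ volatile("dmb st" ::: "memory")

struct queue {
        uint32_t entry_num;     /* number of slots */
        uint32_t entry_size;    /* size of each slot in bytes */
        uint32_t write_idx;
        uint32_t read_idx;
        uint8_t data[];
};

static void queue_push(struct queue *q, const void *entry)
{
        /* Write the entry payload into the next free slot. */
        memcpy(&q->data[q->write_idx * q->entry_size], entry, q->entry_size);

        /*
         * Order the payload stores before the index update as seen by other
         * observers. The CPU executing this code already sees its own stores
         * in program order, which is the point of the question.
         */
        DMB_ST();

        /* Publish the entry by advancing the write index. */
        q->write_idx = (q->write_idx + 1) % q->entry_num;
}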

Event Timeline

raghuncstate triaged this task as Normal priority. May 5 2019, 9:07 PM
raghuncstate created this task.
raghuncstate updated the task description. May 5 2019, 9:09 PM

A secure partition may be reading from the queue on CPU1 at the same time as CPU0 pushes a new request. If this happens, the stores must be observed in the same order by all observers in the system.
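
In other words, a lockless pop on another CPU only works if the payload stores are ordered before the index update. One possible pairing (reusing the hypothetical struct queue from the sketch in the description, not the actual sprt_queue.c code) would be:

static int queue_pop(struct queue *q, void *entry)
{
        /* Lockless reader, possibly on a different CPU than the pusher. */
        if (q->read_idx == q->write_idx)
                return -1;      /* queue is empty */

        /*
         * Pair with the producer's dmb st: order the read of write_idx
         * before the payload reads, so a slot is never read before its
         * contents are visible on this CPU.
         */
        __asm__ volatile("dmb ld" ::: "memory");

        memcpy(entry, &q->data[q->read_idx * q->entry_size], q->entry_size);
        q->read_idx = (q->read_idx + 1) % q->entry_num;

        return 0;
}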

Thanks. Missed the lockless reader of the queue. Who is the lockless reader for spm_response_add() and spm_response_get()?

Apologies for the delay. I've had a look into the spm_buffers.c file and I understand your query is around the dmbish() that's performed shortly before releasing the lock in both spm_response_add() and spm_response_get().

The intention of the barrier is to ensure that the is_valid flag in the sprt_response structure is only updated once the response's contents have been stored and made visible to other CPUs. Is this necessary within the section guarded by the spinlock? I don't think so, as long as all accesses to the "responses" array are performed through these functions. That does appear to be the case here, having assessed the references to that array within the SPM code.
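
A rough sketch of the ordering being enforced (field and function names are assumed, not the actual spm_buffers.c definitions):

#include <stdint.h>

struct response {
        uint64_t x1;
        uint64_t x2;
        uint64_t x3;
        uint32_t is_valid;      /* nonzero once the payload may be consumed */
};

static void response_add(struct response *resp,
                         uint64_t x1, uint64_t x2, uint64_t x3)
{
        resp->x1 = x1;
        resp->x2 = x2;
        resp->x3 = x3;

        /*
         * Make the payload stores visible in the inner shareable domain
         * before the flag store, so no observer can see is_valid set while
         * the payload is still stale.
         */
        __asm__ volatile("dmb ish" ::: "memory");

        resp->is_valid = 1;
}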

I'll run some tests with the dmbish() call removed to check for any immediately obvious side effects.

soby-mathew added a comment. Edited May 21 2019, 3:23 PM

Who is the lockless reader for spm_response_add() and spm_response_get()?

spm_response_add() will be invoked by the Secure Partition when a response is available, to write it to the response buffer. spm_response_get() will be invoked by the SPM (running at EL3) to read the response from the buffer and pass it on to the non-secure client.

The locks themselves are only valid in the execution context they are used in. The producer and consumer of the buffer are in two different execution contexts (EL3 for the consumer and S-EL0 for the producer), and the locks do not ensure mutual exclusion between them. Hence the dmbish ensures that resp->is_valid is observed correctly by the producer and consumer. The locks ensure mutual exclusion between different contenders within the same entity (e.g. multiple consumers in EL3 or multiple producers in S-EL0).
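
So the pairing is roughly as follows (again with assumed names, building on the response_add() sketch above): the producer publishes the payload before setting is_valid, and the consumer must order its read of is_valid before its reads of the payload, independently of whichever lock each side holds:

static int response_get(struct response *resp,
                        uint64_t *x1, uint64_t *x2, uint64_t *x3)
{
        /*
         * Consumer side. Any lock taken here only excludes other consumers
         * in the same execution context, not the producer.
         */
        if (resp->is_valid == 0)
                return -1;      /* nothing to read yet */

        /* Order the is_valid read before the payload reads. */
        __asm__ volatile("dmb ish" ::: "memory");

        *x1 = resp->x1;
        *x2 = resp->x2;
        *x3 = resp->x3;

        /* Hand the slot back to the producer. */
        resp->is_valid = 0;

        return 0;
}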

Thanks Paul, Soby.
spm_response_*() currently cannot be invoked by any secure partition, since the responses[] array is in EL3 space. Is this not the case? Or is the expectation that the responses array will be mapped into Secure EL0 some time in the future? I don't see how a secure partition can invoke spm_response_*() other than through an SMC, in which case we are already in the EL3 context and don't require the dmbish(), as Paul pointed out. I understand your argument for sprt_queue_*(), since those are invoked by both EL3 and the secure partition.

Ah, you are right. Having taken another look at it, yes, the SP -> SPM communication is register-based and spm_response_add() is invoked by SPM to push to a buffer within EL3 (it is not a buffer shared between different ELs). I suspect the shared-buffer primitives were written with a shared-buffer scenario in mind, and the current prototype implementation does not optimize for the case where the buffer stays within EL3.

I agree that dmbish() is probably redundant in this case.

Thanks guys! The dmbish() is not a huge deal. I just get a little nervous when I see barriers and don't completely understand why they are there. :)

soby-mathew closed this task as Resolved.May 28 2019, 1:17 PM
