
Support SVE in non-secure world with CTX_INCLUDE_FPREGS=1
Open, Needs Triage, Public

Description

Currently we don't save and restore the full SVE registers in the context when CTX_INCLUDE_FPREGS=1 is enabled, only the NEON parts of the registers. This prevents using NEON instructions in the secure world together with SVE instructions in the non-secure world. I plan to fix this by storing the SVE registers in the context when switching between worlds.

Event Timeline

pcc created this task. Apr 26 2022, 11:09 PM

Hi Peter,
So far this has been a deliberate design choice to avoid saving/restoring SVE state from EL3, mainly for BL31 footprint reasons (and for performance, if done unconditionally on each and every world switch). The vector register file ranges from 2KB to 8KB with 8 cores, and scales linearly with the number of cores in the system (which can be hundreds in a server chipset). Apart from specific cases under discussion (SPM-MM or EL3 FF-A SPM), it is preferable to do this at a lower EL, e.g. a TOS at SEL1 (or Hafnium at SEL2). Do you have specific reasons why it cannot be done at a lower EL?
You may also want to take a look at these options: ENABLE_SVE_FOR_NS and ENABLE_SVE_FOR_SWD
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/docs/getting_started/build-options.rst#n409
Regards,
Olivier.

okash added a subscriber: okash. May 20 2022, 6:32 PM

Hey Olivier,

Thanks for the suggestion. As Peter pointed out, we want to use NEON instructions in the SWd, hence CTX_INCLUDE_FPREGS=1, which means ENABLE_SVE_FOR_NS must be zero. We understand the overhead of always saving and restoring SVE state. However, Arve from the Trusty team has a solution for lazy save/restore of the SVE context. I don't think he's on Phabricator, so I'll ask him to join and explain it. The gist is to trap lower-EL accesses to the SVE or NEON registers so that EL3 only takes the save and restore hit when the SVE or NEON registers are actually used at a lower EL. Yes, the memory footprint will still be there, but there won't be spurious saving and restoring. What do you think?

Thanks,
Okash
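The trap-based lazy scheme Okash describes relies on EL3 being able to arm and disarm lower-EL FP/SIMD and SVE traps, and to recognise those traps when they arrive. A minimal sketch follows, assuming the Arm ARM encodings CPTR_EL3.TFP at bit 10, CPTR_EL3.EZ at bit 8, and ESR exception classes 0x07 (FP/SIMD access) and 0x19 (SVE access); the helper names are hypothetical, not TF-A functions, and the encodings should be checked against the Arm ARM before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Arm ARM encodings (verify before use). */
#define CPTR_EL3_EZ_BIT   (UINT64_C(1) << 8)   /* 0 = SVE accesses trap to EL3 */
#define CPTR_EL3_TFP_BIT  (UINT64_C(1) << 10)  /* 1 = FP/SIMD accesses trap to EL3 */
#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       UINT64_C(0x3f)
#define ESR_EC_FP_ASIMD   UINT64_C(0x07)
#define ESR_EC_SVE        UINT64_C(0x19)

/* Arm the traps before entering a world whose FP/SVE state is not loaded. */
static uint64_t lazy_arm_traps(uint64_t cptr_el3)
{
    return (cptr_el3 | CPTR_EL3_TFP_BIT) & ~CPTR_EL3_EZ_BIT;
}

/* Disarm them once that world's state has been restored on first use. */
static uint64_t lazy_disarm_traps(uint64_t cptr_el3)
{
    return (cptr_el3 & ~CPTR_EL3_TFP_BIT) | CPTR_EL3_EZ_BIT;
}

/* True if an exception taken to EL3 was caused by an FP/SIMD or SVE access. */
static bool lazy_is_fp_sve_trap(uint64_t esr_el3)
{
    uint64_t ec = (esr_el3 >> ESR_EC_SHIFT) & ESR_EC_MASK;
    return ec == ESR_EC_FP_ASIMD || ec == ESR_EC_SVE;
}
```

With this shape, EL3 pays for a save/restore only in the trap handler, which is the "no spurious saving and restoring" property described above; the storage for the contexts is still needed.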

When you have an S-EL2 based system with possibly multiple S-EL1 partitions, it would be complex to implement a scheme wherein EL3 restores the right S-EL1 context on taking a trap during a lazy save mechanism. In such systems, it is easier for S-EL2 to implement such a scheme since it is the manager for S-EL1 contexts.

Also, SVE state can be up to 8KB per CPU and we would need to keep both non-secure and secure versions. So we are looking at 8KB * 2 * number of CPUs for the memory footprint, which can be more than the available SRAM in the system. Since the TOS/SPM runs from DDR, it is less constrained in memory for keeping the SVE contexts.
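The footprint estimate above can be made concrete with a little arithmetic. The sketch below computes the SVE context size per CPU and per world from the vector length (VL), assuming the architectural layout of 32 Z registers of VL bits plus 16 P registers and FFR of VL/8 bits each, and ignoring FPSR/FPCR and other control registers; the function names are illustrative only. At the maximum VL of 2048 bits the Z registers alone take 32 x 256 = 8192 bytes, which is where the 8KB-per-CPU figure comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one SVE register context at vector length vl_bits.
 * Counts only Z0-Z31, P0-P15 and FFR; control registers are ignored. */
static size_t sve_ctx_bytes(unsigned vl_bits)
{
    assert(vl_bits >= 128u && vl_bits <= 2048u && vl_bits % 128u == 0u);
    size_t z   = 32u * (vl_bits / 8u);   /* 32 Z registers of VL bits */
    size_t p   = 16u * (vl_bits / 64u);  /* 16 P registers of VL/8 bits */
    size_t ffr = vl_bits / 64u;          /* FFR is VL/8 bits */
    return z + p + ffr;
}

/* EL3 storage if both a secure and a non-secure copy are kept per CPU. */
static size_t el3_sve_footprint(unsigned vl_bits, unsigned num_cpus)
{
    return sve_ctx_bytes(vl_bits) * 2u * num_cpus;
}
```

For example, `sve_ctx_bytes(2048)` gives 8736 bytes, and `el3_sve_footprint(2048, 8)` gives 139776 bytes (about 136.5 KiB) for an 8-CPU system, before any control registers are added.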

A DDR carveout for EL3 firmware is one solution, but for S-EL2 systems, due to the first limitation mentioned above, it is easier for the SPM to manage the SVE contexts.

> When you have an S-EL2 based system with possibly multiple S-EL1 partitions, it would be complex to implement a scheme wherein EL3 restores the right S-EL1 context on taking a trap during a lazy save mechanism. In such systems, it is easier for S-EL2 to implement such a scheme since it is the manager for S-EL1 contexts.

S-EL2 cannot lazily save and restore the non-secure register state, though, since it cannot trap accesses by the normal world. I have not thought much about how to optimize a lazy save and restore mechanism where a lower exception level also uses lazy save and restore, but I don't think the secure world is fundamentally different from the normal world here. You can have lazy save and restore in NS-EL1, NS-EL2, S-EL1, S-EL2 and EL3. I think it is worthwhile to see how this can be optimized to avoid saving and restoring register states that will not be used, but I would like to see a solution that does not leak data between execution environments that are supposed to be isolated.

It is also worth keeping in mind that some systems use multiple secure partitions, even without S-EL2 support in the hardware, so EL3 does not necessarily switch between only two contexts. If we want to save memory by not including the full register state for every context, it would be most flexible to have separate switches to enable fp/simd support and full-sve support for each context. Trusty always needs fp/simd support since the clang compiler uses these registers by default, but Trusty does not yet need SVE support (that could change as soon as it becomes available, though).
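Arve's idea of separate fp/simd and full-SVE switches per context could look something like the following. This is a hypothetical sketch, not an existing TF-A build option: the struct and function names are invented, FPSR/FPCR are assumed to take 8 bytes each, and SVE contexts are sized at the maximum VL of 2048 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SVE_VL_BYTES 256u  /* maximum VL: 2048 bits */
#define FPCTRL_BYTES 16u   /* FPSR + FPCR, assumed 8 bytes each */

/* Hypothetical per-context switches. */
struct ctx_fp_cfg {
    bool fpsimd;    /* keep the 128-bit V registers */
    bool full_sve;  /* keep full Z/P/FFR state (V regs are the low bits of Z) */
};

static size_t ctx_fp_bytes(struct ctx_fp_cfg cfg)
{
    if (cfg.full_sve)
        return 32u * SVE_VL_BYTES          /* Z0-Z31 */
             + 16u * (SVE_VL_BYTES / 8u)   /* P0-P15 */
             + SVE_VL_BYTES / 8u           /* FFR */
             + FPCTRL_BYTES;
    if (cfg.fpsimd)
        return 32u * 16u + FPCTRL_BYTES;   /* V0-V31 + FPSR/FPCR */
    return 0u;
}

/* Per-CPU storage for a set of contexts (e.g. NS plus several SPs). */
static size_t per_cpu_fp_bytes(const struct ctx_fp_cfg *cfgs, size_t n)
{
    assert(cfgs != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += ctx_fp_bytes(cfgs[i]);
    return total;
}
```

Under these assumptions, a full-SVE normal world plus an fp/simd-only Trusty context needs 8752 + 528 = 9280 bytes per CPU, instead of two full SVE contexts.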

> Also, SVE state can be up to 8KB per CPU and we would need to keep both non-secure and secure versions. So we are looking at 8KB * 2 * number of CPUs for the memory footprint, which can be more than the available SRAM in the system. Since the TOS/SPM runs from DDR, it is less constrained in memory for keeping the SVE contexts.

> A DDR carveout for EL3 firmware is one solution, but for S-EL2 systems, due to the first limitation mentioned above, it is easier for the SPM to manage the SVE contexts.

In the case where there is no S-EL2, we need memory to store the state needed for the FF-A shared memory APIs, so a DDR carveout is likely needed anyway if EL3 is running out of a limited-size SRAM.

soby-mathew added a comment. (Edited) May 24 2022, 8:39 AM

Hi Arve

> S-EL2 cannot lazily save and restore the non-secure register state, though, since it cannot trap accesses by the normal world. I have not thought much about how to optimize a lazy save and restore mechanism where a lower exception level also uses lazy save and restore, but I don't think the secure world is fundamentally different from the normal world here.

S-EL2 does not need to trap accesses by the Normal world to do a lazy save and restore. The sequence I have in mind would be something like the following:

  1. EL3 switches to S-EL2 on receipt of an FF-A call from the Non-secure world. The SPM schedules the right S-EL1 partition with the SVE and FP traps enabled. Note that the NS FPU/SVE state is still present in the registers at this point.
  2. The S-EL1 SP now tries to access FP/SVE and takes a trap to the SPM in S-EL2. The SPM saves the NS FPU/SVE context, restores the S-EL1 SP FP/SVE context and disables the trap. It then re-enters the S-EL1 partition.
  3. The partition is now able to use SVE/FP, completes its work and returns to the SPM.
  4. The SPM saves the SP FP/SVE context and restores the NS SVE/FP context, then returns to the NS caller via EL3.

As can be seen, the SPM does not need to trap NS SVE/FP accesses.
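The four steps above can be modelled as a small state machine. This is a hypothetical sketch (the type and function names are invented, not Hafnium or TF-A code) that only tracks whose state is live in the registers and counts save/restore operations. It shows that the NS context is saved and restored only when the SP actually touches FP/SVE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lazy FP/SVE switch in an S-EL2 SPM. */
enum fp_owner { OWNER_NS, OWNER_SP };

struct spm_cpu {
    enum fp_owner live;  /* whose FP/SVE state is in the registers */
    bool sp_trap_armed;  /* S-EL1 FP/SVE accesses trap to S-EL2 */
    unsigned saves;      /* context saves performed */
    unsigned restores;   /* context restores performed */
};

/* Step 1: schedule the SP with the trap armed; NS state stays live. */
static void spm_enter_sp(struct spm_cpu *c)
{
    assert(c->live == OWNER_NS);
    c->sp_trap_armed = true;
}

/* Step 2: the SP's first FP/SVE access traps to the SPM. */
static void spm_on_fp_trap(struct spm_cpu *c)
{
    assert(c->sp_trap_armed && c->live == OWNER_NS);
    c->saves++;     /* save NS context */
    c->restores++;  /* restore SP context */
    c->live = OWNER_SP;
    c->sp_trap_armed = false;
}

/* Step 4: before returning to NS, swap back only if the SP took over. */
static void spm_return_to_ns(struct spm_cpu *c)
{
    if (c->live == OWNER_SP) {
        c->saves++;     /* save SP context */
        c->restores++;  /* restore NS context */
        c->live = OWNER_NS;
    }
}
```

The model also makes the trade-off raised later in the thread visible: if the SP uses FP/SVE on every call, the NS context is still saved and restored on every call.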

> You can have lazy save and restore in NS-EL1, NS-EL2, S-EL1, S-EL2 and EL3. I think it is worthwhile to see how this can be optimized to avoid saving and restoring register states that will not be used, but I would like to see a solution that does not leak data between execution environments that are supposed to be isolated.

EL3 does not have visibility into the SP contexts when the SPM is in S-EL2. Neither the SPM nor EL3 uses SVE for its own purposes. It becomes onerous for EL3 to keep track of the right SP context to restore on taking a trap to EL3 from S-EL1 (with the SPM at S-EL2). Yes, it is essential that there be no leakage of data between the two security states.

> In the case where there is no S-EL2, we need memory to store the state needed for the FF-A shared memory APIs, so a DDR carveout is likely needed anyway if EL3 is running out of a limited-size SRAM.

Yes, this is needed in case EL3 is directly managing the SPs. There is no option when SPMC is at EL3 and it is likely that EL3 would need to save and restore all the involved contexts (with a DDR carveout for storage). But for an S-EL2 based SPM, this is something that needs to be done at S-EL2.

odeprez added a comment. (Edited) May 24 2022, 8:53 AM

Hi,
Today SEL2 unconditionally saves/restores the NS FP/SIMD/SVE context on any SEL2 entry/exit.
I believe it could be optimized the way Soby describes in bullets 1-4.
It is worth noting that when SEL2 is not present (e.g. when using the EL3 FF-A SPMC and a SEL1 TOS), the same lazy save/restore mechanism for NS and TA contexts can be used by a SEL1 TOS, and EL3 doesn't have to bother.

> In the case where there is no S-EL2, we need memory to store the state needed for the FF-A shared memory APIs, so a DDR carveout is likely needed anyway if EL3 is running out of a limited-size SRAM.

Yes, this exists through the EL3 SPMC for memory sharing: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/plat/arm/board/fvp/fvp_el3_spmc.c#n24

> There is no option when SPMC is at EL3 and it is likely that EL3 would need to save and restore all the involved contexts

Not sure this is a hard requirement as stated above (a SEL1 TOS can do lazy save/restore just like SEL2 could).

> It is also worth keeping in mind that some systems use multiple secure partitions, even without S-EL2 support in the hardware,

AFAIK this certainly exists downstream or in non-TF-A implementations. None of the upstream TF-A EL3-based SPM solutions actually supports more than one partition.

> Trusty always needs fp/simd support since the clang compiler uses these registers by default, but Trusty does not yet need SVE support

Would it eventually be safer to build the Trusty kernel with -mgeneral-regs-only?

I also wanted to point out the recent SMCCC v1.3/v1.4 feature addition called the SVE hint. The caller emitting the SMC is allowed to set a bit stating that it doesn't care about saving/restoring its SVE state. I wonder if this could be leveraged as an additional optimization.
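As a sketch of how a dispatcher might consume the hint: assuming the SMCCC v1.3 layout in which bit 16 of the Function Identifier carries the SVE hint (1 meaning the caller has no live SVE state that needs preserving), the check and the masking before handler lookup could look like this. The macro and function names are hypothetical; the exact semantics of the bit should be taken from the SMCCC specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMCCC v1.3 encoding: bit 16 of the Function Identifier. */
#define SMC_FID_SVE_HINT (UINT32_C(1) << 16)

/* True if the caller declared it has no live SVE state to preserve. */
static bool smc_no_live_sve(uint32_t fid)
{
    return (fid & SMC_FID_SVE_HINT) != 0;
}

/* The hint is not part of the function number, so clear it before
 * matching the ID against registered handlers. */
static uint32_t smc_fid_strip_hint(uint32_t fid)
{
    return fid & ~SMC_FID_SVE_HINT;
}
```

In the scheme discussed in this thread, a dispatcher seeing the hint set could skip saving the caller's SVE-only state on that world switch, which is the kind of additional optimization suggested here.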

okash added a comment. Thu, Jun 9, 10:25 PM

Hey Soby and Olivier, on systems that don't support S-EL2, most if not all of the SPMC functionality resides in EL3, right? Going by that convention, would it make sense to have SVE save and restore in EL3? We can make it part of the SPMC code in EL3. For additional space we can make use of a DDR carveout, as Soby mentioned above. Moreover, if we make enabling that context save and restore configurable at compile time, then platforms can choose whether they want the functionality. Would it then be acceptable?

Hi,

> on systems that don't support S-EL2, most if not all of the SPMC functionality resides in EL3

This is an implementation choice. E.g. OP-TEE implements an S-EL1 SPMC without needing SPMC logic at EL3 (beyond the SPMD as FF-A relayer).
If you consider the EL3 FF-A SPMC that was just recently added, then yes, most of the SPMC logic resides at EL3.

Saving/restoring SVE at EL3 is possible. The question is why implement it at the highest privilege level when S-EL1 can do this operation (and reserve the necessary memory from its own space)? Note that this is independent of FF-A and ties in more with the SMCCC convention.

> Hi Arve

> > S-EL2 cannot lazily save and restore the non-secure register state, though, since it cannot trap accesses by the normal world. I have not thought much about how to optimize a lazy save and restore mechanism where a lower exception level also uses lazy save and restore, but I don't think the secure world is fundamentally different from the normal world here.

> S-EL2 does not need to trap accesses by the Normal world to do a lazy save and restore. The sequence I have in mind would be something like the following:

> 1. EL3 switches to S-EL2 on receipt of an FF-A call from the Non-secure world. The SPM schedules the right S-EL1 partition with the SVE and FP traps enabled. Note that the NS FPU/SVE state is still present in the registers at this point.
> 2. The S-EL1 SP now tries to access FP/SVE and takes a trap to the SPM in S-EL2. The SPM saves the NS FPU/SVE context, restores the S-EL1 SP FP/SVE context and disables the trap. It then re-enters the S-EL1 partition.
> 3. The partition is now able to use SVE/FP, completes its work and returns to the SPM.
> 4. The SPM saves the SP FP/SVE context and restores the NS SVE/FP context, then returns to the NS caller via EL3.

> As can be seen, the SPM does not need to trap NS SVE/FP accesses.

That is not what I meant. The NS SVE/FP access trap is needed to avoid saving and restoring the state when the SVE/FP registers are actively used by the secure world, but not by the normal world. In the sequence you describe the lazy save and restore is only lazy when the secure world does not use the SVE/FP registers.

> > You can have lazy save and restore in NS-EL1, NS-EL2, S-EL1, S-EL2 and EL3. I think it is worthwhile to see how this can be optimized to avoid saving and restoring register states that will not be used, but I would like to see a solution that does not leak data between execution environments that are supposed to be isolated.

> EL3 does not have visibility into the SP contexts when the SPM is in S-EL2. Neither the SPM nor EL3 uses SVE for its own purposes. It becomes onerous for EL3 to keep track of the right SP context to restore on taking a trap to EL3 from S-EL1 (with the SPM at S-EL2). Yes, it is essential that there be no leakage of data between the two security states.

I don't think EL3 needs to know the actual context, just that it can skip saving or skip restoring a context. If EL3 knows the registers for a context do not have live data it only needs to clear the registers.
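The point that EL3 only needs to clear the registers when a context holds no live data can be sketched as follows. This is a hypothetical model: it assumes EL3 somehow knows whether each context's FP/SVE state is live (for example because an access trap was never taken, or via the SMCCC SVE hint mentioned in this thread), uses a four-element array as a stand-in for the real register file, and the names are invented. It shows that skipping a save does not leak data, because the incoming world always receives either its own restored state or zeroed registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NREGS 4  /* tiny stand-in for the FP/SVE register file */

struct fp_ctx {
    bool live;                    /* context has state worth keeping */
    unsigned long saved[NREGS];   /* memory copy of that state */
};

struct cpu_model {
    unsigned long regs[NREGS];    /* physical registers */
    unsigned saves, restores, clears;
};

/* Move the register file from context 'from' to context 'to'.
 * 'from' data is never left visible to 'to'. */
static void world_switch(struct cpu_model *cpu, struct fp_ctx *from,
                         struct fp_ctx *to)
{
    assert(from != to);
    if (from->live) {             /* skip the save for dead state */
        memcpy(from->saved, cpu->regs, sizeof(cpu->regs));
        cpu->saves++;
    }
    if (to->live) {               /* full overwrite: nothing leaks */
        memcpy(cpu->regs, to->saved, sizeof(cpu->regs));
        cpu->restores++;
    } else {                      /* no state to restore: just clear */
        memset(cpu->regs, 0, sizeof(cpu->regs));
        cpu->clears++;
    }
}
```

Compared with an unconditional save and restore, a switch into or out of a context with no live state costs at most one clear instead of a full save or restore.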

> > In the case where there is no S-EL2, we need memory to store the state needed for the FF-A shared memory APIs, so a DDR carveout is likely needed anyway if EL3 is running out of a limited-size SRAM.

> Yes, this is needed in case EL3 is directly managing the SPs. There is no option when SPMC is at EL3 and it is likely that EL3 would need to save and restore all the involved contexts (with a DDR carveout for storage). But for an S-EL2 based SPM, this is something that needs to be done at S-EL2.

I'm mostly concerned about the case where there is no S-EL2 at the moment, as that directly affects Trusty.

Also note that there are code paths where Trusty exits with an SMC call due to a crash, and it might not be possible to restore the FP/SVE registers. In these cases we currently allow Linux to continue running without Trusty services (as a config option). The initial startup for each CPU is another case where the normal entry and exit path is not used, which makes it more difficult to make sure the registers are saved.

In T989#11709, @odeprez wrote:

> ...

> > It is also worth keeping in mind that some systems use multiple secure partitions, even without S-EL2 support in the hardware,

> AFAIK this certainly exists downstream or in non-TF-A implementations. None of the upstream TF-A EL3-based SPM solutions actually supports more than one partition.

> > Trusty always needs fp/simd support since the clang compiler uses these registers by default, but Trusty does not yet need SVE support

> Would it eventually be safer to build the Trusty kernel with -mgeneral-regs-only?

Sorry, that is not the only reason Trusty needs fp/simd support. The AES and SHA-256 acceleration instructions need the SIMD registers, and these are used by BoringSSL, which we link into several Trusty apps and may also need to link into the kernel in the future.

> I also wanted to point out the recent SMCCC v1.3/v1.4 feature addition called the SVE hint. The caller emitting the SMC is allowed to set a bit stating that it doesn't care about saving/restoring its SVE state. I wonder if this could be leveraged as an additional optimization.

That sounds like it would be useful for the optimization discussed above.

> That is not what I meant. The NS SVE/FP access trap is needed to avoid saving and restoring the state when the SVE/FP registers are actively used by the secure world.

Hmm, there seems to be a disconnect in my understanding of how you envisage the runtime model. When the PE is in the NS world, none of the registers (FP/SVE/SME) will have Secure world contents. The world switch from Secure to Non-secure is clean and complete, with no latent values from the Secure world present in the registers while the PE is executing in NS. In such a scenario, I don't see a situation where a PE will run in NS with active values from the Secure world.

I may need some sequence diagrams to understand the flow you are describing here. If a call is easier, we can arrange one as well to understand this better.

> I'm mostly concerned about the case where there is no S-EL2 at the moment, as that directly affects Trusty.

Ok, I am only focusing on cases with S-EL2, but I understand that there are subtleties in cases without S-EL2 which might require EL3 to save and restore. These can be ad hoc solutions isolated to specific dispatchers (the Trusty dispatcher in this case), and that is fine AFAICS (although, as Olivier mentions, there should be a strong justification for doing this in EL3, which you seem to have for Trusty).

Best Regards
Soby Mathew

okash added a comment. Mon, Jun 20, 5:44 PM

Thanks. It seems like we are converging. Would it make sense to set up a meeting to thrash out the details? Any time this week would be preferable. Let me know what works for you. Arve and Peter are in the Pacific time zone. The rest of us are based in the UK, I guess?

Hi Okash,
Olivier is on holiday and once he is back next week, we can arrange something to discuss.

Best Regards
Soby Mathew