
Supporting Non-Secure RTOS applications, integrated with Trusted Firmware-M, which want to use the FPU
Open, High, Public

Description

Brief description:

We cannot fully (and easily) use the Cortex-M floating-point features in Zephyr RTOS applications that run together with TF-M.

Detailed Description of the case:

In the Zephyr multi-threaded environment we normally enable the FP co-processor and the advanced context-control features in Cortex-M:

  • Automatic state preservation
  • Lazy stacking

Zephyr threads are all free to use the floating point registers; the automatic state preservation ensures the caller-saved registers are preserved in the thread's stack and the regular context-switch routines ensure the FP callee-saved registers are also preserved in the thread's dedicated container for callee-saved context. This has been fairly stable on non TrustZone-M enabled devices.
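As a rough illustration of the two context-control features listed above, the sketch below computes an FPCCR value with both bits set. The bit positions (ASPEN = bit 31, LSPEN = bit 30) are the architectural ones; the helper names are hypothetical, and the functions operate on plain values so the logic can be tried off-target. On hardware, privileged Non-Secure code would write the result to the FPCCR register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural FPCCR bit positions (Armv8-M Mainline). */
#define FPCCR_ASPEN ((uint32_t)1 << 31) /* automatic state preservation enable */
#define FPCCR_LSPEN ((uint32_t)1 << 30) /* lazy state preservation enable */

/* Hypothetical helper: returns the FPCCR value with both features enabled.
 * It takes and returns a plain value instead of touching the register. */
static uint32_t fpccr_enable_lazy_stacking(uint32_t fpccr)
{
    return fpccr | FPCCR_ASPEN | FPCCR_LSPEN;
}

/* Hypothetical helper: true only when both ASPEN and LSPEN are set. */
static bool fpccr_lazy_stacking_enabled(uint32_t fpccr)
{
    const uint32_t both = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr & both) == both;
}
```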

The situation, however, becomes challenging when Zephyr applications run in the Non-Secure domain, with TF-M running in the Secure domain:

The interesting use case is when Zephyr Non-Secure threads with an active FP context (CONTROL.FPCA = 1) make secure calls to the TF-M services. In such a scenario, the TF-M secure threads may need to preserve the FP context themselves during a Non-Secure interrupt that attempts to access the FP registers. Normally Zephyr HW interrupts do not access the FP registers, but this is not guaranteed. And, of course, the PendSV interrupt, which handles the thread context switch, does access the FP registers, because it may need to save and restore the FP callee-saved registers.

Currently this scenario will immediately trigger a TF-M system crash, via a Secure NOCP UsageFault (no co-processor), as TF-M does not enable the FP co-processor. This is noticed frequently in Zephyr applications with vanilla TF-M versions.

I suppose that this problem could be solved by enabling the FP co-processor in TF-M unconditionally (currently, this is under the FPU_USED pre-processor macro). Alternatively, there needs to be a way for TF-M to know that the Non-Secure application is built with floating-point support, and to enable the FP co-processor in the SCB->CPACR_S register.
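The second option could look roughly like the sketch below. It is not TF-M code: the function and its `ns_uses_fpu` parameter are hypothetical stand-ins for a build option telling the secure side that Non-Secure threads may enter secure calls with CONTROL.FPCA set. Only the CPACR field layout (CP10 in bits 21:20, CP11 in bits 23:22, 0b11 meaning full access) is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPACR fields for the FP extension; 0b11 grants full access. */
#define CPACR_CP10_FULL ((uint32_t)3 << 20)
#define CPACR_CP11_FULL ((uint32_t)3 << 22)

/* Hypothetical decision helper, not an existing TF-M function.
 * Returns the value that would be written to the Secure CPACR.
 * secure_uses_fpu: the secure image itself uses FP instructions.
 * ns_uses_fpu:     assumed build option describing the NS application. */
static uint32_t cpacr_s_for_config(uint32_t cpacr, bool secure_uses_fpu,
                                   bool ns_uses_fpu)
{
    if (secure_uses_fpu || ns_uses_fpu) {
        cpacr |= CPACR_CP10_FULL | CPACR_CP11_FULL;
    }
    return cpacr;
}
```

Passing `true` for either flag yields the same register value; the point of the extra parameter is only that the secure side enables access even when its own code is built without FP instructions.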

However, even if we solve this problem, TF-M would still need to properly preserve the FP caller-saved context when switching between threads in Secure PendSV, so that Non-Secure interrupts (and potential Non-Secure reschedule actions) do not interfere with FP state preservation. What we see today is that the Non-Secure FP context is not preserved during TF-M thread switches, leading to odd situations such as the lazy state preservation active bit being set in Secure thread mode, which should not be the case.

This occurs because the TF-M context switch routine simply loads the stacked EXC_RETURN value of the thread to be switched in, potentially leaving the FPCCR.LSPACT bit set in Secure Thread mode (!!), and the caller-saved registers unstacked.

This will lead to system crashes if a preempting Non-Secure ISR attempts to use the floating-point registers, because the FPCAR register most likely contains garbage, pointing to an area not reserved for FP context preservation.
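A debug check for this condition might be sketched as follows. It assumes the caller knows the address range reserved for the FP frame of the interrupted thread; the function name and the range parameters are hypothetical, and only the position of FPCCR.LSPACT (bit 0) is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCCR_LSPACT ((uint32_t)1 << 0) /* lazy state preservation active */

/* Hypothetical debug check: true when a lazy save is pending but FPCAR
 * does not point inside [lo, hi), the region assumed to be reserved for
 * the FP frame. Values are passed in rather than read from registers. */
static bool lazy_save_target_invalid(uint32_t fpccr, uint32_t fpcar,
                                     uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    if ((fpccr & FPCCR_LSPACT) == 0) {
        return false; /* no deferred stacking pending */
    }
    return fpcar < lo || fpcar >= hi;
}
```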

It is not enough to clear the FPCCR.LSPACT bit in Secure PendSV; this would mean that a Non-Secure ISR/thread could eventually corrupt a previous FP context. It seems to me we need to store the FP context in PendSV before switching out a thread with an active FP context (EXC_RETURN.FType = 0).
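The proposed switch-out step can be modelled off-target as in the sketch below. The `struct fp_ctx` container and both functions are hypothetical and do not correspond to TF-M structures; the register array is a stand-in for the FP register file, and saving S16-S31 here is an assumption made for illustration. Only the meaning of EXC_RETURN bit 4 (FType, where 0 indicates an extended frame with FP state) is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXC_RETURN_FTYPE    ((uint32_t)1 << 4) /* 0 = extended (FP) frame */
#define NUM_CALLEE_SAVED_FP 16                 /* S16-S31 */

/* Hypothetical per-thread container; not a TF-M structure. */
struct fp_ctx {
    bool valid;
    uint32_t s16_s31[NUM_CALLEE_SAVED_FP];
};

/* True when the thread being switched out has an active FP context. */
static bool thread_has_fp_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE) == 0;
}

/* Models the switch-out step: fp_regs stands in for the live registers.
 * The context is copied only when FType indicates FP state exists. */
static void switch_out_save_fp(uint32_t exc_return,
                               const uint32_t fp_regs[NUM_CALLEE_SAVED_FP],
                               struct fp_ctx *ctx)
{
    assert(ctx != NULL && fp_regs != NULL);
    ctx->valid = thread_has_fp_context(exc_return);
    if (ctx->valid) {
        memcpy(ctx->s16_s31, fp_regs, sizeof(ctx->s16_s31));
    }
}
```

In this model, 0xFFFFFFED (FType = 0) causes a save and 0xFFFFFFFD (FType = 1) does not.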

Assessment

This is a serious problem IMHO. It rules out using the Cortex-M FP-related optimizations (automatic and lazy state preservation) and forces Zephyr to manually save and restore the FP context before and after a secure call, respectively.

Let me know if you need more information.

/Ioannis

Event Timeline

ioannisg created this task. Tue, Apr 27, 9:12 AM
davidwang added a subscriber: davidwang.

hi, Ioannisg

For the concern you mentioned: yes, we need to add extra steps in the secure scheduler. I am still working on this part.

  1. When the secure side makes secure calls:
     a. If lazy FP is disabled, hardware pushes/pops the FP context automatically during exception entry/return.
     b. If lazy FP is enabled: for isolation level 1, the secure scheduler saves and restores the FP context but does not invalidate it; for isolation levels 2 and 3, the secure scheduler triggers lazy FP stacking, and hardware pushes the FP context to the thread's stack and invalidates it automatically.
  2. When the non-secure side makes secure calls, it enters the secure world via SG in tfm_nspm_thread_entry and then makes secure calls as described above. The non-secure FP context can be restored when returning to the non-secure side via bxns lr.

Regarding your crash: if hardware FP is enabled only on the NS side and not on the S side, there should be no crash even when the non-secure side makes secure calls (currently TF-M only supports software FP by default).
As you mentioned crash happens for Secure NOCP UsageFault (no co-processor), I think you should make some change on TF-M side to enable FP support, right?
Could you let me know exactly what change you made to TF-M? This information will help us investigate whether we can find a way to mitigate the problem.
  1. Which Cortex-M core?
  2. TF-M changeset you are using and compiler version.
  3. Changes you did for TF-M while integrated to your project.
  4. More information about the crash:
     a. IPC mode or library mode? Which isolation level?
     b. Detailed sequence of actions between NS and S when crash.
     c. Did you use FP in SPM (secure partition manager) or use FP in secure partitions? Or both?
     d. Crash only for Lazy FP enabled? How about the status if Lazy FP disabled?
     e. Which PSA calls causes the crash, or all PSA calls?
     f. How about the occurrence of crash? Always crash or sometimes?
     g. Which fault entered? Value of registers and stack frame in memory at crash time are very useful for analysis.
     h. Other information if possible.

Thank you!

Hi Feder,

Regarding your crash: if hardware FP is enabled only on the NS side and not on the S side, there should be no crash even when the non-secure side makes secure calls (currently TF-M only supports software FP by default).

Right, when only Non-Secure application is using the FPU, there should be no issues.

As you mentioned crash happens for Secure NOCP UsageFault (no co-processor), I think you should make some change on TF-M side to enable FP support, right?

The required change is that TF-M enables FP usage as well, by writing to the CPACR_S register. There are several options here:

  • enable FPU unconditionally
  • enable FPU based on a user configuration that signifies that the NS Application uses the FP registers _and_ is able to switch to a secure call with CONTROL.FPCA set to 1 (otherwise, there is no problem, actually)

Could you let me know exactly what change you made to TF-M? This information will help us investigate whether we can find a way to mitigate the problem.

I did not make any changes to TF-M. All I did was enable and use the FP registers in the Non-Secure application, also enabling automatic state preservation and lazy stacking (ASPEN, LSPEN). I also allowed Non-Secure threads with an active FP context to make PSA calls.

In detail:

Which Cortex-M core?

M33

TF-M changeset you are using and compiler version.

Using TF-M 1.3.0

Detailed sequence of actions between NS and S when crash.

a. Boot TF-M
b. Boot Non-Secure application. Set CPACR (allow FP usage), Set FPCCR .ASPEN and .LSPEN to 1
c. Switch to a non-secure thread that uses FP instructions. CONTROL.FPCA will be set to 1, indicating an active FP context.
d. Do a PSA call
e. While inside the PSA call, trigger a Non-Secure interrupt that makes use of FP instructions (I am using Non-Secure PendSV)
f. crash with Secure UsgFault, NOCP

Changes you did for TF-M while integrated to your project.

None

IPC mode or library mode? Which isolation level?

Does not matter

Did you use FP in SPM(secure partition manager) or use FP in secure partitions? Or both?

No

Which PSA calls causes the crash, or all PSA calls?

Does not matter. But it has to be a call that gets interrupted by the non-secure application.

How about the occurrence of crash? Always crash or sometimes?

Always

Which fault entered? Value of registers and stack frame in memory at crash time are very useful for analysis.

Secure No-Coprocessor UsageFault error

Crash only for Lazy FP enabled? How about the status if Lazy FP disabled?

I only tried with LSPEN set to 1. Without LSPEN, I guess that the stacking will occur during the Non-Secure exception entry, and this will end up with the same UsgFault.

Once more, I stress that it does not seem to be enough to enable FP access via CPACR in the Secure domain. We need to actually save the FP registers in the secure thread context switch, if the thread is actually switched out.

ioannisg triaged this task as High priority. Mon, May 3, 8:21 AM

Setting this to High for now, but feel free to re-triage if this was not appropriate.

Hi @ioannisg,
FYI, Feder is on holiday and will be back in the office on 10th May.
Thanks.

hi, ioannisg

If you didn't change TF-M while integrating it into your project, a PSA call (handler mode) cannot be interrupted by a non-secure interrupt as you described, because non-secure exceptions are de-prioritized (AIRCR.PRIS = 1) in TF-M.
A non-secure interrupt can only become active when the system is in thread mode.
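The effect of AIRCR.PRIS can be sketched with a small off-target model. With PRIS set, the architecture maps the programmed Non-Secure priority into the lower-priority half of the range, (priority >> 1) + 0x80. The function names, the value 256 used to represent thread mode with no active exception, and the secure handler priority used in the usage note are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AIRCR_PRIS   ((uint32_t)1 << 14) /* prioritize Secure exceptions */
#define NS_PRIO_BASE 0x80u               /* NS priorities start here when PRIS = 1 */
#define PRIO_THREAD  256u                /* model value: no active exception */

/* Hypothetical model of the priority used for a Non-Secure exception,
 * given its programmed 8-bit priority and the AIRCR value. */
static uint8_t effective_ns_priority(uint8_t programmed, uint32_t aircr)
{
    if ((aircr & AIRCR_PRIS) != 0) {
        return (uint8_t)((programmed >> 1) + NS_PRIO_BASE);
    }
    return programmed;
}

/* A lower numeric value means a higher priority, so preemption requires a
 * strictly lower value than the current execution priority. */
static bool ns_can_preempt(uint8_t ns_programmed, uint32_t aircr,
                           uint16_t current_priority)
{
    assert(current_priority <= PRIO_THREAD);
    return effective_ns_priority(ns_programmed, aircr) < current_priority;
}
```

Under this model, a Non-Secure exception programmed at priority 0 cannot preempt a secure handler running at an assumed priority of 0x40 when PRIS is set, but it can preempt Secure thread mode.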

Based on your answer:

  1. You didn't "use FP in SPM(secure partition manager) or use FP in secure partitions".
  2. You didn't change TF-M while integrating into your project.
  3. Crash only happens in non-Secure interrupt that makes use of FP instructions.

So your application is a non-secure application using the FPU, and there should be no issues.

Besides the FP register configuration you mentioned for the non-secure application, did you add the corresponding compiler and linker flags for your non-secure interrupt source file, such as -mfloat-abi=hard -mfpu=fpv5-sp-d16 (gcc compiler)?

Hi Feder,

  1. I have not changed TF-M at all while integrating it into Zephyr. I am using an upstream TF-M version.
  2. There is no problem with the linker and ABIs; I compile Zephyr and TF-M with soft FP.

If you didn't change TF-M while integrating it into your project, a PSA call (handler mode) cannot be interrupted by a non-secure interrupt as you described, because non-secure exceptions are de-prioritized (AIRCR.PRIS = 1) in TF-M.

I did not intend to mention that Secure handler mode is interrupted.

A non-secure interrupt can only become active when the system is in thread mode.

Exactly. This is my use case. Let me explain this in a bit more detail, to avoid confusion.

Steps:
a. Boot TF-M
b. Boot Non-Secure application. Zephyr in my use-case. Inside Zephyr boot phase, I set CPACR (to allow FP usage). I also set FPCCR .ASPEN and .LSPEN to 1. These bits enable FP automatic state preservation and lazy stacking, respectively.
c. Inside my Non-Secure application, I switch to a non-secure thread that uses FP instructions. This means that CONTROL.FPCA will be set to 1, indicating an active FP context.
d. Then, I am calling a secure service from my non-secure thread application. E.g. I am calling psa_hash_compute()
e. While the secure service is still executing, that is, before the psa_hash_compute function returns, a Non-Secure interrupt is triggered and becomes active. This is allowed, because the background state is Secure Thread mode.
f. Inside the non-secure interrupt I make use of FP instructions.
g. Crash with Secure UsgFault, NOCP is observed.

Could you look at this use case and see if you also get the same processor error?

If you don't see any issues, could you suggest an explanation for why I am observing a Secure UsageFault with the NOCP flag, while I am only using an upstream TF-M version?

Thanks!

federliangarm added a comment. Edited Thu, May 13, 1:35 AM

Hi, Ioannisg

The first thing I want to confirm is "I compile Zephyr and TF-M with soft FP". As you know, TF-M uses soft FP by default.

  1. But for Zephyr, are you using soft FP or hardware FP?
  2. Is it possible for you to share the compile options and linker options for the source file including the "Non-Secure interrupt" crashing?
  3. Is it possible to show the assembly code for the "Non-Secure interrupt"?
  4. Please try disabling lazy FP stacking to see whether it still crashes.

Hi,

But for Zephyr, are you using soft FP or hardware FP?

I am using soft FP for Zephyr. You can see the setting here:
https://github.com/zephyrproject-rtos/zephyr/blob/f8ac3a49ec95f4bf98bbd1f2b827c421f654d48b/arch/arm/core/aarch32/Kconfig#L241

Hard ABI is disabled when building with TF-M. Otherwise we cannot link TF-M with Zephyr libraries.

Is it possible for you to share the compile options and linker options for the source file including the "Non-Secure interrupt" crashing?

You can browse all the settings here: https://github.com/zephyrproject-rtos/zephyr/tree/master/cmake
And the gcc-specific settings here: https://github.com/zephyrproject-rtos/zephyr/blob/master/cmake/compiler/gcc/target_arm.cmake

Is it possible to show the assembly code for the "Non-Secure interrupt"?

Yeap. The Non-Secure interrupt is the PendSV handler in Zephyr:

https://github.com/zephyrproject-rtos/zephyr/blob/f8ac3a49ec95f4bf98bbd1f2b827c421f654d48b/arch/arm/core/aarch32/swap_helper.S#L112

The link shows the line that triggers the fault.

Please try disabling lazy FP stacking to see whether it still crashes.

Will do, and let you know.

Yes, it crashes even without Lazy Stacking. It is a bit more deterministic as is. It crashes in the first secure exception entry, after the transition to secure domain from non-secure.

hi, Ioannisg

This is the explanation for "FP_SOFTABI" in Zephyr.
"config FP_SOFTABI
	bool "Floating point Soft ABI"
	help
	  This option selects the Floating point ABI in which hardware floating
	  point instructions are generated but soft-float calling conventions."

The definition of FP_SOFTABI in Zephyr is not the same as the default "software FP" option in TF-M.

For GCC compiler, please check here https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.
"-mfloat-abi=name
Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and ‘hard’.
Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. ‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions."
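To make the three settings quoted above concrete, the sketch below classifies the float ABI of a translation unit at compile time from predefined macros. It assumes the ACLE macros `__ARM_FP` and `__ARM_PCS_VFP` behave as they do with GCC for 32-bit Arm targets; the enum and function names are illustrative. Built for a non-Arm host, the result says nothing useful about Arm code.

```c
#include <assert.h>

enum float_abi { FLOAT_ABI_SOFT, FLOAT_ABI_SOFTFP, FLOAT_ABI_HARD };

/* Illustrative classification, meaningful only for 32-bit Arm builds:
 * - __ARM_PCS_VFP: FP registers are used for argument passing ('hard').
 * - __ARM_FP only: hardware FP instructions may be generated, but the
 *   soft-float calling convention is kept ('softfp').
 * - neither:       FP operations become library calls ('soft'). */
static enum float_abi build_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return FLOAT_ABI_HARD;
#elif defined(__ARM_FP)
    return FLOAT_ABI_SOFTFP;
#else
    return FLOAT_ABI_SOFT;
#endif
}
```

Because 'soft' and 'softfp' share the same calling convention, objects built with either setting can be linked together; they differ only in whether FP instructions may appear in the generated code.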

TF-M currently supports "-mfloat-abi=soft" by default, but doesn't support the "-mfloat-abi=softfp" setting used in Zephyr, because they are totally different things.

For "-mfloat-abi=softfp", the compiler generates hardware floating-point instructions but still uses the software float calling conventions.
TF-M doesn't support FP instructions with "-mfloat-abi=soft" option. This should be the reason why you got the crash when doing secure call.
To confirm, please double-check whether there are FP instructions in the assembly code of the secure exception.

ioannisg added a comment. Edited Thu, May 13, 5:56 PM

TF-M currently supports "-mfloat-abi=soft" by default, but doesn't support the "-mfloat-abi=softfp" setting used in Zephyr, because they are totally different things.

But why? These ABIs are link-compatible, I do not see the problem. GCC allows these binaries to link successfully.

Can you please point me to the TF-M documentation, where it is stated that "TF-M does not support compiling Non-Secure application binaries with soft FP"?

I could not find any relevant information (but may have missed that). I could only find:

if(TFM_SYSTEM_FP)
    message(FATAL_ERROR "Hardware FPU is currently not supported in TF-M")
endif()

This suggests that TF-M does not support Hardware FPU, but I do not think this means TF-M does not allow the non-secure applications to use the FPU.
TF-M actually implies that Non-Secure applications are allowed to use the FPU:

See the code snippet below [in tfm_arch_config_extensions()]:

#if defined(__ARM_ARCH_8_1M_MAIN__) || defined(__ARM_ARCH_8M_MAIN__)
    /* Permit Non-secure access to the Floating-point Extension.
     * Note: It is still necessary to set CPACR_NS to enable the FP Extension in
     * the NSPE. This configuration is left to NS privileged software.
     */
    SCB->NSACR |= SCB_NSACR_CP10_Msk | SCB_NSACR_CP11_Msk;
#endif

So, according to my understanding, TF-M currently intends to allow Non-Secure applications to make use of the FPU. And of course, such applications need to compile with softfp to do that. So the use case I have presented to you is a valid use case.

If TF-M does not allow non-secure applications to use the FPU at all, that should be stated in the documentation and the tfm_arch_config_extensions() should not touch the NSACR register. But, honestly, that would be a severe limitation to Non-Secure applications on SoCs that have an FPU co-processor. And I do not believe this could have been the original intention.

TF-M doesn't support FP instructions with "-mfloat-abi=soft" option. This should be the reason why you got the crash when doing secure call.

I have been trying to explain to you that I compile TF-M with soft; I do not change the default settings. And I compile Zephyr with softfp, and as expected, I do not get any link errors. Maybe you assume that if I use softfp for Zephyr, then this is also used in the TF-M compilation? I can confirm that the TF-M binary is still compiled with soft, as is the default configuration :)

Therefore, the TF-M code does not use FPU instructions at all. The reason for the crash is that Cortex-M tries to stack FP registers because the LSPACT flag is left set, by the core, while in secure thread mode.

To confirm, please double-check whether there are FP instructions in the assembly code of the secure exception.

I confirm that the assembly code of the secure exceptions does not contain any FP instructions. There are no FP instructions at all in the TF-M binary.