
Bug in hftest.py
Open, Needs Triage, Public

Description

I wanted to report that commit 18a25f9241f86ba2d637011ff465ce3869e8651b in hafnium "appears" broken. The issue with the optimization in this patch is that the partition images are not reloaded for each test run. A test can therefore write data to, say, SRAM, and when the same image is executed again from SRAM for the following test, that test sees the values left behind by the previous one. This would be a problem for pretty much anything in the data section of a partition. In my case, I have a counter in the data section of my partition, which does not get reset back to its original value.

I've attached a patch to help reproduce the issue. The fix is to disable the optimization or to somehow reload the images for each run. This affects only the "both world" tests.
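The failure mode described above can be illustrated with a small toy model. This is only a sketch and not code from hftest.py or hafnium: the names `PARTITION_IMAGE`, `ToyModel`, and `run_suite` are hypothetical, and a dictionary stands in for the partition's data section in SRAM.

```python
"""Toy model (not hftest.py code) of stale partition state across test runs."""

# Hypothetical image: the initial contents of a partition's data section.
PARTITION_IMAGE = {"counter": 0}


class ToyModel:
    """Stands in for a model instance whose memory survives between tests."""

    def __init__(self, image):
        self._image = image
        self.memory = {}
        self.load_image()

    def load_image(self):
        # Copying the image restores every data-section variable.
        self.memory = dict(self._image)

    def run_test(self):
        # The "test" increments the counter in place and reports its value.
        self.memory["counter"] += 1
        return self.memory["counter"]


def run_suite(num_tests, reload_each_test):
    """Run the same test repeatedly, optionally reloading the image first."""
    model = ToyModel(PARTITION_IMAGE)
    results = []
    for _ in range(num_tests):
        if reload_each_test:
            model.load_image()
        results.append(model.run_test())
    return results


if __name__ == "__main__":
    print(run_suite(3, reload_each_test=False))  # [1, 2, 3]
    print(run_suite(3, reload_each_test=True))   # [1, 1, 1]
```

Without a reload, each run starts from the value the previous run left behind. With a reload, every run starts from the image's initial value, which is what a test relying on the data section would expect.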

Let me know if I'm missing something here.

Apply the patch and run:

timeout --foreground 300s ./test/hftest/hftest.py --out_partitions out/reference/secure_aem_v8a_fvp_vm_clang --log out/reference/kokoro_log --spmc out/reference/secure_aem_v8a_fvp_clang/hafnium.bin --driver=fvp --hypervisor out/reference/aem_v8a_fvp_clang/hafnium.bin --partitions_json test/vmapi/ffa_secure_partitions/ffa_both_world_partitions_test.json

The command line is from kokoro/test_spmc.sh.

Event Timeline

- Uploaded patch for repro.

Copying the response from the mailing list.

Hi Raghu,

Thanks for reporting.
This part of the test infrastructure (testing the SPMC) is still very fresh and needs further iterations of improvement, so please bear with us. That is also one reason it's not yet part of the automated non-regression testing with Jenkins (as opposed to the legacy kokoro/test.sh). For the time being, we still mostly rely on the TF-A CI for testing on the secure side.

IIUC this change was made to reduce test time, as the FVP takes a long time to reload on every test.
But indeed it might have the side effect you describe.
So either we revert to reloading the FVP on every test,
or another (somewhat hackish) possibility is to clear the mentioned variables from within the test (or make them part of BSS)?

To be fair, the both worlds test scenario is not 100% stable on my machine (for some reason the connection between the FVP and hftest is not always successful), which limits the confidence in, and robustness of, my testing and investigations. So I wonder whether the scripting is still somewhat fragile.

Regards,
Olivier.

Hello,

I found the bug causing the both worlds scenario to fail randomly: the telnet connection with the model is handled incorrectly, which causes some message loss. A fix is ongoing.
I'm also looking into getting the right telnet port number from the model.
Will update the ticket with the new developments.

Cheers,
Federico
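
As an illustration of the last point, one way to obtain the telnet port is to read it from the model's own start-up output instead of assuming a fixed number. The sketch below is not the fix referred to above. It assumes the model prints lines of the form "terminal_0: Listening for serial connection on port 5000"; that format, the helper name `parse_telnet_ports`, and the `SAMPLE` text are assumptions that should be checked against the actual model output.

```python
import re

# Assumed log format; verify it against the output of the model in use.
PORT_LINE = re.compile(
    r"^(terminal_\d+): Listening for serial connection on port (\d+)"
)


def parse_telnet_ports(model_output):
    """Map each terminal name found in the model output to its port number."""
    ports = {}
    for line in model_output.splitlines():
        match = PORT_LINE.match(line.strip())
        if match:
            ports[match.group(1)] = int(match.group(2))
    return ports


# Hypothetical sample output, used only to exercise the parser.
SAMPLE = """\
terminal_0: Listening for serial connection on port 5000
terminal_1: Listening for serial connection on port 5001
"""

if __name__ == "__main__":
    print(parse_telnet_ports(SAMPLE))
```

Reading the port from the output avoids connecting to the wrong terminal when the default port is already taken and the model picks another one.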