Miscellaneous flaky test fixes #5177

Manciukic · 2025-04-29T15:41:09Z

Changes

Fixes:

signal unit test that was failing (supposedly due to async signal delivery), by adding a sleep
jailer unit test that was crashing because the node already existed, by using a tmp directory
tap offload unit test that was failing because the file being read is written asynchronously, by adding a sleep
balloon unit test that was failing because the stats weren't refreshed, by adding a sleep to ensure they get refreshed

Reason

I've gone through the recent flaky test failures and this is a proposed fix. In most cases, the fix is to add a time interval to break an async race condition that we have no better way to resolve.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

I have read and understand CONTRIBUTING.md.
I have run tools/devtool checkstyle to verify that the PR passes the
automated style checks.
I have described what is done in these changes, why they are needed, and
how they are solving the problem in a clear and encompassing way.
I have updated any relevant documentation (both in code and in the docs)
in the PR.
I have mentioned all user-facing changes in CHANGELOG.md.
If a specific issue led to this PR, this PR closes the issue.
When making API changes, I have followed the
Runbook for Firecracker API changes.
I have tested all new and changed functionalities in unit tests and/or
integration tests.
I have linked an issue to every new TODO.

This functionality cannot be added in rust-vmm.

codecov · 2025-04-29T15:45:00Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.93%. Comparing base (321b26a) to head (dbf8c3f).
Report is 7 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5177      +/-   ##
==========================================
- Coverage   83.07%   82.93%   -0.15%     
==========================================
  Files         250      250              
  Lines       26946    26932      -14     
==========================================
- Hits        22385    22335      -50     
- Misses       4561     4597      +36

Flag	Coverage Δ
5.10-c5n.metal	`83.37% <100.00%> (-0.22%)`	⬇️
5.10-m5n.metal	`83.36% <100.00%> (-0.22%)`	⬇️
5.10-m6a.metal	`82.58% <100.00%> (-0.23%)`	⬇️
5.10-m6g.metal	`79.19% <100.00%> (-0.23%)`	⬇️
5.10-m6i.metal	`83.36% <100.00%> (-0.22%)`	⬇️
5.10-m7a.metal-48xl	`82.57% <100.00%> (?)`
5.10-m7g.metal	`79.19% <100.00%> (-0.23%)`	⬇️
5.10-m7i.metal-24xl	`83.32% <100.00%> (?)`
5.10-m7i.metal-48xl	`83.32% <100.00%> (?)`
5.10-m8g.metal-24xl	`79.19% <100.00%> (?)`
5.10-m8g.metal-48xl	`79.19% <100.00%> (?)`
6.1-c5n.metal	`83.41% <100.00%> (-0.22%)`	⬇️
6.1-m5n.metal	`83.41% <100.00%> (-0.22%)`	⬇️
6.1-m6a.metal	`82.63% <100.00%> (-0.23%)`	⬇️
6.1-m6g.metal	`79.19% <100.00%> (-0.23%)`	⬇️
6.1-m6i.metal	`83.40% <100.00%> (-0.23%)`	⬇️
6.1-m7a.metal-48xl	`82.61% <100.00%> (?)`
6.1-m7g.metal	`79.18% <100.00%> (-0.24%)`	⬇️
6.1-m7i.metal-24xl	`83.42% <100.00%> (?)`
6.1-m7i.metal-48xl	`83.42% <100.00%> (?)`
6.1-m8g.metal-24xl	`79.19% <100.00%> (?)`
6.1-m8g.metal-48xl	`79.19% <100.00%> (?)`


src/vmm/src/signal_handler.rs

src/jailer/src/env.rs

tests/integration_tests/functional/test_net.py

The signal handler unit test registers the signal handlers, spins up a new thread, and sends signals to itself (using kill) to check that the metrics get updated. This is very complicated for a unit test, and the behavior is already covered by the integration test test_signals.py. As this test occasionally fails in our CI, it's best to get rid of its complexity and rely on the integration tests, which are better suited for this kind of test.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>

We occasionally see CI failures where test_mknod_and_own_dev fails because a file already exists. The test uses the actual /dev to create temporary devices. As there's no reason to use the actual /dev, move it to a random folder and clean it up after the test.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>

The test was reimplementing the logic for creating a temporary directory instead of using TempDir, so I've switched it to TempDir to simplify it. Also, store the PathBuf object instead of the String so that Path operations can be done in a canonical way without formatting strings.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>
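To illustrate why a fixed, shared location produces "already exists" failures while a per-test temporary directory does not, here is a minimal Python analogue. It is not the Rust test from this PR: `create_exclusive` and `run_in_private_dir` are hypothetical helpers, and a regular file stands in for the device node.

```python
import os
import tempfile


def create_exclusive(path):
    """Create `path`, raising FileExistsError if something is already there."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def run_in_private_dir(name):
    """Create `name` inside a fresh temporary directory that is removed afterwards.

    Returns whether the file existed during the run and the (now deleted)
    directory path, so a caller can check that cleanup happened.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        create_exclusive(path)
        existed = os.path.exists(path)
    return existed, tmp
```

Calling `run_in_private_dir` repeatedly never collides, because each call gets its own directory, whereas calling `create_exclusive` twice on the same fixed path fails on the second call.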

This test has been flaky for a while: sometimes the file is empty. As we only care that the message got delivered, not that the file was created in a timely manner, I'm adding a small retry.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>
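A small retry of this kind could look roughly like the sketch below. The helper name `read_when_nonempty`, the attempt count, and the delay are assumptions for illustration, not the code added to test_net.py.

```python
import time
from pathlib import Path


def read_when_nonempty(path, attempts=5, delay_s=0.1):
    """Return the file's text, retrying while it is missing or still empty."""
    for attempt in range(attempts):
        try:
            content = Path(path).read_text()
        except FileNotFoundError:
            content = ""
        if content:
            return content
        # Sleep only between attempts, not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    raise TimeoutError(f"{path} still empty after {attempts} attempts")
```

Compared with a single fixed sleep, a bounded retry returns as soon as the data shows up and still fails loudly if it never does.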

test_balloon_snapshot started being flaky after we changed the logic for how we wait for the RSS to become stable. One theory is that we are not waiting long enough for the stats to refresh, so this change adds a sleep to ensure we have waited long enough for the stats to be "fresh".

Failure:
```
assert 189022208 > 189022208
```

Signed-off-by: Riccardo Mancini <mancio@amazon.com>
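The failure above compares two identical readings, which is what a stale value would produce. The sketch below shows the idea of waiting out one refresh interval before reading. `CachedStat` is a hypothetical stand-in for a periodically sampled statistic and `read_fresh` is an illustrative helper; neither is part of the Firecracker test framework.

```python
import time


class CachedStat:
    """Hypothetical statistic that is re-sampled at most once per interval."""

    def __init__(self, source, refresh_interval_s):
        self._source = source
        self._interval = refresh_interval_s
        self._value = source()
        self._sampled_at = time.monotonic()

    def read(self):
        now = time.monotonic()
        if now - self._sampled_at >= self._interval:
            self._value = self._source()
            self._sampled_at = now
        return self._value


def read_fresh(read_stat, refresh_interval_s, margin_s=0.5):
    """Sleep for one refresh interval plus a margin, then read the value."""
    time.sleep(refresh_interval_s + margin_s)
    return read_stat()
```

Reading immediately after the underlying value changes returns the old sample, so a strict `first > second` comparison can fail even though the value did drop; `read_fresh` avoids that at the cost of a fixed delay.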

We found a single failure in which the steal time between snapshot and restore went slightly above 2s on an AMD instance. As the purpose of this check is to ensure the value is "sane" (in other words, not a completely random number), not that it's really accurate (that's a kernel problem), I'm bumping the limit to 10s.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>
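As a rough sketch of a sanity bound like this one: the function name, the use of seconds as the unit, and the extra "did not go backwards" check are assumptions for illustration; only the 2s and 10s limits come from the commit message.

```python
def steal_time_is_sane(before_s, after_s, max_delta_s=10.0):
    """True if steal time did not decrease and grew by less than the bound."""
    delta = after_s - before_s
    return 0 <= delta < max_delta_s
```

With this shape, a delta slightly above 2s passes under the 10s limit but would be rejected under a 2s limit, while a wildly wrong value is rejected either way.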

An upstream patch was backported to the Ubuntu 24.04 6.8.0-58 kernel that makes the nx hugepages recovery thread a child of the firecracker process, thus increasing the process count to 7. As we're not really interested in knowing how many threads we have in this test, let's remove the assertion altogether.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>

Manciukic force-pushed the test-fixes branch from 77ea57b to bba4dc8 Compare April 29, 2025 15:44

Manciukic force-pushed the test-fixes branch from bba4dc8 to f61e64c Compare April 30, 2025 09:59

Manciukic marked this pull request as ready for review April 30, 2025 15:26

Manciukic added the "Status: Awaiting review" label Apr 30, 2025

roypat reviewed Apr 30, 2025

src/vmm/src/signal_handler.rs (outdated, resolved)

src/jailer/src/env.rs (outdated, resolved)

tests/integration_tests/functional/test_net.py (outdated, resolved)

Manciukic added 7 commits May 1, 2025 16:20

Manciukic force-pushed the test-fixes branch from 38617aa to dbf8c3f Compare May 1, 2025 16:48

pb8o approved these changes May 2, 2025

roypat approved these changes May 2, 2025

roypat merged commit dafee92 into firecracker-microvm:main May 2, 2025
7 checks passed
