libErator Artifact

This repository contains the code for the FSE'25 paper: Liberating Libraries through Automated Fuzz Driver Generation.

Purpose: libErator automatically generates drivers (unit-tests) starting from a library source.

The whole framework is composed of three main components:

Static analyzer: it takes a library soure code an emits a list of constraints - ./condition_extractor.
Driver generator: it uses the library constraints (from the static analyzer) and synthetizes the drivers (+ seeds) - ./tool/main.py.
Fuzzing: we use slightly customized libfuzz to fuzz the new generated drivers - ./custom-libfuzzer.

Citation

Flavio Toffalini, Nicolas Badoux, Zurab Tsinadze, and Mathias Payer. 2025. Liberating Libraries through Automated Fuzz Driver Generation: Striking a Balance without Consumer Code. Proc. ACM Softw. Eng. 2, FSE, Article FSE095 (July 2025), 23 pages. https://doi.org/10.1145/3729365

@inproceedings{Toffalini_Liberating_Libraries_through_2025,
  author = {Toffalini, Flavio and Badoux, Nicolas and Zurab, Tsinadze and Payer, Mathias},
  booktitle = {Proceedings of the ACM on Software Engineering,},
  doi = {10.1145/3729365},
  month = jul,
  volume = 2,
  publisher = {ACM New York, NY, USA},
  title = {{Liberating Libraries through Automated Fuzz Driver Generation}},
  url = {https://dx.doi.org/10.14722/3729365},
  year = {2025}
}

Or the cite button on the left of GitHub interface.

Run the Artifact

We prepared a small-scale experiment to execute our framework. The instructions are collected here. Notice that the artifact only covers a short campaign lasting around 10 hours, and one single run. If you want to run the full array of experiments, you need around 10TB of space, and probably a week for running everything (considering a 32 core machine and around 30 jobs in parallel).

If you want to run the full campaign, you can follow these commands.

tmux # highly recommended
./preinstall.sh
cd fuzzing_campaigns/
./fuzzing_pipeline_generation.sh

At the end, you will have many spurious files in your home, to delete (almost) all of them, do as follows.

cd ..
./tool/misc/clean_spurious_files.sh

We tried to clean as much as possible, but our regex did not catch all :) If you have any solution, your PRs are more than welcome!

How to Install

The environment has been designed for VS Code. The Dockerfile in the root folder builds the container, that will be used as development environment in VS Code.

To install Docker extension and make your dev env:

https://code.visualstudio.com/docs/remote/containers#_quick-start-open-an-existing-folder-in-a-container

Tips to use your GH SSH from inside the Docker

https://code.visualstudio.com/docs/remote/containers#_sharing-git-credentials-with-your-container

If you do not remember how to Docker your user

From here.

TL;DR;

# if not docker group exists
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
# IMPORTANT: logout/login
# IMPORTANT 2: kill/exit tmux sessions (be careful to other tmux users with open sessions)

Remember to disable multithreading

Important: do not turn multithreading on, reboot the machine instead

echo off | sudo tee /sys/devices/system/cpu/smt/control

NSF tricks

# install client
sudo apt install nfs-common
mkdir $LOCAL_FOLDER
sudo mount $SERVER_IP:$REMOTE_FOLDER $LOCAL_FOLDER

Preliminary installation

In the host, run the script:

./preinstall.sh

It will install the minimal Python packages and compile our custom libfuzzer version.

How to Integrate a new Target (a new Library)

Specific Guides

Ad-hoc guides for specific components of the project

Internal Technical Details

Driver IR

Problems Addressed and Solved

These are the challenges that we face, and the solution proposed. We further indicate if the solution requires manual intervention or is completely automatic.

Correct sequence of APIs:
- control-flow -> a valid chain of functions
- data-flow -> assign coherent arguments to each function
automatic: both solved with NDA, which guarantees the creation of a sequence of APIs whose function + arguments match the contraints of a context. The constraints are modelled as list of fields manipulated.
- source/sink identification: depends on the type-system/philosophy used in the library.
automatic: we systematize a few strategies and associate a policy for each of them. Each policy finds source/sink APIs. We then automatically identify the most suitable policy for a library.
Data constraints -> the arguments should adhere to some data constraints:
- args used as a string -> add NULL to the end
- args used as an array -> prepare an array of objects
- args used as a file -> prepare a temporary file to store information
- args modified by the API -> known it is an output
- dependencies, we know two arguments are used together (e.g., in a if-condition or in a loop). This info allows us to define few policies:
  - arg + len: one arg is an array and another arg controls its length.
  - arg addr + arg addr: the API expects two arguments that belong to the same buffer argument.
- coherent cleanup: we infer which function correctly de-allocate an object (i.e., fclose(pf)). If it is not possible, we fallback to free() if the object is allocated in heap.
- args used for malloc-length: we cap the malloc allocation to avoid OOM
automatic: For each API's argument, we infer a set of constraints, which combined with some policies, allow us to automatically prepare the driver and increase its stability.
Data initialization -> some library expects the user to initialize data before interacting with the APIs. These operations fall beyond the API code analysis.
- object chains -> the library expects the user to create a chain of objects that point to each other, without using any APIs
- callbacks -> for testing reason, the user should prepare a set of callbacks to test specific library functionalities
- var arg functions
partial-automatic: we have a set of heuristics that cover the main object initialization strategies encountered in our dataset. We currently do not cover objects that require manual initialization, e.g., fields that require to be set individually. Callbacks are automatically set through mock functions, vararg are handled through a set of pointers, and object chains are inferred through type-dependencies from the API functions combined with a so-called "field bias", more info in the paper.
Useful header -> what are the headers useful for a consumer?

manual: we require an operator to indicate the public header files of a library. Many headers are supposed to be private but they are installed as publicly accessible, thus, it is not possible to infer which is which automatically.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.devcontainer		.devcontainer
LLVM		LLVM
_docs		_docs
artifact		artifact
bugs		bugs
condition_extractor		condition_extractor
custom-libfuzzer		custom-libfuzzer
docker		docker
framework		framework
fuzzers		fuzzers
fuzzgen-targets		fuzzgen-targets
fuzzing_campaigns		fuzzing_campaigns
libplist/drivers		libplist/drivers
library_statistics_extractor		library_statistics_extractor
misc		misc
oss-fuzzer-comp		oss-fuzzer-comp
oss-llm-targets		oss-llm-targets
oss-manual-targets		oss-manual-targets
regression_tests		regression_tests
targets		targets
tests		tests
tool		tool
utopia_static_analysis		utopia_static_analysis
.dockerignore		.dockerignore
.gitignore		.gitignore
ARTIFACT.md		ARTIFACT.md
CITATION.cff		CITATION.cff
COMPETITORS.md		COMPETITORS.md
Dockerfile		Dockerfile
LICENSE.TXT		LICENSE.TXT
README.md		README.md
STATUS.md		STATUS.md
bootstrap_llvm.sh		bootstrap_llvm.sh
generate_documentation.sh		generate_documentation.sh
install_analysis_result.sh		install_analysis_result.sh
install_llvm.sh		install_llvm.sh
overwrite.toml		overwrite.toml
paper.pdf		paper.pdf
preinstall.sh		preinstall.sh
quick_start.sh		quick_start.sh
requirements.txt		requirements.txt
requirements_host.txt		requirements_host.txt
start_debugenv.sh		start_debugenv.sh
start_devenv.sh		start_devenv.sh
update-alternatives-clang.sh		update-alternatives-clang.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

libErator Artifact

Citation

Run the Artifact

How to Install

How to Integrate a new Target (a new Library)

Specific Guides

Internal Technical Details

Problems Addressed and Solved

About

Releases

Packages

Contributors 2

Languages

License

HexHive/liberator

Folders and files

Latest commit

History

Repository files navigation

libErator Artifact

Citation

Run the Artifact

How to Install

How to Integrate a new Target (a new Library)

Specific Guides

Internal Technical Details

Problems Addressed and Solved

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages