[Open Source Promotion Plan] Learning About libFuzzer
This article was first published in the openEuler community Open Source Promotion Plan.
Project Name: No. 112 Improving QEMU Fuzzing
About This Document
libFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine that is part of the LLVM project. It feeds fuzzed inputs to the library under test and its related functions through a specific entry point. During the test, libFuzzer continuously mutates the input, measures code coverage, and watches for crashes.
Using libFuzzer
Experiment Environment
The cloud host of the Peng Cheng Laboratory running openEuler is used.
[root@host-10-0-0-94 libFuzzer]# lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 4
[root@host-10-0-0-94 libFuzzer]# cat /etc/os-release
NAME="openEuler"
VERSION="20.03 (LTS-SP1)"
Using libFuzzer in Simple Mode
Install LLVM and Clang.
- Source code build: The requirements on the host are strict (8 GB of memory and 15 GB to 20 GB of drive space), and the build commands need to be tuned. In addition, build compiler-rt, on which libFuzzer depends.
git clone https://gitee.com/mirrors/LLVM.git
cd LLVM ; mkdir build ; cd build
cmake -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;compiler-rt" -DCMAKE_BUILD_TYPE="Release" -DLLVM_TARGETS_TO_BUILD="host" -G "Unix Makefiles" ../llvm
make -j4
- Binary installation: Download the binary files of various versions to facilitate version switching. Add soft links to environment variables for ease of use.
- Package manager installation: The version provided by the package manager may be old, but it includes libFuzzer.
# Run the sudo apt/dnf search xxx command to view the software contained in the package manager and the version.
sudo apt/dnf install clang llvm compiler-rt
Build the binary file to be tested and add the libFuzzer build option.
// The function interface provided by libFuzzer is implemented in the source code fuzz_me.cc to be tested.
#include <cstddef>
#include <cstdint>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}
# Clang searches for the libclang_rt.xxx.a static link libraries, which contain the sanitizer and fuzzer runtimes.
# In addition to ASAN, other sanitizers such as UBSAN and TSAN can be used.
clang++ -fsanitize=address,fuzzer -g fuzz_me.cc -o fuzz_me
If the build command fails, the installation in the previous step was unsuccessful. In this case, check the location of the compiler-rt library or reinstall it.
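As a concrete illustration of such a target, the following is a minimal sketch of what fuzz_me.cc could contain. The FuzzMe function and its deliberate bug are hypothetical stand-ins for DoSomethingInterestingWithMyAPI, modeled on the toy target commonly used to demonstrate libFuzzer. They are not taken from the project described in this article.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical function under test. It contains a deliberate bug: when the
// input is exactly "FUZ" (Size == 3), Data[3] is read one byte past the end
// of the buffer.
bool FuzzMe(const uint8_t *Data, size_t Size) {
  return Size >= 3 &&
         Data[0] == 'F' &&
         Data[1] == 'U' &&
         Data[2] == 'Z' &&
         Data[3] == 'Z';  // Out-of-bounds read if Size == 3.
}

// Entry point that libFuzzer calls once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  FuzzMe(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}
```

When a target like this is built with the fuzzer and ASAN options, ASAN is expected to report the out-of-bounds read as soon as libFuzzer generates an input that starts with "FUZ" and is three bytes long.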
- Run the binary file and check the output.
./fuzz_me
grep ERROR ./*.log | sort -k 3
If a sanitizer detects abnormal behavior of the program, fuzzing output (a crash) is generated. In addition to the crash, the fuzzer also records statistics during fuzzing, such as code coverage (counted in basic blocks) and the mutations applied to each seed.
- Reproduce the input to locate the vulnerability.
./fuzz_me crash-xxx   # xxx is the SHA-1 hash value of the crash input.
./fuzz_me -seed=xxx   # Here xxx is the random seed of the run to be repeated.
gdb fuzz_me
The sanitizer report provides the vulnerability type and the context in which the vulnerability was triggered. Check the crash details, then reproduce the vulnerability or debug it with GDB.
Helpful Utilities
For large-scale software, some runtime options need to be set to improve fuzzing efficiency.
- -jobs: number of jobs. Each job runs until it triggers a crash or reaches its time or run limit. Jobs run in the worker processes, and one worker can run multiple jobs in sequence. If -jobs is set to 1000, fuzzing can continue past the simple bugs of the program.
- -workers: number of worker processes. By default, at most half of the CPU cores are used.
- -fork: -fork=N can replace -jobs=N and -workers=N.
- -dict: dictionary, which is necessary for fuzzing files in a specific format.
- CORPUS: corpus directory, which is used to save the inputs that trigger new paths during fuzzing.
- -max_len: maximum input length. If it is not set, libFuzzer derives it from the size of the corpus files.
- -runs: number of individual test runs, that is, the number of mutated inputs to execute. Together with -minimize_crash=1, it limits the number of attempts made to reduce a crash reproducer.
- -shrink: Try to shrink the inputs in the corpus without losing code coverage.
Project Practice
- Course: The libFuzzer workshop includes common methods for using libFuzzer.
- Open source project CVE practice: libFuzzer has discovered many vulnerabilities in open source software and can rediscover well-known ones such as the Heartbleed vulnerability in OpenSSL. Find a piece of open source software with vulnerabilities in Google's fuzzer-test-suite and reproduce the vulnerabilities using libFuzzer.
- Comparison with other tools: See the related links in the libFuzzer tutorial.
libFuzzer Principles
Mutation Algorithms
Mutation is a key step in modern fuzzers because it generates new inputs that can cover more basic blocks. libFuzzer contains a series of simple built-in mutation algorithms, most of which are bit-level or byte-level operations. It also supports user-defined mutation algorithms for targeted fuzzing.
Existing Mutation Algorithms
By observing the stderr output of libFuzzer, you can find the mutation algorithms used by the current input in the MS field. See the following figure:
libFuzzer has 12 built-in mutation algorithms, which are member functions of the MutationDispatcher class. The class constructor registers them as follows:
// Code path: LLVM/compiler-rt/lib/fuzzer/FuzzerMutate.cpp
MutationDispatcher::MutationDispatcher(Random &Rand,
                                       const FuzzingOptions &Options)
    : Rand(Rand), Options(Options) {
  DefaultMutators.insert(
      DefaultMutators.begin(),
      {
          {&MutationDispatcher::Mutate_EraseBytes, "EraseBytes"},
          {&MutationDispatcher::Mutate_InsertByte, "InsertByte"},
          {&MutationDispatcher::Mutate_InsertRepeatedBytes, "InsertRepeatedBytes"},
          {&MutationDispatcher::Mutate_ChangeByte, "ChangeByte"},
          {&MutationDispatcher::Mutate_ChangeBit, "ChangeBit"},
          {&MutationDispatcher::Mutate_ShuffleBytes, "ShuffleBytes"},
          {&MutationDispatcher::Mutate_ChangeASCIIInteger, "ChangeASCIIInt"},
          {&MutationDispatcher::Mutate_ChangeBinaryInteger, "ChangeBinInt"},
          {&MutationDispatcher::Mutate_CopyPart, "CopyPart"},
          {&MutationDispatcher::Mutate_CrossOver, "CrossOver"},
          {&MutationDispatcher::Mutate_AddWordFromManualDictionary, "ManualDict"},
          {&MutationDispatcher::Mutate_AddWordFromPersistentAutoDictionary, "PersAutoDict"},
      });
  // Implementation of the preceding functions
}
The names of most mutation algorithms reflect their implementation methods. For example, EraseBytes calls the memmove function to overwrite a range of bytes with the bytes that follow it, and InsertByte calls the memmove function to make room for one new byte. Note that in these built-in mutation algorithms, the mutation point and the mutation value are generated using random functions of the Rand series.
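The memmove-based idea behind these two mutators can be sketched as follows. This is a simplified illustration, not the code in FuzzerMutate.cpp. The function names EraseBytesAt, InsertByteAt, and MutateInsertRandomByte are hypothetical, and std::mt19937 stands in for libFuzzer's own Rand class.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// EraseBytes-style mutation: remove N bytes starting at Idx by moving the
// tail of the buffer over them. Returns the new size.
size_t EraseBytesAt(uint8_t *Data, size_t Size, size_t Idx, size_t N) {
  if (N == 0 || Idx >= Size || N > Size - Idx) return Size;  // Nothing to do.
  std::memmove(Data + Idx, Data + Idx + N, Size - Idx - N);
  return Size - N;
}

// InsertByte-style mutation: shift the tail one byte to the right and write
// Byte at Idx. Returns the new size, or Size unchanged if there is no room.
size_t InsertByteAt(uint8_t *Data, size_t Size, size_t MaxSize, size_t Idx,
                    uint8_t Byte) {
  if (Size >= MaxSize || Idx > Size) return Size;
  std::memmove(Data + Idx + 1, Data + Idx, Size - Idx);
  Data[Idx] = Byte;
  return Size + 1;
}

// The mutation point and the mutation value are both chosen at random.
size_t MutateInsertRandomByte(uint8_t *Data, size_t Size, size_t MaxSize,
                              std::mt19937 &Rng) {
  size_t Idx = std::uniform_int_distribution<size_t>(0, Size)(Rng);
  int Byte = std::uniform_int_distribution<int>(0, 255)(Rng);
  return InsertByteAt(Data, Size, MaxSize, Idx, static_cast<uint8_t>(Byte));
}
```

Separating the deterministic helpers from the random wrapper keeps the memmove logic easy to check by hand, while the wrapper shows where the randomness enters.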
Adding a Mutation Algorithm
libFuzzer and AFL are coverage-guided fuzzing tools. When they fuzz targets that expect structured input, the mutated inputs may be rejected at an early stage of program execution because the mutation algorithms carry no semantic information. In such cases, libFuzzer and AFL are less efficient than generation-based fuzzing tools. Therefore, Google proposed structure-aware fuzzing, which works as a libFuzzer plugin that allows users to add mutation algorithms. This section describes how to add a plugin and lists some officially implemented plugins.
The following code shows how to add a plugin. Implement a user-defined LLVMFuzzerCustomMutator function that applies a specific mutation algorithm, and call LLVMFuzzerMutate in the function to apply the common built-in mutations.
When implementing the code, use the conditional compilation directive #ifdef CUSTOM_MUTATOR together with clang -DCUSTOM_MUTATOR to enable or disable the user-defined plugin.
// Optional user-provided custom mutator.
// Mutates raw data in [Data, Data+Size) inplace.
// Returns the new size, which is not greater than MaxSize.
// Given the same Seed produces the same mutation.
size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed);
// libFuzzer-provided function to be used inside LLVMFuzzerCustomMutator.
// Mutates raw data in [Data, Data+Size) inplace.
// Returns the new size, which is not greater than MaxSize.
size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);
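A minimal sketch of such a plugin is shown below, assuming a hypothetical input format that starts with a fixed 4-byte magic header. The header value "QFZ1", the MutateKeepingHeader helper, and the MutateFn alias are illustrative names that do not come from libFuzzer. Only LLVMFuzzerCustomMutator and LLVMFuzzerMutate are the interfaces declared above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical input format: a fixed 4-byte magic header, then a payload.
static const uint8_t kMagic[4] = {'Q', 'F', 'Z', '1'};

// Signature shared by LLVMFuzzerMutate and any stand-in used for testing.
using MutateFn = size_t (*)(uint8_t *Data, size_t Size, size_t MaxSize);

// Mutates only the payload with MutatePayload and then rewrites the header,
// so every mutated input still passes the target's early magic check.
size_t MutateKeepingHeader(uint8_t *Data, size_t Size, size_t MaxSize,
                           MutateFn MutatePayload) {
  const size_t HeaderSize = sizeof(kMagic);
  if (MaxSize <= HeaderSize) return Size;  // No room for a payload.
  size_t PayloadSize = Size > HeaderSize ? Size - HeaderSize : 0;
  PayloadSize = MutatePayload(Data + HeaderSize, PayloadSize,
                              MaxSize - HeaderSize);
  std::memcpy(Data, kMagic, HeaderSize);
  return HeaderSize + PayloadSize;
}

#ifdef CUSTOM_MUTATOR
// Provided by libFuzzer when the target is linked with -fsanitize=fuzzer.
extern "C" size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);

// Picked up by libFuzzer automatically when this symbol is defined.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  (void)Seed;  // This sketch relies on libFuzzer's own random state.
  return MutateKeepingHeader(Data, Size, MaxSize, LLVMFuzzerMutate);
}
#endif
```

Compiling with -DCUSTOM_MUTATOR enables the plugin, and compiling without it leaves libFuzzer's default mutation in place. Passing the mutation function as a parameter also makes the helper easy to exercise outside libFuzzer.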
To add your own plugins, refer to the related links or the topic "Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator" at the 2017 LLVM Developers' Meeting.
Coverage Statistics
Analyze code coverage from the following two perspectives:
- Statistical precision: From coarse to precise, code coverage is classified into three levels:
  - Function level: Collects statistics on called functions. Statistics on the internal code of the functions are ignored.
  - Basic block level: Collects statistics on basic blocks, which can be queried in the cov field of the stderr output of libFuzzer.
  - Edge level: Collects statistics on not only basic blocks, but also the transfers of control between them, which are recorded through virtual blocks created on the edges between basic blocks.
- Analysis object: The basic statistics method is instrumentation by adding count variables. There are three levels:
  - Source code: Coverage statistics modes are provided as build options.
  - Intermediate representation: The statistics collection code is inserted at the IR level, for example, by an LLVM pass.
  - Binary: Binary instrumentation tools such as Pin and DynamoRIO are used to collect statistics through hooks.
In short, libFuzzer uses SanitizerCoverage in the LLVM framework to collect source code–level coverage statistics. You can run the following command to specify the level. By default, the edge level is used.
# xxx=edge,bb,func,trace-pc-guard,inline-8bit-counters,inline-bool-flag,pc-table,trace-pc
clang -fsanitize-coverage=xxx fuzz_me.c
In addition, you can develop analysis tools using SanitizerCoverage. The sanitizer provides a coverage callback interface that allows you to dump the coverage statistics result to a .sancov file when the fuzzing process is stopped. The LLVM framework provides the sancov tool to generate source code–level coverage reports.
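To give a feel for the callback interface, the following sketch implements the two callbacks used by the trace-pc-guard mode and counts how many instrumented edges were reached. The callback names are the ones documented for SanitizerCoverage. The counters g_total_guards and g_covered_guards are illustrative. The sketch assumes a standalone program built with -fsanitize-coverage=trace-pc-guard rather than a libFuzzer binary, because libFuzzer ships its own coverage runtime.

```cpp
#include <cstdint>

// Illustrative counters; a real tool would typically also record the PCs.
uint32_t g_total_guards = 0;    // Number of instrumented edges seen so far.
uint32_t g_covered_guards = 0;  // Number of distinct edges executed.

// Called once per instrumented module with the range of its guard variables.
// Each guard receives a unique non-zero ID.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Empty or already initialized.
  for (uint32_t *guard = start; guard < stop; ++guard)
    *guard = ++g_total_guards;
}

// Called on every instrumented edge. Zeroing the guard after the first hit
// makes later executions of the same edge return immediately.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // This edge was already counted.
  *guard = 0;
  ++g_covered_guards;
}
```

This file should itself be compiled without -fsanitize-coverage and then linked with the instrumented objects. Otherwise the callbacks would be instrumented as well and would call themselves.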
Error Check
Common Errors
As mentioned, each job in a libFuzzer run performs the check task and does not stop until a crash occurs or it times out. In either case, the libFuzzer daemon process captures the exit code. If the exit code is 77, a timeout occurred (the default timeout interval is 1,200 seconds) or libFuzzer itself is abnormal. If the job exited because of a crash, the result and the input that caused the crash are recorded.
Sanitizers
Crash detection does not cover all fuzzing error check scenarios. For example, memory leaks and data races may not cause crashes, but they are serious errors. In such cases, memory detection tools such as Valgrind can be used. libFuzzer instead relies on a series of sanitizers in the LLVM framework. These tools are provided by Google and detect runtime errors in C/C++ programs. Common sanitizers are as follows:
- AddressSanitizer (ASAN): Captures buffer overflows (on the heap, on the stack, and in globals) and use-after-free (UAF) vulnerabilities.
- ThreadSanitizer (TSAN): Captures data races and supports C/C++ and Go.
- UndefinedBehaviorSanitizer (UBSAN): Captures undefined behaviors such as integer overflow and null pointer dereference.
ASAN is used as an example to analyze the principles of these error check tools. For details, see USENIX ATC 2021.
- During compilation, ASAN instruments LLVM IR-level memory access operations (load, store, and alloca). Because memory is 8-byte aligned, the state of every 8 bytes of application memory can be recorded in 1 byte of shadow memory, which is located through a direct address mapping and shows whether the memory can be read and written.
- At run time, ASAN hooks the malloc function and places a redzone before and after the allocated memory, which is similar to the stack canary method. The shadow memory of the redzone is marked as inaccessible so that overflows are detected.
- At run time, ASAN hooks the free function. Instead of releasing memory immediately, it sets the shadow memory to a negative value. That is, the memory cannot be read or written, and it is placed in a quarantine area for observation. If a UAF or wild pointer dereference occurs, it is captured.
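The three steps above can be illustrated with a toy model of shadow memory. The sketch assumes 64 bytes of simulated application memory addressed from 0, and the names g_shadow, Poison, Unpoison, and IsAccessBad are hypothetical. Real ASAN computes the shadow address as (Addr >> 3) + Offset and uses specific negative values for each kind of poisoned memory, while the values -1 and -2 here are arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model: 64 bytes of "application memory" described by 8 shadow bytes.
// Shadow encoding: 0 = all 8 bytes accessible, k in 1..7 = only the first
// k bytes accessible, negative = no byte accessible.
constexpr size_t kGranularity = 8;
constexpr size_t kAppSize = 64;
int8_t g_shadow[kAppSize / kGranularity];

constexpr int8_t kRedzone = -1;  // Arbitrary marker for redzones.
constexpr int8_t kFreed = -2;    // Arbitrary marker for freed memory.

// Direct mapping from an application address to its shadow byte.
int8_t *ShadowFor(size_t Addr) { return &g_shadow[Addr >> 3]; }

// Marks [Begin, Begin + Size) as accessible. Begin must be 8-byte aligned.
void Unpoison(size_t Begin, size_t Size) {
  for (size_t Off = 0; Off < Size; Off += kGranularity) {
    size_t Left = Size - Off;
    *ShadowFor(Begin + Off) =
        Left >= kGranularity ? 0 : static_cast<int8_t>(Left);
  }
}

// Marks whole 8-byte granules as inaccessible. Begin and Size must be
// multiples of 8.
void Poison(size_t Begin, size_t Size, int8_t Value) {
  std::memset(ShadowFor(Begin), Value, Size / kGranularity);
}

// The check inserted before a memory access of AccessSize bytes at Addr.
// It assumes the access does not cross an 8-byte boundary.
bool IsAccessBad(size_t Addr, size_t AccessSize) {
  int8_t K = *ShadowFor(Addr);
  if (K == 0) return false;  // Fast path: the whole granule is accessible.
  int LastByte = static_cast<int>((Addr & 7) + AccessSize - 1);
  return LastByte >= K;      // Also true for every negative K.
}
```

For example, a simulated malloc(13) can be modeled by poisoning all 64 bytes with kRedzone and then calling Unpoison(16, 13). An access at address 29 is then flagged as an overflow, and after Poison(16, 16, kFreed) any access to the block is flagged as a UAF.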
Other Fuzzers
Academia: As a hot academic topic in recent years, fuzzing has been the subject of papers presented at top international conferences in fields such as system security, network security, software analysis, and programming languages. For details, see the FuzzingPaper project. Most of these papers propose a fuzzer for a specific object (software, hardware, OS kernel, programming language, etc.) or a specific vulnerability type (race condition, buffer overflow, etc.), and use methods such as concolic fuzzing and deep learning to improve fuzzing efficiency.
Industry: Many tools based on AFL and libFuzzer are developed on GitHub, and many companies developing fuzzing technologies have subsequently emerged. Google developed a series of fuzzing tools and the FuzzBench platform to evaluate fuzzer performance in a unified manner.
In short, the research on fuzzing in academia and industry is closely related. Most fuzzing technologies are implemented based on the LLVM framework and are highly scalable. The academic research is conducted based on the existing tools in the industry, and the achievements with good performance are released on GitHub. The following lists some fuzzers. For more fuzzers, visit Awesome-Fuzzing.
Generic fuzzers
- libFuzzer-gv: enhanced edition of libFuzzer
- AFL++: enhanced edition of AFL
- OSS-Fuzz + ClusterFuzz: large-scale distributed fuzzer implemented through cooperation between the frontend and backend
- boofuzz: enhanced edition of the Sulley framework
- phuzzer: Python framework for interacting with AFL
Security
- Honggfuzz: fuzzer targeting software security vulnerabilities
Network protocol
- AFLNet: gray-box fuzzer for network protocols
Kernel
- Syzkaller: unsupervised Linux kernel fuzzer
Programming language
- Fuzzers available for Rust: AFL, libFuzzer, and Honggfuzz
- PolyGlot: fuzzer for programming language interpreters
IoT
- FirmAFL: gray-box fuzzer for IoT firmware
- DIANE: fuzzer for IoT applications on mobile phones
- Frankenstein: fuzzer for wireless IoT devices
If there are any errors, please contact cascades-sjtu.