[Open Source Promotion Plan] Learning About libFuzzer

cascades | 2021-08-14 | Fuzzing | Peng Cheng Laboratory | summer2021

This article was first published in the openEuler community Open Source Promotion Plan.
Project Name: No. 112 Improving QEMU Fuzzing

About This Document

libFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine that is part of the LLVM project. It is linked with the library under test and feeds fuzzed inputs to it through a specific entry point, the fuzz target. During the test, libFuzzer keeps mutating the inputs while tracking code coverage and watching for crashes.

Using libFuzzer

Experiment Environment

The experiments run on a Peng Cheng Laboratory cloud host running openEuler.

bash
[root@host-10-0-0-94 libFuzzer]# lscpu
Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          4

[root@host-10-0-0-94 libFuzzer]# cat /etc/os-release
NAME="openEuler"
VERSION="20.03 (LTS-SP1)"

Using libFuzzer in Simple Mode

libFuzzer tutorial

  1. Install LLVM and Clang.

    • Source code build: The host requirements are demanding (about 8 GB of memory and 15 GB to 20 GB of disk space), and the build commands may need tuning. In addition, build compiler-rt, on which libFuzzer depends.
    bash
    git clone https://gitee.com/mirrors/LLVM.git
    cd LLVM ; mkdir build ; cd build
    cmake -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;compiler-rt" -DCMAKE_BUILD_TYPE="Release" -DLLVM_TARGETS_TO_BUILD="host" -G "Unix Makefiles" ../llvm
    make -j4
    • Binary installation: Download prebuilt binaries of the versions you need, which makes version switching easy. Add symbolic links in a directory on PATH for ease of use.
    • Package manager installation: The packaged version may be old, but it contains libFuzzer.
    bash
    # Run `apt search xxx` or `dnf search xxx` to see which packages and versions the package manager provides.
    # On apt-based systems, replace dnf with apt; package names can differ between distributions.
    sudo dnf install clang llvm compiler-rt
  2. Build the binary file to be tested and add the libFuzzer build option.

    cpp
    // The function interface provided by libFuzzer is implemented in fuzz_me.cc, the source file to be tested.
    #include <cstddef>  // size_t
    #include <cstdint>  // uint8_t
    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
      DoSomethingInterestingWithMyAPI(Data, Size);  // Replace with a call into the code under test.
      return 0;  // Non-zero return values are reserved for future use.
    }
    bash
    # Clang links the static sanitizer runtime libraries (libclang_rt.xxx.a).
    # Other sanitizers such as UBSAN can be added; ASAN and TSAN cannot be combined in one build.
    clang++ -fsanitize=address,fuzzer -g fuzz_me.cc -o fuzz_me

If the build command fails, the installation in the previous step is probably incomplete. In this case, check the location of the compiler-rt libraries or reinstall them.
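As a concrete instance of step 2, the following is a minimal sketch of what fuzz_me.cc could look like, modeled on the toy target used in the libFuzzer tutorial. The function FuzzMe and its deliberate out-of-bounds read are illustrative assumptions, not code from any project under test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical API under test. It contains a deliberate bug: when Size == 3
// and the first three bytes are "FUZ", Data[3] is read out of bounds.
bool FuzzMe(const uint8_t *Data, size_t Size) {
  return Size >= 3 &&
         Data[0] == 'F' &&
         Data[1] == 'U' &&
         Data[2] == 'Z' &&
         Data[3] == 'Z';  // Bug: this read requires Size >= 4.
}

// Entry point that libFuzzer calls once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  assert(Data != nullptr || Size == 0);
  FuzzMe(Data, Size);
  return 0;
}
```

When such a file is built with -fsanitize=address,fuzzer, ASAN should report a buffer overflow once libFuzzer generates the 3-byte input FUZ.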

  3. Run the binary file and check the output.
    bash
    ./fuzz_me
    # When run with -jobs=N, each job writes its output to fuzz-<JOB>.log.
    grep ERROR ./*.log | sort -k 3

If a sanitizer detects abnormal program behavior, libFuzzer reports a crash and saves the input that triggered it. Besides the crash, the fuzzer also prints statistics during fuzzing, such as code coverage (counted in basic blocks or edges) and the mutations applied to each seed.

  4. Reproduce the input to locate the vulnerability.
    bash
    ./fuzz_me crash-xxx        # xxx is the SHA-1 hash of the crashing input.
    ./fuzz_me -seed=N          # Rerun with the random seed N that libFuzzer printed at startup.
    gdb --args ./fuzz_me crash-xxx

The sanitizer report gives the vulnerability type and the context in which it was triggered, such as the stack trace. Check the crash details, then reproduce the vulnerability or debug it with GDB.
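When a crash has to be replayed without the libFuzzer runtime, for example in a debug build made without -fsanitize=fuzzer, a small driver can feed the saved crash file to the fuzz target. The following is only a sketch under that assumption: ReplayFile is a hypothetical helper, and the fuzz target shown is a stand-in so that the example links on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Stand-in fuzz target; in practice this is the LLVMFuzzerTestOneInput
// defined in fuzz_me.cc. It only records the size of the last input.
static size_t g_last_size = 0;
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  assert(Data != nullptr || Size == 0);
  g_last_size = Size;
  return 0;
}

// Hypothetical helper: reads a saved input (for example crash-xxx) and
// passes it to the fuzz target once. Returns false if the file cannot be opened.
bool ReplayFile(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
  LLVMFuzzerTestOneInput(bytes.data(), bytes.size());
  return true;
}
```

A main function that calls ReplayFile(argv[1]) would turn this into a standalone reproducer that can be started under GDB.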

Helpful Utilities

For large-scale software, some runtime options and inputs are needed to improve fuzzing efficiency.

  • -jobs: number of fuzzing jobs. Each job runs until it finds a crash or reaches its limits, and jobs are distributed among the worker processes, so one worker can run several jobs in turn. With a large value such as -jobs=1000, fuzzing continues past simple bugs instead of stopping at the first crash.
  • -workers: number of worker processes that run the jobs in parallel. By default, min(jobs, number of CPU cores / 2) is used.
  • -fork: -fork=N can replace -jobs=N -workers=N.
  • -dict: dictionary, which is necessary for fuzzing files in a specific format.
  • CORPUS: corpus directory, which stores the inputs that trigger new paths during fuzzing.
  • -max_len: maximum input length. If it is not set, libFuzzer guesses a value based on the corpus files.
  • -runs: number of individual test runs (-1 means unlimited). It is often combined with -minimize_crash=1 to bound the effort spent on minimizing a crash reproducer.
  • -shrink: tries to replace corpus inputs with smaller ones that keep the same coverage, which reduces the corpus size.
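The relation between -jobs and -workers can be written down as a one-line formula. The sketch below only restates the default documented for -workers=0; the function name DefaultWorkers is hypothetical, and the real libFuzzer driver may handle corner cases (such as a single-core host) differently.

```cpp
#include <algorithm>
#include <cassert>

// Documented default when -workers is 0: min(jobs, NumberOfCpuCores() / 2).
// Illustrative helper, not a libFuzzer API.
unsigned DefaultWorkers(unsigned jobs, unsigned cpu_cores) {
  assert(jobs >= 1);
  return std::min(jobs, cpu_cores / 2);
}
```

On the 4-core host used in this article, -jobs=1000 would therefore be served by 2 worker processes unless -workers is set explicitly.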

Project Practice

  • Course: The libFuzzer workshop includes common methods for using libFuzzer.
  • Open source project CVE practice: libFuzzer has found many vulnerabilities in open source software, and it can quickly rediscover known ones such as the Heartbleed vulnerability in OpenSSL. Pick a vulnerable open source project from Google fuzzer-test-suite and reproduce its vulnerabilities using libFuzzer.
  • Comparison with other tools: see Related links in the libFuzzer tutorial.

libFuzzer Principles

Mutation Algorithms

Mutation is a key step in modern fuzzers because it generates new inputs that may cover more basic blocks. libFuzzer contains a series of simple built-in mutation algorithms, most of which edit the input at the byte or bit level. It also supports user-defined mutation algorithms for targeted fuzzing.

Existing Mutation Algorithms

By observing the stderr output of libFuzzer, you can find the mutation algorithms applied to the current input in the MS field. (The figure showing this output in the original article is not reproduced here.)

libFuzzer has 12 built-in mutation algorithms, which are member functions of the MutationDispatcher class. The constructor registers them as follows:

cpp
// Code path: LLVM/compiler-rt/lib/fuzzer/FuzzerMutate.cpp

MutationDispatcher::MutationDispatcher(Random &Rand, const FuzzingOptions &Options) : Rand(Rand), Options(Options) {
    DefaultMutators.insert(
        DefaultMutators.begin(),
        {
            {&MutationDispatcher::Mutate_EraseBytes, "EraseBytes"},
            {&MutationDispatcher::Mutate_InsertByte, "InsertByte"},
            {&MutationDispatcher::Mutate_InsertRepeatedBytes, "InsertRepeatedBytes"},
            {&MutationDispatcher::Mutate_ChangeByte, "ChangeByte"},
            {&MutationDispatcher::Mutate_ChangeBit, "ChangeBit"},
            {&MutationDispatcher::Mutate_ShuffleBytes, "ShuffleBytes"},
            {&MutationDispatcher::Mutate_ChangeASCIIInteger, "ChangeASCIIInt"},
            {&MutationDispatcher::Mutate_ChangeBinaryInteger, "ChangeBinInt"},
            {&MutationDispatcher::Mutate_CopyPart, "CopyPart"},
            {&MutationDispatcher::Mutate_CrossOver, "CrossOver"},
            {&MutationDispatcher::Mutate_AddWordFromManualDictionary, "ManualDict"},
            {&MutationDispatcher::Mutate_AddWordFromPersistentAutoDictionary, "PersAutoDict"},
        });
        // Implementation of the preceding functions
    }

The names of most mutation algorithms reflect their implementation. For example, EraseBytes calls memmove to shift the tail of the input over the erased bytes, and InsertByte calls memmove to make room for one new byte. Note that in these built-in algorithms, both the mutation position and the mutation value are chosen with the Rand random number generator.
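The memmove pattern behind these two mutators can be sketched as follows. This is a simplified illustration rather than the libFuzzer source: the names EraseBytesAt and InsertByteAt are hypothetical, and the position and value are passed in explicitly instead of being drawn from Rand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Removes n bytes starting at pos by sliding the tail left.
// Returns the new size.
size_t EraseBytesAt(uint8_t *data, size_t size, size_t pos, size_t n) {
  assert(pos + n <= size);
  std::memmove(data + pos, data + pos + n, size - pos - n);
  return size - n;
}

// Inserts one byte at pos by sliding the tail right.
// Returns the new size, or 0 if the buffer (capacity max_size) is full.
size_t InsertByteAt(uint8_t *data, size_t size, size_t max_size, size_t pos,
                    uint8_t byte) {
  assert(pos <= size);
  if (size >= max_size) return 0;
  std::memmove(data + pos + 1, data + pos, size - pos);
  data[pos] = byte;
  return size + 1;
}
```

memmove is used instead of memcpy because the source and destination ranges overlap.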

Adding a Mutation Algorithm

libFuzzer and AFL are coverage-guided fuzzing tools. When they fuzz targets that expect highly structured input, the mutated inputs may be rejected at an early stage of program execution because the mutation algorithms carry no semantic information. On such targets, libFuzzer and AFL are less efficient than generation-based fuzzing tools. Therefore, Google proposed structure-aware fuzzing, in which users extend libFuzzer with their own mutation algorithms through a plugin interface. This section describes how to add a plugin and points to some officially implemented plugins.

The following code shows how to add a plugin. Implement a user-defined LLVMFuzzerCustomMutator function that applies a format-specific mutation algorithm, and call LLVMFuzzerMutate inside it to perform the common mutations.

In the implementation, wrap the plugin in the conditional compilation directive #ifdef CUSTOM_MUTATOR, and pass -DCUSTOM_MUTATOR to clang to enable the user-defined plugin or omit it to disable the plugin.

cpp
// Optional user-provided custom mutator. 
// Mutates raw data in [Data, Data+Size) inplace. 
// Returns the new size, which is not greater than MaxSize. 
// Given the same Seed produces the same mutation. 
size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed); 

// libFuzzer-provided function to be used inside LLVMFuzzerCustomMutator.
// Mutates raw data in [Data, Data+Size) inplace.
// Returns the new size, which is not greater than MaxSize.
size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);

To add your own plugins, refer to the related links or the topic "Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator" at the 2017 LLVM Developers' Meeting.
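A minimal sketch of such a plugin is shown below. It assumes a hypothetical input format consisting of a 4-byte magic value "FUZZ" followed by a payload; the helper MutateKeepingMagic and the MutateFn alias are illustrative names, not part of libFuzzer. The helper takes the byte-level mutator as a parameter so that the logic can be exercised without the libFuzzer runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Signature shared by LLVMFuzzerMutate and any stand-in mutator.
using MutateFn = size_t (*)(uint8_t *Data, size_t Size, size_t MaxSize);

// Hypothetical format: 4-byte magic followed by an arbitrary payload.
static const char kMagic[4] = {'F', 'U', 'Z', 'Z'};

// Mutates only the payload with `mutate`, then rewrites the magic so the
// header check in the target always passes. Returns the new total size.
size_t MutateKeepingMagic(uint8_t *Data, size_t Size, size_t MaxSize,
                          MutateFn mutate) {
  if (MaxSize < sizeof(kMagic)) return Size;  // No room for a valid header.
  size_t payload = Size > sizeof(kMagic) ? Size - sizeof(kMagic) : 0;
  size_t room = MaxSize - sizeof(kMagic);
  size_t new_payload = room ? mutate(Data + sizeof(kMagic), payload, room) : 0;
  assert(new_payload <= room);
  std::memcpy(Data, kMagic, sizeof(kMagic));
  return sizeof(kMagic) + new_payload;
}

#ifdef CUSTOM_MUTATOR  // Enabled with: clang++ -DCUSTOM_MUTATOR ...
extern "C" size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  (void)Seed;  // Unused: the payload mutation is delegated to LLVMFuzzerMutate.
  return MutateKeepingMagic(Data, Size, MaxSize, LLVMFuzzerMutate);
}
#endif
```

Keeping the format-specific part in a separate function is a design choice of this sketch: the plugin entry point stays a thin wrapper, and the plugin disappears entirely when CUSTOM_MUTATOR is not defined.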

Coverage Statistics

Analyze code coverage from the following two perspectives:

  • Statistical precision: From coarse to fine, code coverage is classified into three levels:
    • Function level: Records which functions are called. The code inside the functions is not tracked.
    • Basic block level: Records which basic blocks are executed. The result is shown in the cov field of the libFuzzer stderr output.
    • Edge level: Records not only basic blocks but also the edges between them. Virtual blocks are inserted to split critical edges so that each edge can be observed.

  • Analysis object: The basic method is instrumentation, that is, inserting counters. There are three levels:
    • Source code: Coverage modes are selected through build options.
    • Intermediate representation: The instrumentation is done by, for example, an LLVM pass.
    • Binary: Binary instrumentation tools such as Pin and DynamoRIO are used to collect statistics through hooks.

In short, libFuzzer uses SanitizerCoverage in the LLVM framework to collect source code–level coverage statistics. You can run the following command to specify the level. By default, the edge level is used.

bash
# xxx=edge,bb,func,trace-pc-guard,inline-8bit-counters,inline-bool-flag,pc-table,trace-pc
clang++ -fsanitize-coverage=xxx fuzz_me.cc

In addition, you can develop analysis tools using SanitizerCoverage. The sanitizer provides a coverage callback interface that allows you to dump the coverage statistics result to a .sancov file when the fuzzing process is stopped. The LLVM framework provides the Sancov Tool to generate source code–level coverage reports.
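As an illustration of the callback interface, the sketch below implements the two callbacks that SanitizerCoverage invokes in trace-pc-guard mode and uses them to count how many distinct edges were executed. The counters g_num_guards and g_edges_seen are hypothetical names chosen for this example; only the two callback signatures come from SanitizerCoverage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static uint32_t g_num_guards = 0;  // Guards numbered so far.
static size_t g_edges_seen = 0;    // Distinct edges executed at least once.

// Called once per module at startup with that module's guard array
// [start, stop). Each guard receives a unique non-zero id.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  assert(start <= stop);
  if (start == stop || *start) return;  // Empty or already initialized.
  for (uint32_t *g = start; g < stop; ++g) *g = ++g_num_guards;
}

// Called on every instrumented edge. A zero guard means the edge has
// already been counted, so each edge contributes at most once.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (*guard == 0) return;
  ++g_edges_seen;
  *guard = 0;
}
```

In a real build, this file would be compiled without coverage instrumentation and linked with a target compiled using -fsanitize-coverage=trace-pc-guard.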

Error Check

Common Errors

As mentioned, each job in the libFuzzer process keeps running its checks and does not stop until a crash occurs or a time limit is reached. The parent libFuzzer process then captures the exit code of the job. If the exit code is 77, a timeout has occurred (the default timeout is 1,200 seconds) or libFuzzer itself has reported an error. If the job exits because of a crash, the result and the input that caused the crash are recorded.

Sanitizers

Crash detection does not cover all error scenarios in fuzzing. For example, memory leaks and data races may not cause crashes, but they are serious errors. Memory checkers such as Valgrind can detect such errors. libFuzzer instead relies on the sanitizers in the LLVM framework, which were developed by Google and detect runtime errors in C/C++ programs. Common sanitizers are as follows:

  • AddressSanitizer (ASAN): Captures memory errors such as stack and heap buffer overflows and use-after-free (UAF).
  • ThreadSanitizer (TSAN): Captures data races and supports C/C++ and Go.
  • UndefinedBehaviorSanitizer (UBSAN): Captures abnormal behaviors such as integer overflow and null pointer dereference.

ASAN is used as an example to analyze the principles of these error check tools. For details, see the AddressSanitizer paper presented at USENIX ATC 2012.

  1. During compilation, ASAN instruments memory accesses (load and store) and stack allocations (alloca) at the LLVM IR level. Every 8 bytes of application memory are mapped to 1 byte of shadow memory, which records whether those bytes may be accessed.
  2. At run time, ASAN hooks the malloc function and places redzones before and after each allocated region, similar to a stack canary. The shadow memory of the redzones is marked as poisoned, so an overflow into them is reported.
  3. At run time, ASAN hooks the free function. Instead of releasing the memory immediately, it sets the shadow memory to a negative value, which means the memory can be neither read nor written, and places the region in quarantine. A UAF or wild pointer dereference that touches this region is captured.
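The shadow memory check that ASAN inserts before each memory access can be sketched as follows, based on the scheme described in the AddressSanitizer paper. The function names are hypothetical, the shadow offset is platform-specific and passed in as a parameter, and the sketch assumes that each access is aligned to its size so that it never crosses an 8-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Maps an application address to the address of its shadow byte.
// One shadow byte describes 8 bytes of application memory.
uintptr_t ShadowAddress(uintptr_t addr, uintptr_t offset) {
  return (addr >> 3) + offset;
}

// Shadow encoding: 0 means all 8 bytes are addressable, k in 1..7 means
// only the first k bytes are addressable, and a negative value means the
// whole 8-byte word is poisoned (redzone, freed memory, and so on).
// Returns true if the access must be reported.
bool IsBadAccess(int8_t shadow, uintptr_t addr, size_t access_size) {
  assert(access_size >= 1 && access_size <= 8);
  if (shadow == 0) return false;
  return static_cast<int>((addr & 7) + access_size) > shadow;
}
```

For example, with a shadow value of 4, a 1-byte read at offset 3 inside the 8-byte word is allowed, while a 1-byte read at offset 4 is reported.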

Other Fuzzers

  • Academia: Fuzzing has been a hot academic topic in recent years, and related papers have been presented at top international conferences in fields such as system security, network security, software analysis, and programming languages. For details, see the FuzzingPaper project. Most of these papers propose a fuzzer for a specific object (software, hardware, OS kernel, programming language, etc.) or a specific vulnerability type (race condition, buffer overflow, etc.), and use methods such as concolic fuzzing and deep learning to improve fuzzing efficiency.

  • Industry: Many tools based on AFL and libFuzzer are developed on GitHub, and many companies specializing in fuzzing technologies have emerged. Google has developed a series of fuzzing tools as well as the FuzzBench platform, which evaluates fuzzer performance in a unified manner.

In short, the research on fuzzing in academia and industry is closely related. Most fuzzing technologies are implemented based on the LLVM framework and are highly scalable. The academic research is conducted based on the existing tools in the industry, and the achievements with good performance are released on GitHub. The following lists some fuzzers. For more fuzzers, visit Awesome-Fuzzing.

  • Generic fuzzers

    • libFuzzer-gv: enhanced edition of libFuzzer

    • AFL++: enhanced edition of AFL

    • OSS-Fuzz + ClusterFuzz: large-scale distributed fuzzer implemented through cooperation between the frontend and backend

    • boofuzz: enhanced edition of the Sulley framework

    • phuzzer: Python framework for interacting with AFL

  • Security

    • Honggfuzz: security-oriented fuzzer for software vulnerabilities
  • Network protocol

    • AFLNet: gray-box fuzzer for network protocols
  • Kernel

  • Programming language

  • IoT

    • FirmAFL: gray-box fuzzer for IoT firmware
    • DIANE: fuzzer for IoT applications on mobile phones
    • Frankenstein: fuzzer for wireless IoT devices

If there are any errors, please contact cascades-sjtu.

