My Reading: Kernel Fuzzer

Linux kernel is a complicated code base written in c programming language. As for this, they are vulnerable to memory corruption. Generally, it is believed to be the kernel is trusted. The only way kernel can be acted according to the user is through system calls from user programme. So, the researchers are focusing on innovating new fuzzing techniques that can generate random syscall (with correct arguments) sequence. A number of vulnerabilities such as race-condition, data-races, user-after-free etc. are possible to be popped up with such technique. 

RAZZER: Finding Kernel Race Bugs through Fuzzing

Link: https://lifeasageek.github.io/papers/jeong:razzer.pdf
Source Code: https://github.com/compsec-snu/razzer (expected to be released)

Summary: Existing kernel fuzzer can be divided into two categories: 1) generate a different combination of syscall with a different combination of multi-threaded environment. 2) Impose different time interleave in the kernel thread. A combined system with both features was expected to appear and razzer is that. However, they bring one more layer on top of that system which is a static analysis tool. They get 30 new races in kernel and 16 of them are confirmed.

Summary: Existing kernel fuzzer can be divided into two categories: 1) generate a different combination of syscall with a different combination of multi-threaded environment. 2) Impose different time interleave in the kernel thread. A combined system with both features was expected to appear and razzer is that. However, they bring one more layer on top of that system which is a static analysis tool. They get 30 new races in kernel and 16 of them are confirmed.

Design: The system consists of a static analyzer, a single thread fuzzer, and a multi-thread fuzzer. The static analyzer uses a point-to analysis to select a pair of memory access instructions as a candidate race pair (overapproximate). Next, the single thread fuzzer consists of a generator and executor where the generator generates a sequence of random syscalls and executor observes if a candidate race pair is appeared to be executed. If there is one, the sequence of random syscalls later push to multi-thread fuzzer. It also has two components: the generator splits the syscalls into different threads and the executor observes if the selected candidate race pair instructions access the same memory. If they are, they tag that candidate race pair as true race pair.

Advantage: Time interleaving in a thread is an important issue for detecting data races. Random syscall sequence check is equally important to cover most of the kernel code and hence increase various race conditions detection. A combination of race condition and data races detection technique helps to find out more data races. The system design is simple and hence practical.

Disadvantage: Instead of using random syscall sequence from syzkaller, they can use MoonShine for generating a logical sequence of syscalls. It may reduce the overhead and increase the vulnerability findings. The multi-thread fuzzer generator put different syscalls in different threads randomly. A deterministic measurement here could be useful if it is possible to detect which syscall cause the execution of an instruction in the candidate race pair.

Discussion: When they try to match two instructions accessing same memory concurrently while at least one of them is writing on that memory, they completely discard one major: exploitability. Their 16 out 30 true race pair being confirmed indicates that not every true race pair is exploitable. I believe a more accurate implementation of this design can find more true race pair and even can see more 50% of them as non-exploitable.

MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

Link: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-pailoor.pdf
Source Code: N/A


Summary: Syzkaller is one of the most popular kernel fuzzer. It generates a sequence of random system calls. Due to the randomness, most of them are unrealistic cases. They lose the efficiency because they don’t consider dependency (both implicit and explicit) among system calls. There are some other kernel fuzzer who uses a large number of benchmark program and use their sequence of system call to their fuzzer. In this project, they use a filtering algorithm that can identify dependent syscalls from real-world applications and use these syscall blocks into the fuzzer to the randomness of the blocks.

Design: The project basically extends the syzkaller and replace their random syscall sequence generator with random syscall block sequence generator. In the first step, they pour down the real-world application and observe their explicit and implicit syscall dependency. One example could be a file read; a file first must open, then read and usually they also close the file. If we think deeper, before a file read, the application also usually allocate some memory as the buffer to store the read data. In kernel perspective, these dependencies could be implicit or explicit. An explicit dependency is if a return of a syscall is used as an argument to another syscall. An implicit dependency is if two different syscalls use the same data structure in the kernel. When an explicit syscall is straightforward to resolve, the implicit dependency requires a kernel level static analysis where it collects if two different syscalls may use the same global variable in the kernel where one is a write and another is a read.

Advantage: The motivation is realistic, the design is simple, hence the project is significant and practical. They have found 17 new vulnerabilities of which 7 of them can be also found through explicit dependency check.

Disadvantage: The system don’t try to prove exploitability like every other. But, as the goal of the system is to generate more logical fuzzing input than complete randomness, their probability of exploitability is higher. They have confirmed 9 out of 17 have already fixed.

Discussion: The motivation is genuine. Although the workload is not heavy, it is the impact which let it be a significant work.