I have spent the last week or so fixing bugs in Winter (our programming language) found by our new fuzz testing code.

Fuzz testing is the process of sending lots of random input to a program, to try and uncover crashes or other undesirable behaviour from the program.

If you are writing a compiler, I highly recommend fuzz testing it, it seems like a very effective way of finding bugs.

Some more details about our fuzzing code:

It's an in-process fuzzer - random programs are generated, compiled, and run if compilation succeeded (e.g. if the program is valid), all in the same process. Because it's in-process, as soon as a crash occurs, the fuzzing run is over. Each random program string is saved to disk before it is run, so to find the program that crashed the process, you just have to get the last line from the file on disk.

It's multithreaded, because you need all the power you can get to find any bugs as fast as possible.

The fuzzer runs through about 20,000 random programs per second per core, so running on 4 cores gives around 80,000 random programs per second. This relatively fast speed is possible because invalid programs are rejected early before computationally intensive stuff like LLVM optimisation and codegen starts.

Valid programs only run at about 300 programs per second per core, but most random programs are invalid, so the average speed comes out to 20K/core/s

The random program generation strategy is something like:

Randomly pick a program from the unit test suite programs (several hundred covering all language features)

Then repeatedly mutate program randomly a few times by picking from among a few options:

1) Insert a random character at a random position in the program.
2) Delete a random chunk of characters from the program
3) Copy a chunk of the program to another random position in the program.
4) Read a section of a random other program from the test suite and insert into the program.
5) Insert a random language keyword or syntactical elements (square bracket pair etc..)

As mentioned I found this fuzzing code to be extremely effective in finding bugs. I found 40-50 crash bugs that took something like a week to fix, and triggered improvements and refactorings to avoid crashes. Our programming language Winter is a lot more robust now.

if you are wondering how there can be so many bugs in a language compiler/runtime, bear in mind that Winter is quite complex, with generic functions, operator overloading, function overloading, type coercion, JITing, higher order functions etc..

I also tried out American Fuzzy Lop (AFL). I ran it for 24 hours or so, in which time it got through 20% or so of the first pass. It didn't find any crashes. I'm not sure if this is because it's not very good at finding crashes, or if I removed all the crash bugs :) It's definitely a lot slower than our in-process multithreaded fuzzing though.

Note that in theory AFL uses a more powerful kind of fuzzing - coverage-guided fuzzing, which can be considered a white-box approach, since it uses knowledge of what code paths are taken for each input program.

Therefore I would highly recommend integrating an in-process fuzzer into your test suite, and after that maybe check out AFL or similar.

(If you want to know more about Winter there are some presentation slides in PDF format from a talk I gave a while ago here: Indigo Shader Language and Winter)