Fuzzinator reloaded

It's been a while since I last (and actually first) posted about Fuzzinator. Now I think that I have enough new experiences worth sharing.

More than a year ago, when I started fuzzing, I mostly focused on mutation-based fuzzing techniques since they were easy to build and pretty effective. Having a rich test suite that exercises bug-prone code (e.g. LayoutTests) practically guaranteed fresh new bugs. At least for a while. As expected, a test generator based on knowledge extracted from a finite set of test cases reached the limit of its possibilities after some time and stopped generating essentially new test cases. At this point, a fuzzer girl can reload her gun with new input test sets and will probably find new bugs. This works a few times, but she will soon find herself in a loop, testing the common code paths and running into the same bugs again and again. Here, she can put her trust in other developers to make mistakes in the covered code ... or she can describe the language-under-fuzzing more precisely. For this, grammars offer a great opportunity.

The first blog post I read about HTML5 grammars claimed that HTML5 had a very simple grammar since it could basically be matched with /.*/. :) Although I took it as a joke, it became the basis of my first grammar-based fuzzer, written in 10 minutes, and it actually found one bug in WebKit!
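As a rough idea of what a /.*/-style fuzzer could look like, here is a minimal Python sketch. It is a guess for illustration, not the original ten-minute fuzzer: the function name `random_document`, the alphabet, and the length limit are all assumptions.

```python
import random
import string

def random_document(rng, max_len=64):
    """Return an arbitrary string; any such string trivially matches /.*/."""
    # Assumed alphabet: printable ASCII without newlines.
    alphabet = string.ascii_letters + string.digits + string.punctuation + " "
    length = rng.randint(0, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))

rng = random.Random(0)           # fixed seed so a failing input can be reproduced
sample = random_document(rng)    # feed this to the browser under test
```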

If we want to be more precise than /.*/, then we can use context-free grammars to describe the structure of our favorite language. Fortunately, many resources (parser grammars, XSD definitions, manuals, etc.) are available online that can help us in building generator grammars for "foo" or "bar" languages, but we have to collate the pieces into one representation by ourselves to allow the generation of the actual fuzzer. With this in mind, the grammar rules of a specific HTML element might look like this:

divElement
    : TAG_OPEN 'div' divAttribute* TAG_CLOSE divContent TAG_OPEN SLASH 'div' TAG_CLOSE
    | TAG_OPEN 'div' divAttribute* TAG_SLASH_CLOSE
    | TAG_OPEN 'div' divAttribute* TAG_CLOSE

divAttribute
    : accesskey
    | align
    | class
    | ...

divContent
    : divElement
    | emElement
    | pElement
    | text
    | ...
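To make the step from grammar to generator concrete, here is a minimal Python sketch that expands a small subset of the rules above at random. The dictionary encoding, the `generate` function, the attribute values, and the depth limit are assumptions made for illustration; this is not Fuzzinator's actual grammar format, and the elided alternatives are simply left out.

```python
import random

# Hypothetical encoding: each rule maps to a list of alternatives, and each
# alternative is a sequence of symbols. Symbols that are not keys of GRAMMAR
# are emitted literally as terminals.
GRAMMAR = {
    "divElement": [
        ["<", "div", "divAttributes", ">", "divContent", "</", "div", ">"],
        ["<", "div", "divAttributes", "/>"],
        ["<", "div", "divAttributes", ">"],
    ],
    # divAttribute* written as a right-recursive helper rule.
    "divAttributes": [[], ["divAttribute", "divAttributes"]],
    # Attribute values are made up for the example.
    "divAttribute": [[' accesskey="a"'], [' align="left"'], [' class="c"']],
    "divContent": [["text"], ["divElement"]],
    "text": [["fuzz"]],
}

def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` by choosing random alternatives until only terminals remain."""
    if symbol not in GRAMMAR:
        return symbol  # terminal
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest alternative to force termination.
        alternatives = [min(alternatives, key=len)]
    chosen = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in chosen)

print(generate("divElement", random.Random(1)))
```

The depth limit matters: without it, the recursion between divElement and divContent has no guaranteed end.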

Creating such a model for any language is only a matter of time (quite a lot of it, actually). Knowing that, non-fuzzer-girls quite often argue: "What's this fuzz about grammar-based fuzzers? Anyone can write a grammar! Afterwards, generating from that is just a piece of cake!" Sadly, there is one thing that cannot be built into a context-free (CF) grammar: semantics! Just think about matching identifiers. Such features are already needed when you generate only one language (e.g., in SVG, referring to filters by ID), and things become more complicated when creating a complex output, e.g., HTML with a suitable CSS definition and with JavaScript that refers correctly to the contents of both the HTML and the CSS sources. Fortunately, Fuzzinator uses its own flavour of grammar description, which allows semantic extensions and makes lots of things possible, e.g., building symbol tables for generated functions and variables, allowing operations on these, creating valid CSS selectors, etc.
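The identifier-matching problem can be illustrated with a small Python sketch. It assumes a hypothetical `SymbolTable` helper that records every generated filter ID so that later references only point at IDs that exist; it shows the general idea of a semantic extension, not how Fuzzinator implements it.

```python
import random

class SymbolTable:
    """Hypothetical helper: remembers declared IDs so references stay valid."""

    def __init__(self):
        self.ids = []

    def declare(self, prefix):
        name = f"{prefix}{len(self.ids)}"
        self.ids.append(name)
        return name

    def reference(self, rng):
        # Only IDs that were actually declared can be referenced.
        return rng.choice(self.ids)

def generate_svg(rng, filters=3, shapes=4):
    """Emit an SVG whose filter references always match a declared filter."""
    table = SymbolTable()
    parts = ['<svg xmlns="http://www.w3.org/2000/svg">']
    for _ in range(filters):
        fid = table.declare("f")
        blur = rng.randint(1, 9)
        parts.append(f'<filter id="{fid}"><feGaussianBlur stdDeviation="{blur}"/></filter>')
    for _ in range(shapes):
        ref = table.reference(rng)
        parts.append(f'<rect width="10" height="10" filter="url(#{ref})"/>')
    parts.append("</svg>")
    return "".join(parts)

print(generate_svg(random.Random(3)))
```

A plain CF grammar could generate both the id attribute and the url(#...) reference, but it has no way to guarantee that the two names agree.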

Such a fuzzer is quite an effective bug hunter gun but it will still fire randomly since it's lacking a gun-sight. To be able to focus on specific features we could/should assign priorities to certain rules. I've implemented such a prioritization method and tested it on the grid layout module of CSS: it found 14 bugs in Blink and 6 more in WebKit in a short period of time. And I was quite happy afterwards. :)
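One simple way to give the gun a sight is to attach weights to rule alternatives and sample them accordingly. The Python sketch below assumes this weighted-choice approach and made-up weights that favour grid layout; the prioritization method actually used in Fuzzinator may work differently.

```python
import random

# Hypothetical weights: a higher number means the alternative is picked more often.
ALTERNATIVES = {
    "display": [("block", 1), ("inline", 1), ("grid", 8)],
}

def pick(rule, rng):
    """Choose one alternative of `rule`, biased by its weight."""
    values, weights = zip(*ALTERNATIVES[rule])
    return rng.choices(values, weights=weights, k=1)[0]

rng = random.Random(0)
print("display: " + pick("display", rng))
```

With these weights, roughly eight out of ten generated declarations exercise the grid code paths, while the other alternatives still appear occasionally.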

Once we are done with all of the previous steps, we can lean back and wait for the rain of bugs. The size of the bug-triggering inputs we receive depends on our luck and on our settings, but we should be prepared for a few hundred kilobytes of generated data. Since reporting these raw failing test cases would not be developer-friendly, I'm using an adapted version of A. Zeller's test case minimization method (delta debugging) on the received inputs. With this, the tests are automatically reduced to ~1-5 kB, and after a quick manual post-processing they can be reported to issue trackers.
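For readers unfamiliar with the minimization step, here is a simplified Python sketch of the delta debugging idea (ddmin). It is not the adapted version mentioned above: the `ddmin` function and the `fails` predicate are assumptions, and in practice `fails` would run the browser on the candidate input and report whether the same failure still occurs.

```python
def ddmin(data, fails):
    """Shrink the list `data` while `fails(data)` stays True (simplified ddmin)."""
    assert fails(data), "the initial input must trigger the failure"
    n = 2  # number of chunks to split the input into
    while len(data) >= 2:
        chunk = max(1, len(data) // n)
        subsets = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        reduced = False
        # 1. Try each chunk on its own.
        for subset in subsets:
            if fails(subset):
                data, n, reduced = subset, 2, True
                break
        # 2. Otherwise try removing one chunk at a time.
        if not reduced:
            for i in range(len(subsets)):
                complement = [x for j, s in enumerate(subsets) if j != i for x in s]
                if fails(complement):
                    data, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3. Otherwise split finer, or stop at single-element granularity.
        if not reduced:
            if n >= len(data):
                break
            n = min(n * 2, len(data))
    return data

# Toy stand-in for a crash: the "bug" triggers whenever <grid/> is present.
test_case = list("<div><b>x</b><grid/></div>")
minimal = ddmin(test_case, lambda d: "<grid/>" in "".join(d))
print("".join(minimal))
```

The loop stops only when removing any single remaining element makes the failure disappear, which is why the reduced test cases end up so much smaller than the generated ones.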

To summarize all of the above in numbers: to date, 195 bugs found by Fuzzinator have been reported in WebKit and 167 in Blink (unfortunately, I could not assign a label to the bugs in the Chromium issue tracker, so they can only be found by searching for my e-mail addresses). And stay tuned for some 50+ additional (as yet) unreported issues.

For the future, there are lots of plans and possibilities for improving the current generators and for supporting other languages (I'm working on WebGL and XML fuzzers right now). So, extensions of Fuzzinator and more blog posts about him are (again) only a matter of time (and interest).

Victor Costan (not verified) - 10/23/2014 - 19:46

Thank you very much for your work on improving the quality of Blink and WebKit!

Rego (not verified) - 10/28/2014 - 15:13

Thanks for the effort testing CSS Grid Layout code, you've helped us to improve the stability of the feature.
Carry on with the great work!

Anonymous (not verified) - 01/27/2016 - 16:36

Do you plan on releasing this project?
