DeHydra and Source Analysis at Mozilla

Taras Glek

Mozilla Corporation

Outline

  • DeHydra:
    • Current Mozilla toolset
    • Other tools
    • Useful queries
    • DeHydra solution
    • JavaScript vs C++
    • Overview of a DeHydra analysis
  • Open Problems:
    • Reference Counting
    • Automatic Refactoring
    • JavaScript Analyses
    • My Work: Source2Source & Analyses
  • Conclusion

Current Mozilla Toolset

  • 2545 C/C++ files are compiled in a typical Mozilla build
  • 6201 C/C++ source files present in CVS
  • Researcher's dream:
    • Very few questions about the code can be answered without automated tools
    • Fast-paced development means that tools will be immediately useful
  • Grep & the C++ compiler are the most frequently used analysis tools
  • Most tools do not parse C++, do not scale to Mozilla, or do not allow user customization

Other tools

Useful Queries

  • Which classes hold a pointer of a specific type to Foo?
  • Which functions/modules are dead code?
  • Which classes have uninitialized members?
  • Where in the code is reference counting optional?
  • Are structs packed correctly?
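
Two of the queries above can be sketched over plain class records. This is a minimal illustration, not DeHydra's API: the record layout (`name`, `members`, `initialized`) and the sample classes are assumptions made up for the example.

```javascript
// Hypothetical, hand-written class records. A real tool would derive
// these from the parsed C++ instead of from a literal.
const classes = [
  {
    name: "Widget",
    members: [
      { name: "mFoo", type: "Foo*" },
      { name: "mCount", type: "int" },
    ],
    initialized: ["mFoo"], // members set by the constructor
  },
  {
    name: "Panel",
    members: [{ name: "mFoo", type: "nsCOMPtr<Foo>" }],
    initialized: ["mFoo"],
  },
  {
    name: "Label",
    members: [{ name: "mText", type: "char*" }],
    initialized: [],
  },
];

// Which classes hold a member of exactly the given pointer type?
function classesHolding(classList, pointerType) {
  return classList
    .filter((c) => c.members.some((m) => m.type === pointerType))
    .map((c) => c.name);
}

// Which classes have members that the constructor never initializes?
function uninitializedMembers(classList) {
  const report = {};
  for (const c of classList) {
    const missing = c.members
      .map((m) => m.name)
      .filter((n) => !c.initialized.includes(n));
    if (missing.length > 0) report[c.name] = missing;
  }
  return report;
}

console.log(classesHolding(classes, "Foo*"));
console.log(uninitializedMembers(classes));
```

Matching on the exact type string separates a raw `Foo*` from a smart pointer such as `nsCOMPtr<Foo>`, which is the distinction the first query asks about.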

DeHydra

  • Modelled after UNO: Restrict the domain of analyses to simplify writing them
  • JavaScript API has a simplified view of the C/C++ AST:
    • Control flow graph for intraprocedural queries
    • Inheritance graph for some global queries
  • A higher-order function written in C++ walks the graphs and checks them by calling functions written in JS
  • Implemented using the Elsa C++ parser

JavaScript vs C++

  • Pros:
    • Garbage Collection
    • Data structures are easier to construct
    • More concise. Supports functional programming style.
    • No compilation step
    • Easy data exchange with other apps through built-in serialization
    • Introspection
  • Cons:
    • Limited domain of analyses
    • Lower performance

Overview of DeHydra Analysis

  • CFG edges are labelled with conditions
  • CFG is supplemented with basic abstract interpretation: each variable is tracked as 0, 1 or Other so that infeasible conditions can be ruled out
  • Variables are either defined, assigned or evaluated
  • CFG checks are designed around state transitions
  • Every edge is visited only once per state
  • Example
  • Class hierarchy queries work by passing each class's inheritance information and members to a callback.
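
The state-based traversal described above can be sketched as a small worklist loop. This is an illustration under stated assumptions, not DeHydra's implementation: the CFG layout, the edge `event` objects, the state names and the `walk` and `transition` functions are all hypothetical, and edge conditions and the 0/1/Other abstraction are left out for brevity.

```javascript
// Hypothetical CFG: node name -> outgoing edges. Each edge may carry one
// event saying that a variable is defined, assigned or evaluated.
const cfg = {
  entry: [{ id: "e0", to: "n1", event: { kind: "define", name: "x" } }],
  n1: [
    { id: "e1", to: "n2", event: { kind: "assign", name: "x" } },
    { id: "e2", to: "n2", event: null }, // branch that skips the assignment
  ],
  n2: [{ id: "e3", to: "n3", event: { kind: "evaluate", name: "x" } }],
  n3: [
    { id: "e4", to: "n1", event: null }, // loop back edge
    { id: "e5", to: "exit", event: null },
  ],
  exit: [],
};

// Driver: follows every edge, but only once per (edge, state) pair, so
// loops terminate as long as the number of states is finite.
function walk(graph, entryNode, initialState, transition) {
  const seen = new Set();
  const work = [{ node: entryNode, state: initialState }];
  while (work.length > 0) {
    const { node, state } = work.pop();
    for (const edge of graph[node] || []) {
      const key = edge.id + "|" + state;
      if (seen.has(key)) continue;
      seen.add(key);
      work.push({ node: edge.to, state: transition(state, edge) });
    }
  }
  return seen.size; // number of (edge, state) pairs visited
}

// User-supplied check: warn when x is evaluated before it is assigned.
const warnings = [];
function transition(state, edge) {
  if (edge.event === null) return state;
  switch (edge.event.kind) {
    case "define":
      return "unassigned";
    case "assign":
      return "assigned";
    case "evaluate":
      if (state !== "assigned") {
        warnings.push(edge.id + ": " + edge.event.name + " evaluated before assignment");
      }
      return state;
    default:
      return state;
  }
}

const visited = walk(cfg, "entry", "start", transition);
console.log(visited, warnings);
```

Keying the visited set on the edge and the state together is what lets the same edge be re-examined when a different state reaches it, while still bounding the work in the presence of loops.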

Problem: Rapidly Modernize Mozilla

Problem: Ref Counting

Problem: Refactoring

  • Check and enforce exception safety (in the dangling-pointer sense)
  • Convert from nsresult to exceptions:
    • Currently, errors are propagated through return values
    • Actual return values are passed through outparams
  • Convert C++ to more modern idiomatic C++
  • Convert C++ to JavaScript

Problem: JavaScript Analysis

  • Extend DeHydra to support JavaScript input
  • Develop cross-language unified analyses for C++/JavaScript.

My Work - Source2Source Transformations

  • Rewriting C/C++ with aesthetically pleasing output requires support for the C preprocessor in the C/C++ parser
  • Ensure that the various bool types (int typedefs) always hold 1 or 0, or rewrite offending expressions automatically.
  • Rewrite basic nsresult functions:
    • Some functions don't propagate errors; they only return a "failed" error or pass a value back through a parameter
    • Can be rewritten to return NULL on error or a value on success
    • DeHydra can detect these functions
    • Another tool rewrites the definition and call sites based on input provided by DeHydra
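
The detection step can be sketched as a filter over function summaries. This is a rough illustration only: the summary fields (`returnType`, `returns`, `outparams`) and the sample functions are hypothetical, and the rule used here (only `NS_OK` or `NS_ERROR_FAILURE` is returned, with exactly one outparam) is an assumed reading of "basic nsresult function", not the actual criterion.

```javascript
// Hypothetical per-function summaries, as an analysis pass might emit them.
const functions = [
  // Returns only success or a generic failure, value goes through aResult.
  { name: "GetWidget", returnType: "nsresult", returns: ["NS_OK", "NS_ERROR_FAILURE"], outparams: ["aResult"] },
  // Forwards a callee's error code (rv), so the error value matters.
  { name: "LoadDocument", returnType: "nsresult", returns: ["NS_OK", "rv"], outparams: ["aDoc"] },
  // No outparam, so there is no value to return instead of the nsresult.
  { name: "Init", returnType: "nsresult", returns: ["NS_OK", "NS_ERROR_FAILURE"], outparams: [] },
  // Not an nsresult function at all.
  { name: "Count", returnType: "int", returns: ["mCount"], outparams: [] },
];

// Assumed rule: the function never propagates a specific error and has a
// single outparam that could become the return value (NULL on failure).
function isRewritable(fn) {
  const genericOutcomes = new Set(["NS_OK", "NS_ERROR_FAILURE"]);
  return (
    fn.returnType === "nsresult" &&
    fn.outparams.length === 1 &&
    fn.returns.every((r) => genericOutcomes.has(r))
  );
}

// Serialize the candidates so that a separate rewriting tool can read them.
const candidates = functions.filter(isRewritable).map((fn) => fn.name);
const handoff = JSON.stringify(candidates);
console.log(handoff);
```

Handing the result over as JSON keeps the detection and the rewriting in separate tools, which matches the split described above.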

My Work - Analysis

  • Build a complete callgraph for Mozilla:
    • Find dead code: functions or even whole modules
    • Useful for thread-safety and exception-safety analyses, etc.
    • Hard Problem: Parts of the C++ callgraph are in JavaScript. Would need to analyze JS for more precision.
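
Given a callgraph, dead-code detection reduces to a reachability check from the known entry points. The sketch below assumes a hypothetical graph (function name mapped to its callees) and a hypothetical root list; the function names are invented for the example.

```javascript
// Hypothetical callgraph: caller -> list of callees.
const callGraph = {
  main: ["parse", "render"],
  parse: ["lex"],
  lex: [],
  render: [],
  oldExport: ["legacyHelper"],
  legacyHelper: [],
};

// Everything not reachable from the roots is reported as dead.
function deadFunctions(graph, roots) {
  const live = new Set();
  const work = [...roots];
  while (work.length > 0) {
    const fn = work.pop();
    if (live.has(fn)) continue;
    live.add(fn);
    for (const callee of graph[fn] || []) work.push(callee);
  }
  return Object.keys(graph)
    .filter((fn) => !live.has(fn))
    .sort();
}

console.log(deadFunctions(callGraph, ["main"]));
```

The result is only as good as the graph: any call that originates in JavaScript is missing from a C++-only callgraph, so a function reached that way would be wrongly reported as dead unless it is added to the roots.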

Conclusion