Assignment 1 (individual or in pairs): Program Analysis with PREfast and SAL

In this project we use PREfast, a static analysis tool for C(++) by , and the associated annotation language SAL on some toy C code.

PREfast is an older tool by now, but it still gives an nice illustration of

  1. what static analysis can do (using some automated theorem proving technology behind the scenes);
  2. the hassle, in the form of annotations, required for a language like C(++) to make it amenable to static analysis.

Learning objectives

Goals of the project are

Thanks to Jonathan Aldrich and his colleagues at CMU for info on PREfast and sample exercises.

Handing in the assignment

The project is due Thursday Sept 29 so that we can discuss solutions in class the day after.

You can do the project individually or as a pair. If you do it as a pair, say that you did and mention both your names as comments the code.

You only have to hand in the annotated C++ code, i.e. part I. Hand this in as a file called "YourName_prefast_exercise.cpp" with your modified and annotated C++ code, where YourName is your full name, without spaces, concatenated. This is just to ensure that all files have a different name. If you do the assignment together with someone else, then call the file "Name1_Name2_prefast_exercise.cpp" so both your names are in the the file-name.

Installing PREfast

We provide a version of PREfast that can be used stand-alone the command line. How to install this described in detail here. (PREfast is also included in some versions of Visual Studio, buried deep inside the GUI, but figuring out if it this there at all and getting it to work takes way more time, and won't give uniform results for everyone.)

Part I - Using the tool

If you've followed the installation instructions for PREfast above, then you should already have a copy of the exercise file prefast_exercise.cpp.

Running PREfast, by compiling with the option /analyze, should produce 7 warnings: C4996, C6386, C6011, C6217, C6282, C6273, C6031. If you don't get the C4996 warning, the command line option /W 3 is probably missing; you have to include that, as in cl /analyze /W 3 prefast_exercise.cpp

Once that, works follow the steps below:

  1. Get rid of the warnings in prefast_exercise.cpp that PREfast gives, by fixing the code. Mark places where you changed the code with a comment //FIXED to keep track of the changes you made.

    Keep the changes to the code minimal; the code is completely silly, no need to completely rewrite it.

  2. Annotate all buffers that are passed as parameters (i.e. all parameters of type char* or int*), to specify whether they are read from and/or written to, and specify their lengths. This means you have annotate them If the length is a compile-time constant, such as 55 or BUF_SIZE, rather than a program variable, you need another suffix c_. So, for example, you can annotate a buffer with _Out_cap_(len) or with _Out_cap_c_(BUF_SIZE). (Why this extra c_ is needed for constants is a mystery to me.)

    There is no need to annotate the size of the argument of execute, as its size does not really matter. You also do not need to annotate validate.

    Fix any new warnings this produces.

  3. Similarly, annotate the buffers returned as results by my_alloc and do_read to specify their size, using the annotations _Ret_cap_(...) or _Ret_cap_c_(...).

    Fix any new warnings this produces.

  4. As last step, we will add tainting annotations to trace any input passing from input to execute without passing through the validation operation, and add calls to the validation routine validate in the right places to fix any problems with missing input validation. The steps for this are explained in more detail below.

    To do this, first

    So you get

       HRESULT input([SA_Post(Tainted=SA_Yes)] _Out_... char *buf) {...
    
       int execute([SA_Pre(Tainted=SA_No)] _In_ char *buf) {...
    

    Now annotate all the procedures that may handle or produce tainted data using pre- and/or postconditions as above. These procedures are:

    To specify that the return value of a function is tainted, declare it as

       [returnvalue:SA_Post(Tainted=SA_Yes)] char* somefunction() { ...
    
    PREfast should now produce warnings C6029, when it spots that the program is passing tainted data to the function execute().
    Add calls to the validation routine validate in the right places to make such warnings disappear.

    As you may notice, PREfast's tainting analysis is not reliable unless you annotate all procedures that may handle tainted data correctly.

  5. As a final check, to make sure that you have not forgotten to annote some things: in the end

Part II - Reflection

Think about answers to the following questions. We will discuss these in class on Sept 30. So there is nothing to hand in for this.
  1. PREfast tries to check annotations -- and hence the properties they express -- at compile time. An alternative approach would be to check this at runtime. Two different aspects for which this could be done are 1) bounds-checking and 2) tainting & missing input validation. This would require some additional information to be tracked at runtime: for bounds-checking this could involve something like fat pointers to check access out of bounds at runtime; for tainting data would have to be marked and traced as being tainted.

    Name two advantages and two disadvantages of doing these checks at runtime instead of doing them at compile-time. (I can think of two each. Hint: also think of generic advantages and disadvantages when it comes to runtime vs compile-time checking. Maybe you can think of more?)

  2. Sometimes PREfast only warns about problems after you add annotations. For example, the tool does not complain about zero() until after you add an annotation about the size of buf. An alternative tool design would be to produce a warning about zero() if there are no annotations for it. (The warning would then not so much be that there is a potential buffer overflow problem, but rather that the tool does not have enough information to determine whether there is a buffer overflow or not.) Can you give a plausible explanation why PREfast has been designed so that it does not complain about such unannotated methods?

More information for this project

For more info about: