Software Security - PREfast project
Assignment 1 (individual or in pairs): Program Analysis with PREfast and SAL
In this project we use PREfast, a static analysis
tool for C(++) developed at Microsoft, and the associated
annotation language SAL, on some toy C code.
PREfast is an older tool by now, but it still gives an nice
illustration of
- what static analysis tools can do (using
automated theorem proving technology behind the scenes);
-
the amount of hassle, in the terms of annotations, a language like
C(++) needs to make code amenable to such static analysis.
Learning objectives
Goals of the project are
- to appreciate - if it wasn't clear from the lectures - some of the many things that can go wrong in C(++) code;
- to understand the capabilities and the limitations of an (almost) state-of-the-art static analysis tool;
- to understand the trade-offs in the design and in the use of such a tool.
Thanks to Jonathan Aldrich and his colleagues at CMU for info on PREfast and
sample exercises.
Handing in the assignment
The project is due Thursday Sept 24 (so we could discuss
solutions in class the day after). If this deadline turns out
not to be feasible because you run into technical issues, we can
discuss these technical issues on Sept 25 and push the deadline a
week forward.
You can do the project individually or as a pair. Either way,
you have to join one of the "pair or individual assignments"
groups in Brightspace (for the 5 EC NWI-IMC051 course, even if
you take the 6 EC NWI-ISOFSE course) and hand is it as this
Brightspace group. If you are still not registered in
Osiris/Brightspace at the RU, let me know and you can submit your
answers by email instead.
Hand in your solutions via Brightspace as two separate files
-
a text or PDF file with your answers to the questions in part II;
-
a file called "YourNames_prefast_exercise.cpp" with your modified and annotated C++ code, i.e. part I below,
where YourNames is/are your full name(s), without
spaces,
concatenated. This is just to ensure that all files have a different name.
MAKE SURE THAT YOUR NAME(S) IS/ARE ARE ALSO _IN_ BOTH FILES,
SO THAT IF WE PRINT THEM WE CAN FIGURE OUT WHO THEY BELONG TO!
Installing PREfast
We provide a version of PREfast that can be used stand-alone the
command line. How to install this described in detail here. (PREfast is also
included in some versions of Visual Studio, buried deep
inside the GUI, but figuring out if it this there at all
and getting it to work takes way more time, and won't
give uniform results for everyone.)
The assignment - Part I - Using the tool
If you've followed the installation instructions for PREfast above,
then you should already have a copy of the exercise file
prefast_exercise.cpp.
Running PREfast, by compiling with the option /analyze, should
produce 7 warnings: C4996, C6386, C6011, C6217, C6282, C6273, C6031.
If you don't get the C4996 warning, the command line option
/W 3 is probably missing; you have to include that, as in
cl /analyze /W 3 prefast_exercise.cpp
Once that, works follow the steps below:
- Get rid of the warnings in prefast_exercise.cpp that PREfast gives, by fixing the code.
Mark places where you changed the code with a comment
//FIXED to keep track of the changes you made.
Keep the changes to the code minimal; the code is completely
silly, no need to completely rewrite it.
- Annotate all buffers that are passed as parameters (i.e. all parameters of type char* or int*), to specify whether they are read from and/or written to, and specify their lengths. This means you have annotate them
- with _In_count_(...) if they are only read from;
- with _Out_cap_(...) if they are only written from;
- with _Inout_count_(...) if they are both read from and written from (but I don't think you need that).
If the length is a compile-time constant, such as 55 or BUF_SIZE, rather than a program variable, you need another suffix c_.
So, for example, you can annotate a buffer with
_Out_cap_(len) or with
_Out_cap_c_(BUF_SIZE). (Why this extra c_ is
needed for constants is a mystery to me.)
There is no need to annotate the size of the argument of execute,
as its size does not really matter. You also do not need to
annotate validate.
Fix any new warnings this produces.
- Similarly, annotate the buffers returned as results by my_alloc and do_read to specify their size, using the annotations
_Ret_cap_(...) or _Ret_cap_c_(...).
Fix any new warnings this produces.
- As last step, we will add tainting annotations to trace any input passing from
input to execute without passing through the validation
operation, and
add calls to the validation routine validate in the right places
to fix any problems with missing input validation. The steps for
this are explained in more detail below.
To do this, first
- annotate the first parameter of input with [SA_Post(Tainted=SA_Yes)], which specifies that this parameter will be tainted as postcondition, and
- annotate the parameter of execute with
[SA_Pre(Tainted=SA_No)]
to specify the precondition that this parameter should not be tainted.
So you get
HRESULT input([SA_Post(Tainted=SA_Yes)] _Out_... char *buf) {...
int execute([SA_Pre(Tainted=SA_No)] _In_ char *buf) {...
Now annotate all the procedures that may
handle or produce tainted data using pre- and/or postconditions as above.
These procedures are:
- do_read, as it calls input, which
produces tainted data;
- copy_data, as it is used to copy data coming from
do_read, which is tainted.
To specify that the return value of a function is tainted,
declare it as
[returnvalue:SA_Post(Tainted=SA_Yes)] char* somefunction() { ...
PREfast should now produce warnings C6029, when it spots
that the program is passing tainted data to the function
execute().
Add calls to the validation routine validate
in the right places to make such warnings disappear.
As you may notice, PREfast's tainting analysis is not reliable
unless you annotate all procedures that may handle tainted data
correctly.
- As a final check, to make sure that you have not forgotten to
annote some things: in the end
- except for the functions execute and validate,
for all other functions all parameters and return values
that have a pointer type should have a size annotation
specifing buffer lengths;
- all parameters or return values of functions that might be
tainted at some stage should have tainting annotations.
The assignment - Part II - Reflection
Briefly (in a few sentences) answer the following questions
- PREfast tries to check annotations -- and hence the
properties they express -- at compile time.
An alternative approach would be to check this at
runtime.
Two different aspects for which this could be done
are 1) bounds-checking
and 2) tainting & missing input validation.
This would require some additional information to be tracked
at runtime:
for bounds-checking this could involve
something like
fat pointers to check access out of bounds at runtime;
for tainting data would have to be marked and traced as being
tainted.
Name two advantages and two disadvantages of doing these checks at runtime
instead of doing them at compile-time. (I can think of two each.
Hint: also think of generic advantages and disadvantages when it comes to runtime vs compile-time checking. Maybe you can think of more?)
- Sometimes PREfast only warns about problems after you add
annotations. For example, the tool does not complain about zero()
until after you add an annotation about the size of buf.
An alternative tool design would be
to produce a warning about zero() if there are no annotations
for it. (The warning would then not
so much be that there is a potential buffer overflow problem, but
rather that the tool does not have enough information to
determine whether there is a buffer overflow or not.)
Can you give a plausible explanation why PREfast has been designed so
that it does not complain about such unannotated methods?
Keep your answers concise and to the point.
More information for this project
For more info about:
- C(++): to look up what system calls such as gets, gets_s, memcpy, printf, system, ... do, see the online C reference
or C++ reference
specs.
- Secure coding in C(++): CERT publishes guidelines
for this:
CERT C coding standard
and
CERT C++ coding standard.
You should not have to look at these for this assignment. But
note that if you're ever going to write C(++) code, you will have
to.
- SAL: There is a lot of information on SAL on Microsoft's website.
Peter Gutmann's Experiences with SAL/PREfast has some nice discussion and suggestions on how to use PREfast and SAL in practice.
Beware that the syntax of SAL keeps changing every few years,
so stick to the syntax listed above in the exercises, which is
this SAL version,
as this syntax works with the version of PREfast that we provide.
The new SAL
version 2.0 supported by Visual Studio 2019 uses
more intuitive names: e.g. _In_count_ and
_Out_cap_ are now called _In_reads_ and
_Out_writes_.
Microsoft has also changed the meaning of the acronym at some
stage: it
no longer stands for Standard Annotation Language but
now stands for Source Code Annotation Language.