Software Security - PREfast project
Overall learning objectives of this PREfast exercise
- Seeing what these SAST tools can do and getting a feel for how
useful they are.
- Experiencing how ugly some programming languages, in this case
C/C++, are, and realising how many implicit assumptions there can be
in a typical piece of code: about which values may or may not
be null, about the lengths of buffers, or about buffers
being `tainted' user input that is dangerous to feed to some API
calls.
- Seeing how many annotations are needed on the interfaces
of functions to analyse C/C++ code in a modular way.
Note that these annotations only make explicit information
that programmers should have in their heads when coding. The fact
that all this information is implicit is of course why software
can be difficult to maintain, difficult to understand, or difficult
to write in the first place.
- Appreciating how design choices, coding conventions, and coding style
can affect things: they can make code easier to understand and analyse,
not only for tools like PREfast, but also for human programmers.
Some generic feedback on the PREfast exercise
- Some people allowed *my_alloc to return a null pointer in
the case that the malloc() failed and then had an _opt_ annotation
on the return value, i.e.
_Ret_opt_cap(size) char* my_alloc(...)
In principle this is ok, but note that you then need _opt_
annotations and/or null-checks in other places in the code.
For instance, copy_data may then be called with a null
parameter and the call to memcpy in copy_data can crash (or
corrupt memory) when it is passed this null parameter.
In general, it is simpler to exit() if a malloc() fails, so that a
function like *my_alloc never returns null, which makes life simpler
in the rest of the code. If you allow *my_alloc to return null and
proceed with execution of the program, you may well have to exit() -
or perform similar drastic action - at some later stage when you
actually want to use this null buffer, so you might as well
have exit()-ed when the malloc() failed.
- Note that in *my_alloc(), if size==0 then initialising ch[size-1] to
null is a buffer overflow (or rather, a buffer underflow), even if the malloc
succeeded. (If the malloc failed, it is clearly also a memory error.)
This is a subtle issue that I missed in my explanation in class, I think.
There are many ways to avoid this issue. You could exit() in my_alloc()
if size==0, as you are clearly not meant to call my_alloc() with a
zero argument. Alternatively, you could malloc(size+1) bytes,
interpreting the size parameter as the size excluding the null
terminator, thus making sure there is always room for the
null-terminator; but of course size+1 can now produce an integer
overflow, so you'd have to guard against that.
PREfast does not warn about possible integer overflows. I hope
it is obvious why it chooses not to do this.
- Some people had a _Ret annotation for the
return value of the function copy_data(). But the
function is of type void (i.e. it
is a procedure rather than a function that produces a return
value), so there is no return value, and so a _Ret
annotation makes no sense.
- Annotating the return value of copy_data as being tainted, e.g. with
[returnvalue:SA_Post(Tainted=SA_Yes)] void copy_data(...)
does not make sense, for the same reason.
There are many solutions for specifying taintedness information
for copy_data(). E.g. you could specify that
1) the 'output' buf2 will be untainted in the post-state if the
'input' buf1 is untainted in the pre-state, or that
2) the 'output' buf2 will be tainted in the post-state if the 'input'
buf1 is tainted in the pre-state.
The expressivity of SAL is limited here: you cannot specify
both property 1) AND 2) at the same time, even though both
properties are true.
There are two sensible input validation strategies for this
piece of code (and in general, though in larger programs
there may be other options): either
a) you validate/sanitise data as soon as possible and only propagate
untainted information, or
b) you propagate tainted data and only validate/sanitise it when you
feed (possibly) tainted data to dangerous functions.
Depending on which strategy you use, annotation 1) or annotation 2)
will suffice to let PREfast check that no tainted data is ever fed to
execute().
Of course, there is also the safer - aka more paranoid -
strategy of validating/sanitising data both upon entry
and upon exit.
The notions of 'sanitisation' and 'validation', and the differences
between them, will be discussed later (and are discussed in
detail in the lecture notes on input handling).
Reflection
Disadvantages of dynamic analysis include:
- It requires running code - i.e. code that can be executed,
not some source code under construction.
- Additional overhead, in memory usage & execution time.
- Coverage is only as good as the test suite, meaning that we can expect
false negatives.
- Errors are caught later, at runtime, rather than sooner, at
or even before compile time.
Advantages of dynamic analysis include:
- Warnings are always true positives, i.e. there are no false
positives. Though if you use annotations to catch bugs, then you
may get false positives because of mistakes in these annotations.
- No need for annotations, though annotations can still be helpful to provide more
feedback or more precise feedback.
For example, immediately producing a warning
when a NULL parameter is fed to some function instead of
letting the program crash with a null-pointer dereference
later can save you a lot of time debugging: the actual null
dereference, which will result in a segmentation fault on
most (but not all!) systems, may not happen until 10 minutes later,
so there is hard debugging work to do to trace the problem back
to its origin, which may be thousands of lines of code away from
where the crash occurred.
- Better precision, as you have more info at runtime: you know the actual values of
variables.
Pros & cons of static analysis are exactly reversed.
Advantages of static analysis include:
- Possible at or even before compile time.
- No overhead at runtime.
- Better coverage: in theory perfect, but there are trade-offs here with false positives and need for annotations.
This also means: fewer false negatives, at least in principle.
Disadvantages of static analysis include:
- More false positives. (Note that even if a static analysis tool gives
a true positive, developers may dispute this; it is harder
to dispute if a dynamic analysis reveals an error.)
- Need for annotations.
But: an added bonus of annotations is more precise feedback about the root cause of a problem.
Also, these annotations can be useful for developers too.
Not requiring running code means that static analysis can catch
problems earlier, at or before compile time. Some people think
that static analysis is not possible if it involves
inputs whose values are unknown at compile time, but that is not
true: compile-time analysis is possible even when dealing with
unknown inputs, but it is harder, and it may ultimately be too
complex, resulting in false positives or negatives.
Of course, in general static analysis is harder as it
involves reasoning over arbitrary values, whereas checking
properties at runtime is easier because you know specific values.
For example, at runtime you can simply check if a pointer is NULL
or not, but a compile-time static analysis will have to consider
both possibilities. Also, a static analysis may require
more annotations: to express assumptions on input values, to
allow modular analysis of interactions between functions, and
to keep the number of false positives and negatives in check.
Note that one could argue that the reason for PREfast not
complaining about unannotated code is not so much preventing too
many false positives, but more generally preventing too many
warnings altogether, be they false or true positives. Programmers will
be turned off if a tool spews out too many complaints, even if many
of these complaints are true positives...