Software Security - PREfast project
Overall learning objectives of this PREfast exercise
- Seeing what these SAST tools can do and getting a feel for how
useful they are.
- Experiencing how ugly some programming languages, in this case C/C++,
are.
- Seeing how many annotations may be needed on the interfaces
of functions to analyse C/C++ code in a modular way.
Note that these annotations only make EXPLICIT the information
that programmers should have in their heads when coding. The fact
that all this information is IMPLICIT is of course why software
can be difficult to maintain, difficult to understand, or difficult
to write in the first place.
- Appreciating how design choices, coding conventions, and coding style
can affect things: it can make code easier to understand and analyse,
not only for tools like PREfast, but also for human programmers.
Some generic feedback on the PREfast exercise
- Some people allowed *my_alloc to return a null pointer, namely in
the case that the malloc() failed, and then had an _opt_ annotation
on the return value, i.e.
_Ret_opt_cap(size) char* my_alloc(...)
In principle this is ok, but note that you then need _opt_
annotations and/or null-checks in other places in the code.
For instance, copy_data may then be called with a null
parameter, or the call to memcpy in copy_data can crash or
corrupt memory when it is passed this null parameter.
In general, it is simpler to exit() if a malloc() fails, so that a
function like *my_alloc never returns null, which makes life simpler
in the rest of the code. If you allow *my_alloc to return null and
proceed with execution of the program, you may well have to exit() -
or perform a similarly drastic action - at some later stage, when you
actually want to use the buffer that is now null; so you might as well
have exit()-ed when the malloc failed.
- Note that in *my_alloc(), if size==0 then initialising ch[size-1] to
null is a buffer overflow (or rather, a buffer underflow), even if the malloc
succeeded. (If the malloc failed, it is clearly also a memory
error.)
PREfast does not warn about this issue; it seems to ignore
integer over/underflow altogether.
There are many ways to avoid this issue. You could exit my_alloc()
if size==0, as you are clearly not meant to call my_alloc() with a
zero argument. Alternatively, you could malloc(size+1) bytes,
interpreting the size parameter as the size excluding the null
terminator, thus making sure there is always room for the
null-terminator; but of course size+1 can now produce an integer
overflow, so you'd have to guard against that.
- Annotating the return value of copy_data as being tainted, e.g. with
[returnvalue:SA_Post(Tainted=SA_Yes)] void copy_data(...)
does not make sense: this function is void so it does not return
any value. Any tainting annotations for copy_data will have to be
on the parameters of copy_data.
There are many solutions here. E.g. you could specify that
1) the "output" buf2 will be untainted in the post-state if the
"input" buf1 is untainted in the pre-state, or that
2) the "output" buf2 will be tainted in the post-state if the "input"
buf1 is tainted in the pre-state.
The expressivity of SAL is limited here: you cannot specify
both property 1) AND property 2) at the same time, even though both
properties are true.
Depending on your input validation strategy (i.e. either validating
data as soon as possible and only propagating untainted information,
or propagating tainted data and only validating when you feed
tainted data to dangerous functions), either annotation 1) or 2)
will suffice to let PREfast check that no tainted data is sent to
execute().
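For instance, option 2) could be written roughly as below, using the
attribute-style SAL from the exercise; the exact macro spellings
depend on the SAL version, so treat this as a sketch of the idea, not
as the one correct annotation:

```c
/* Sketch: if buf1 is tainted on entry, buf2 is tainted on exit. */
void copy_data([SA_Pre(Tainted=SA_Yes)] char *buf1,
               [SA_Post(Tainted=SA_Yes)] char *buf2);
```

With this declaration, PREfast can warn if buf2 subsequently flows
into execute() without being validated first.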
Part II - Reflection
Disadvantages of dynamic analysis include:
- It requires running code.
- Additional overhead (in memory usage & execution time).
- Coverage is only as good as the test suite, meaning you can expect
false negatives.
- Errors are caught later, at runtime, rather than sooner (at
compile time, or even before compile time):
if you only spot a problem at runtime, the damage may already be
done; if you find it at compile time, you can still fix the code
before it runs.
Advantages of dynamic analysis include:
- Warnings are always true positives, i.e. there are no (or at least
fewer) false positives
(as analysis is simpler if you know the actual values of
any variables).
- No need for annotations, though annotations can still be helpful to
provide more, or more precise, feedback.
- Better precision, as you have more information at runtime.
Pros & cons of static analysis are exactly reversed.
Advantages of static analysis include:
- Possible at or even before compile time.
- No overhead at runtime.
- Better coverage: in theory perfect, but there are trade-offs here
with false positives and the need for annotations.
This also means: fewer false negatives, at least in principle.
Disadvantages of static analysis include:
- More false positives. (Note that even if a static analysis tool gives
a true positive, developers may dispute it; an error is harder
to dispute if a dynamic analysis reveals it.)
- Need for annotations.
But: an added bonus of annotations is more precise feedback about the root cause of a problem.
Also, these annotations can be useful for developers too.
Not requiring running code means that static analysis can catch
problems earlier, at or before compile time. Some people think
that static analysis is not possible if it involves
inputs whose values are unknown at compile time, but that is not
true: compile-time analysis is possible even when dealing with
unknown inputs, but it is harder, and it may ultimately be too
complex, resulting in false positives or negatives.
Of course, in general static analysis is harder as it
involves reasoning over arbitrary values, whereas checking
properties at runtime is easier because you know specific values.
For example, at runtime you can simply check if a pointer is NULL
or not, but a compile-time static analysis will have to consider
both possibilities. Also, a static analysis may require
more annotations: to express assumptions on input values, to
allow modular analysis of interactions between functions, and to
keep the number of false positives and negatives in check.
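The NULL-pointer contrast above can be made concrete with a small
illustration (not from the exercise; the function name is made up):

```c
#include <stddef.h>
#include <string.h>

/* At runtime, a single check settles whether p is NULL; a static
   analyser must instead reason about both the NULL and the non-NULL
   branch for every possible caller. */
size_t safe_len(const char *p)
{
    if (p == NULL) {   /* dynamic check: trivially decidable at runtime */
        return 0;
    }
    return strlen(p);  /* static analysis must prove p != NULL here */
}
```

An annotation like _In_opt_ on p would tell a static analyser that
callers may legitimately pass NULL, which is exactly the kind of
assumption on input values mentioned above.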
Note that one could argue that the reason for PREfast not
complaining about unannotated code is not so much preventing too
many false positives, but more generally preventing too many
warnings of any kind, be they false or true positives. Programmers will
be turned off if a tool spews out too many complaints, even if many
of these complaints are true positives...