******************************************************************
Overall learning objectives of this PREfast exercise:
******************************************************************

- seeing what these SAST tools can do and getting a feel for how
  useful they are

- experiencing how ugly some programming languages, in this case C/++,
  are

- seeing how much annotations may be needed to analyse code.

  Note that these annotations are only making information EXPLICIT
  that programmers should have in their head when coding.  The fact
  that all this information is all IMPLICIT is of course why software
  can be difficult to maintain, difficult to understand, or difficult
  to write in the first place.

- appreciating how design choices, coding conventions, and coding style
  can affect things: it can make code easier to understand and analyse,
  not only for tools like PREfast, but also for human programmers.

******************************************************************
Some generic feedback on the PREfast exercise.
******************************************************************

- Some people allowed *my_alloc to return a null pointer, namely in
  the case that the malloc() failed, and then had an _opt_ annotation
  on the return value, i.e.
    _Ret_opt_cap(size) char* my_alloc(...)
     
  In principle this is ok, but note that you then need _opt_
  annotations and/or null-checks in other places in the code.  Without
  having some null-checks elsewhere in the code. For instance,
  copy_data may then be called with a null parameter, or the call to
  memcpy in copy_data can crash or corrupt memeory when it is passed
  this null parameter.

  In general, it is simpler to exit() if a malloc() fails, so that a
  function like *my_alloc NEVER returns null, which makes life simpler
  in the rest of the code.  If you allow *my_alloc to return null and
  proceed with execution of the program, you may well have to exit() -
  or perform similar drastic action - at some later stage when you
  actually want to use this buffer that is now null, so you might well
  have exit-ed when the malloc failed.

- Note that in *my_alloc(), if size==0 then initialising ch[size-1] to
  null is an buffer overflow (or underflow), even if the malloc
  succeeded.  (If the malloc failed, it is clearly also a memory
  error.)

  PREfast does not warn about this issue; it seems to ignore
  interger over/underflow altogther.

  There are many ways to avoid this issue. You could exit my_alloc()
  if size==0, as you are clearly not meant to call my_alloc() with a
  zero argument. Alternatively, you could malloc(size+1) bytes,
  interpreting the size parameter as the size excluding the null
  terminator, thus making sure there is always room for the
  null-terminator; but of course size+1 can now produce an integer
  overflow, so you'd have to guard against that.

- Annotating the return value of copy_data as being tainted, eg with

   [returnvalue:SA_Post(Tainted=SA_Yes)] void copy_data(...) 

  does not make sense: this function is void so it does not return
  any value. Any tainting annotations for copy_data will have to be
  on the parameters of copy_data.

  Here there are many solutions here. Eg you could specify that

  a) the "output" buf2 will be untainted in the post-state if the
     "input" buf1 is untainted in the pre-state

  OR that

  b) the "output" buf2 will be tainted in the post-state if the "input"
     buf1 is tainted in the pre-state 

  The expressivity of SAL is limited.here You cannot cannot specify
  both property a) AND b) at the same time, even though both
  properties are true.

  Depending on your input validation strategy (ie. either validating
  data as soon as possible and only propagating untainted information,
  OR propagating tainted data and only validating when you feed
  tainted data to dangerous functions either annotation a) or b)
  will suffice to let PREfast check no tainted data is sent to
  execute(). 

- Disadvantages of dynamic analysis include
  - requires running code
  - additional overhead (in memory usage & execution time)
  - coverage is only as good as the test suite, meaning you can expect
    false negatives
  - error are caught later (at runtime) rather than sooner (at
    compile time:
      if you only spot a problem at runtime, it's too late to fix the
      code; if you find it at compile time, you can still fix the code
 
  Advantages of dynamic analysis include
  + warnings are always true positives - ie no (or at least fewer) false 
    positives)
     (as analysis is simpler if you know the actual values of
      any variables)
  + no need for annotations
    - though annotations can still be helpful to provide more
      feedback or more precise feedback
  
  Pros & cons of static analysis are exactly reversed.

  Not requiring running code means that it can catch
  problems earlier, at or before compile-time.

- Some people mentioned that an advantage of runtime analysis is that
  static analysis is not possible if it involves inputs whose values
  are unknown at compile time. That is not true: compile-time analysis
  _is_ possible even when dealing with unknown inputs, but it is
  harder, and it may ultimately be too complex, resulting in false
  positives or negatives.

  In general static analysis will is harder as it involves reasoning over
  arbitrary values, whereas checking properties at runtime is easeier
  because you know specific valuesFor example, at runtime you can
  simply check if a pointer is NULL or not, but a compile-time static
  analysis will have to consider both possibilities. Also, a
  compile-time analysis may require more annotations (to express
  assumptions on input values or to allow modular analysis of
  interactions between functions, and then keep the amount of false
  pos/neg in check.)
  
- Note that one could argue that the reason for PREfast not
  complaining about unannotated code is not so much preventing too
  many false positives, but more generally preventing too many
  warnings at all, be they false or true positives. Programmers will
  be turned off if a tool spews out too many complaints, even if many
  of these complaints are true positives...

******************************************************************
Pros & Cons of static vs dynamic analysis
******************************************************************
Dynamic

+ Better precision, as you have more info at runtime
+ No/fewer false positives
+ No need for annotations
  But: without annotations you may get less precise feedback about 
      the root cause of flaws

- Requires running code
- Can only be done later in SDLC, at runtime instead of compile-time
- Coverage only as food as test suite
   -> hence: more false negatives
- More overhead & slower execution
    because of runtime checks and because of extra admin at runtime
    (eg to record tainting information)

Static

+ Possible at or even before compile time
+ No overhead at runtime
+ Better coverage
   in theory perfect, but trade-offs with false positives & need for
   annotations
  -> hence: fewer false negatives, in principle...


- More false positives
  - even true positives might be disputed
- Need for annotations
   But: added bonus is then
      + more precise feedback about the root cause thanks to
        annotatons
      + these annotations are useful for human programmer too