- Concerns what a program does: analysis aim is to check statically/dynamically something about execution behaviours
- Is flow-sensitive: what the analysis does with a statement is dependent on control flow of the program
Software analysis falls into many categories
- Metaproperty analysis: anything else except for what the code actually does (e.g. style checkers)
- Program analysis: analysing what a (part of a) program will do / does / can do
- Static program analysis: without running the program. This requires considering all executions of the code of interest which requires context about the execution up to this point
- Value-agnostic
- Value-sensitive (symbolic execution)
- Dynamic program analysis: analyzes the program while running it
- Automatic (Concolic) test-case generation
- Static program analysis: without running the program. This requires considering all executions of the code of interest which requires context about the execution up to this point
3 main properties about code
- How the code is written (fairly simple static meta-property analysis)
- What the code does (program analysis - cannot be done precisely)
- What the author intended (impossible for fully automatic analysis, not available from the code)
See: Rice’s Theorem
Designing a Static Analysis
- Define the goal: what is the property (of all executions) of a program? What problem is the analysis supposed to help?
- Abstract states : type of information analysis tracks through program
- Error/output information : type of information returned by analysis
- Analysis function: define a function for analysis steps where
- is an abstract state
- is a program state
- returns a pair of resulting abstract state plus any errors
- typically defined per-case of type of (supported) statement s
- Concretisation function: maps abstract states to sets of program states or sets of program executions - this defines what abstract states mean. What does the analysis “think” is a possible state at this point?
- (D, A) maps to set of all states in which at least D are declared and A are initialized
- (D, A) maps to set of all states in which at most D are declared and A are initialized
- (D, A) maps to set of all states in which exactly D are declared and A are initialized
- (Optionally) termination strategy: for recursive control flow (e.g. loops)
We can do this by instrumenting (changing) the program. We have two places where this can happen
- Instrumenting source code (e.g. grab AST and modify it)
- Instrumenting executable (e.g. grab bytecode and modify it)
- Instrumenting runtime (e.g. modify JVM)
There’s also a choice to be made about how/when to perform the analysis
- Online dynamic analysis: run the analysis as part of / alongside the program
- Offline dynamic analysis: make the program produce a log; analyse it separately
Program Slicing Example
NB: We never delete variable declarations, just their initializations. Similarly, we keep flow constructs if they have at least one line inside of them.
Static
- Define abstract state as where
- is a map from variable names to what line numbers may have affect that variable at that point
- is a list of control flow dependencies at that point
- For assignment of
x=e
at line wheree
is an expression that can contain multiple variablesM[x]
becomes the union of- The line number
n
M[y]
for everyy
in the expressione
- Unions of all sets in
L
- The line number
L
remains unchanged
- For if-then-else statements
- Push to
L
: the union ofM[y]
for everyy
in the if-condition check - Copy the map
M
to start of the then and else blocks - Continue normally…
- At the end of each block, union the
M
s at the end of each block - Pop from
L
- Push to
- For loops
- Add loop variable to
M
- Push to
L
: the union ofM[y]
for every y in the loop condition - Continue normally…
- When we hit end of loop body, we go back to start of the loop and rewrite. Stop when neither
M
norL
update- Each variable in
M
to be union of its old value and the value at the end of the loop - Head of
L
is the union of itself andM[y]
for every y in the loop condition
- Each variable in
- Pop from
L
- Add loop variable to
Dynamic
- We add two extra objects to the program state,
m
andl
which are functionally equivalent to and from previously - For assignment of
x=e
at line , we updatem
to storem[y]
for eachy
ine
l
flattened
- For if-then-else statements
- Before the statement, we calculate
s
by taking union of all the dependencies in the condition and push this tol
- We pop
l
after the statement
- Before the statement, we calculate
- Loops are the same as if-then-else statements but we do steps 1. and 2. inside the loop instead of outside the statement