Techniques Used by the Certora Prover ===================================== In this chapter, we describe some of the techniques used inside the Certora Prover. While this knowledge is not essential for using the Prover, it can sometimes be helpful when the Prover does not behave as expected, for instance in case of a timeout. (control-flow-splitting)= Control flow splitting ---------------------- ```{note} In addition to the text-form documentation below, there is a brief explanation of control flow splitting in the [webinar on timeouts](https://www.youtube.com/watch?v=mntP0_EN-ZQ). ``` Control flow splitting (or "splitting" for short) is one of the techniques that the Certora Prover employs to speed up solving. In the remainder of this section, we will give an overview of how the technique works. This background should be helpful when using the settings described [here](control-flow-splitting-options) to prevent Prover timeouts. ### Idea We explain the core idea behind control flow splitting on a simple example. Whenever there is branching in a program we want to verify, we can look for counterexamples on each branch separately. Basically we split the question A: "Is there a violating execution in the program?" into the two questions B: "Is there a violating execution in the program that takes the first branch?", and C: "Is there a violating execution in the program that takes the second branch?". If the answer to either B or C is "yes", then we can conclude that the answer to A must be "yes". If the answers to B and C are both "no", then we can conclude that the answer to A must be "no". For example, consider a rule with an `if` statement: ```cvl rule example { ... if (owner == spender) { assert balance_after == balance_before; } else { assert balance_after == balance_before + amount; } } ``` To simplify the search for a counterexample, the Prover may internally split this single rule into two rules: ```cvl rule example_split_1 { ... require owner == spender; assert balance_after == balance_before; } rule example_split_2 { ... require owner != spender; assert balance_after == balance_before + amount; } ``` A counterexample for either of the split rules will also be a counterexample for the original rule, and any counterexample for the original rule must violate one of the two split rules, so this splitting doesn't change the meaning of the rule. However, in some cases the split rule is easier for the Prover to reason about. ### Technical Description On a technical level, splitting is best illustrated using the {term}`control flow graph` (CFG) of a given CVL rule. A single splitting step proceeds as follows: - Pick a node with two successors in the CFG, the *split point*. - Generate two new CFGs, we call them *splits*; both splits are copies of the original CFG, except that in the first (second) split, the edge to the first (second) successor has been removed. The algorithm also removes all nodes and edges that become unreachable through the removal of the edge. ```{figure} split-step.png :name: single_split() :alt: A single splitting step :align: center :height: 300px Illustration of a single splitting step ``` There is an internal heuristic deciding which branching nodes to pick for each single splitting step. The following pseudo-code illustrates how Certora Prover applies the single splitting in a recursive fashion. ```{note} In the remainder of this subsection, we'll use the terms {term}`SAT` and {term}`UNSAT`. SAT denotes the presence of a {term}`counterexample` (if the rule has an `assert` statement) or a {term}`witness example` (if the rule has a `satisify` statement). UNSAT denotes the absence of any counter- or witness examples. ``` ```{code-block} :name: recursive splitting algorithm :caption: "Recursive splitting algorithm as pseudo code" Input: input_program_cfg worklist = [] worklist.add([input_program_cfg, 0]) while (worklist != []) [cfg, current_depth] = worklist.pop() res = smt_check(cfg, get_timeout_for(current_depth)) when (res) [SAT, model] -> return [SAT, model] UNSAT -> continue TIMEOUT -> if (current_depth == max_depth) return timeout else [split_1, split_2] = split_single(cfg) worklist.add([split_1, current_depth + 1]) worklist.add([split_2, current_depth + 1]) return UNSAT ``` Intuitively, the algorithm explores the tree of all possible recursive splittings along a fixed sequence of split points up to the maximum splitting depth. We call the splits at maximum splitting depth *split leaves*. The exploration stops in any of the following three cases: - if one split was found that is SAT (reasoning: if one split is SAT, then the original program must be SAT, since the behavior of the split is replayable in the original program) - if all splits have been shown to be UNSAT - if solving on a split leaf has timed out (except if {ref}`-dontStopAtFirstSplitTimeout` has been set) The settings with which the user can influence this process are the following (each links to a more detailed description of the option): - {ref}`-depth` controls the maximum splitting depth. - {ref}`-mediumTimeout` controls the timeout that is applied when checking splits that are not split leafs, i.e., that are not at the maximum depth. - {ref}`--smt_timeout` controls the timeout that is used to solve split leafs; if this is exceeded, the Prover will give up with a TIMEOUT result, unless [the corresponding setting](-dontStopAtFirstSplitTimeout) says to go on. - Setting {ref}`-smt_initialSplitDepth` to a value above 0 will make the Prover skip the checking and immediately enumerate all splits up to that depth. (storage-and-memory-analysis)= ## Analysis of EVM storage and EVM memory The Certora Prover works on {term}`EVM` bytecode as its input. To the bytecode, the address space of both {term}`EVM storage` and {term}`EVM memory` are flat number lines. That two contract fields `x` and `y` don't share the same memory is an arithmetic property. With more complex data structures like mappings, arrays, and structs, this means that every ["non-aliasing"](https://en.wikipedia.org/wiki/Aliasing_(computing)) argument requires reasoning about multiplications, additions, and hash functions. The Certora Prover models this reasoning correctly, but this naive low-level modeling can quickly overwhelm SMT solvers. In order to handle storage efficiently, the Certora Prover analyzes Storage (Memory) accesses in EVM code in order to understand the Storage (Memory) layout, thus making information like "an update to mapping `x` will never overwrite the scalar variable `y`" much more obvious to the SMT solvers. For scaling SMT solving to larger programs, these simplifications are essential.