Thomas Dullien’s (@halvarflake) recent paper “Weird machines, exploitability, and provable unexploitability” struck a chord with me, albeit just slightly out of tune. Halvar’s formulation is very close to how my own thinking on this subject has been developing since being introduced to LangSec and the notion of Weird Machines by Sergey Bratus (@sergeybratus). This opportunity to jump on Halvar’s coattails is just too good to pass up, so here it goes…
Borrowing the notion of “intended” machine, I would like to extend it to Intended Turing Machine (ITM). I am not ready to accept Halvar’s reduction of all real computers to very large Finite State Automata. While technically correct, the extremely large number of states and transitions makes the reduction impractical for most real-world applications, at least in my finite mind.
Instead of defining the ITM directly, we define it in terms of the recursively-enumerable language it accepts. Let’s call it the Intended Language (IL). IL is a language of intended execution traces.
As the real program runs, it generates a sequence of executed instructions, along with their actual arguments. IL consists of all possible execution trace strings that can be produced by intended execution of the program.
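To make the trace-as-string view concrete, here is a minimal sketch (in Python, with made-up addresses and mnemonics) of an execution trace as a sequence of executed instructions with their actual argument values. The `Step` type and the sample values are purely illustrative, not part of any real tracing tool.

```python
# Illustrative sketch: an execution trace as a string over an alphabet of
# (address, instruction, concrete operands) symbols. All values are made up.
from typing import NamedTuple, Tuple

class Step(NamedTuple):
    addr: int                 # address of the executed instruction
    mnemonic: str             # the instruction itself
    operands: Tuple           # actual argument values at execution time

# One trace string that a single run of the program might produce:
trace = (
    Step(0x400000, "mov", ("rax", 0)),
    Step(0x400005, "call", (0x400100,)),
    Step(0x400100, "ret", (0x40000A,)),
)

# IL is then the (generally infinite) set of all such trace strings that
# *intended* executions of the program can produce.
```
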
Any execution trace produced by the actual program that is not in IL is, by definition, an exploit.
Avoiding exploitation amounts to constraining our program to only produce execution traces that are in IL. We set aside (at least for now) the problem of building a program that cannot be exploited, and instead look at the problem of detecting exploitation.
We introduce the notion of a Validator — a machine that runs on the execution trace (either after or right before actual execution happens) and attempts to distinguish intended operation from exploitation.
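A validator, in this framing, is just a predicate over traces: the set of traces it accepts is its language. A minimal sketch, with hypothetical names, might look like this:

```python
# Sketch of a validator as a predicate over execution traces.
# The set of traces it accepts is the language VL; intended behavior is IL.
from typing import Callable, Sequence, Tuple

Trace = Sequence[Tuple[str, tuple]]   # (mnemonic, operands) steps
Validator = Callable[[Trace], bool]   # True = accept, False = flag as exploit

def run_with_validator(trace: Trace, validate: Validator) -> str:
    """Inline monitoring: check the trace before letting it take effect."""
    return "allow" if validate(trace) else "block"

# A trivially permissive validator (VL = all strings): it blocks nothing,
# so every exploit slips through.
accept_all: Validator = lambda trace: True
```
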
If we could build a validator equivalent to the ITM, one that accurately accepts all and only the strings in IL, we could reliably distinguish intended execution from exploitation. Of course, if we could reliably build an equivalent of the ITM, then we wouldn’t have the problem of exploitation in the first place. Generating a perfectly accurate validator that accepts all and only the strings in IL is as hard as writing the correct program in the first place (see wave of hand in lieu of proof).
So we are left with only imperfect validators. Any practical validator for a sufficiently complex program will reject some strings that are in IL, accept some that are not, or both. We use VL to designate the language of strings accepted by our validator.
Any string in IL that is not in VL is a false positive — a valid intended execution blocked by our validator.
Any string in VL that is not in IL is a false negative — an exploit that escapes undetected by our validator.
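The two error classes are just the two set differences between IL and VL. A toy illustration with finite sets standing in for the two languages (real ILs are recursively enumerable and generally infinite):

```python
# Toy finite sets standing in for IL and VL; trace names are made up.
IL = {"trace_a", "trace_b", "trace_c"}   # intended executions
VL = {"trace_a", "trace_b", "trace_x"}   # what the validator accepts

false_positives = IL - VL   # intended runs the validator blocks
false_negatives = VL - IL   # exploits the validator lets through
```

Here `"trace_c"` is a false positive and `"trace_x"` is a false negative; a perfect validator would make both differences empty.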
Although a perfect validator may not be possible, we can (and do) build imperfect validators in the form of anti-exploitation countermeasures.
In fact, the common types of countermeasures can be described in terms of validators of reduced complexity — either Finite State Automata (FSA) or Push-Down Automata (PDA).
Data Execution Prevention (DEP) can be recognized as a very weak Finite State Automaton. It has the effect of fixing the alphabet and constraining state transitions to those defined by the basic blocks and direct jumps of our program.
Control Flow Integrity (CFI) strengthens the constraints on state transitions. But without a secure stack, it remains an FSA.
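The stateless nature of CFI can be sketched as a finite check: each control transfer must be an edge of a statically approximated control-flow graph. All addresses below are made up for illustration. Note the weakness: with no stack, the validator cannot tell which call site a given return should match, so a return to the wrong (but statically allowed) call site still passes.

```python
# Sketch of stackless CFI as a finite-state check over control transfers.
# Edges are a made-up approximation of a program's control-flow graph.
ALLOWED_EDGES = {
    (0x1000, 0x2000),  # call site A -> function entry
    (0x1020, 0x2000),  # call site B -> same function entry
    (0x2000, 0x1005),  # return -> instruction after call site A
    (0x2000, 0x1025),  # return -> instruction after call site B
}

def cfi_check(transfers):
    """Accept a trace of (src, dst) control transfers iff each is an
    allowed edge. Stateless: it cannot pair a return with its call."""
    return all(edge in ALLOWED_EDGES for edge in transfers)
```

For example, a run that calls from site A but “returns” to the point after site B is still accepted, because both edges are individually allowed.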
Somewhat counter-intuitively, one of the simplest countermeasures is the one that adds the greatest power to a validator. Adding a stack to the validator (e.g. a shadow stack) elevates it to a Push-Down Automaton. Although this results in a significant increase in the theoretical computational capability of the validator, gaining the power to recognize Context Free languages, it still falls far short of being able to recognize the recursively-enumerable IL.
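A minimal sketch of that jump in power: pushing return addresses on calls and popping them on returns gives the validator exactly the stack of a PDA, letting it match each return to its own call. The trace encoding and addresses here are illustrative assumptions, not any particular shadow-stack implementation.

```python
# Sketch of a shadow-stack validator: a PDA over call/return events.
def shadow_stack_check(trace):
    """trace: sequence of ("call", return_addr) or ("ret", target_addr).
    Accept iff every return goes to its matching call's return address."""
    stack = []
    for op, addr in trace:
        if op == "call":
            stack.append(addr)        # push expected return address
        elif op == "ret":
            if not stack or stack.pop() != addr:
                return False          # mismatched return: flag as exploit
    return True
```

A ROP-style trace that stackless CFI might allow, such as a return aimed at the wrong call site, is rejected here because the popped address does not match.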
From this view of countermeasures it is apparent that none of them are (or ever can become) perfect.
As any practical validator will be imperfect, it will have to be either too strict or too loose. Too strict is typically not an option, as it will unnecessarily restrict valid intended functionality. So we are left with validators that are too loose and still allow some exploits to slip through undetected.
Finding/constructing an exploit can be reduced to finding the inputs that cause the target program to produce an execution trace S that satisfies the following properties:
- S is not in IL (definition of exploit)
- S is in VL (bypassing validator)
I suspect that finding a string that is in VL but not in IL is straightforward in most cases. A more difficult task is finding the inputs that cause the target program to generate S.
All proofs are left as exercises to the reader. The whole subject deserves a more diligent treatment than I can offer. I hope Halvar’s publication attracts further analysis and peer review to this domain.
Undeveloped ideas for further thought:
- Computer architectures with explicit support for validators as first-class entities.
- Self-contained module validators.
- Tune-able validators that can be constrained or relaxed to trade off functionality for increased security or vice versa.