CC BY-SA 3.0
Paris Métro, Station Saint Lazare (line 14) with Platform screen doors
When engineers were designing completely autonomously operated
Paris Métro line 14,
had to ensure the safety of tens of millions passengers each year, its 22 stations across Paris as
well as smooth running of trains. How to approach the development of software that allows for
the automation of public transit? The French team decided to use formal verification – learn
what it means!
We will go through an introduction to formally verified software,
note when formal verification gives most returns and present two introductory examples.
The first example will present a turnstile logic with a flip coin and the other one will be
a real, but still basic example of TLA+ methodology on the Die Hard with a Vengeance
problem with two jugs.
After reading the article you will gain an overview on fundamentals of Formally Verified
Software and will be aware of the method to avoid software bugs which can cause your business to
Why do systems need Formal Verification?
In engineering we have methods to validate
(i.e. “Are we trying to make the product meet the user’s requirements?”)
verify (i.e. “Does the product conform to the validated specification?”).
In the realm of computer hardware Formal
Verification is a pretty old concept and has been in existence since 1984 with
tools like Verilog and now superseded
by tools like SystemVerilog.
Such tools have become part of the IEEE specifications for designing and
However, in the realm of software engineering even though we have had
verification tools that have
existed since late the
1980s, it still has not been standardized nor is in widespread use in comparison to its
hardware counterpart. Even when we see the use of Formal Verification, we can see
that it is usually limited to specific fields like cryptography or communication
protocols and is frequently done by the academic circles. A good example of
implementing formal methods of verification is the automation of Paris Métro
line 14. With the ever
increasing complexity of software and the layers of abstraction, we have reached
a time when writing secure, efficient and resilient code requires some level of
formal verification to be done, if not for the whole software at least for the
important sub-systems involved. In recent times we have seen more
of formal verification by the industry leaders
like Intel, Amazon or Microsoft, in products where we
have enormous complexity and multiple systems interacting with one another.
What do we formally verify?
In the realm of software engineering, the first step of formal verification is asking the following questions:
- What do we intend to verify?
- How do we verify?
What do we intend to verify?
Setting out to verify the whole system and every line of written code, even though
theoretically possible, is a beginner’s mistake. The art of engineering is about being
smart enough to know which corners to cut and which ones not to cut. This is not
an easy task - it takes time, practice and experience to take decisions on
this. We should also keep in mind writing “verification” for a system is not the
same as writing tests (Unit / Functional) and fuzzing for it. Writing tests and
achieving 100% code coverage does not guarantee that the system works as
intended and it is only as good as tests themselves (after all, who tests the
unit tests?). Verification and testing can be considered to be two sides of the
same coin and one complements the other, completing the picture.
CC BY-SA 4.0
A screenshot from the Tamarin Prover security protocol verification tool.
Formal verification involves writing proofs on an abstract “mathematical model”
of the system. Depending on the design of the system, there are various options
available which can be used as the basis for writing the proof of
verification. Let us take the case of Tamarin
Prover – the tool used for symbolic
modeling and analysis of security protocols. Here abstractions happen at the
protocol level and agents that interact with the protocol. This abstraction works
well for security and cryptography related code. Our approach discussed in this
article would be a more fine grained approach, trying to verify finite state
machines and this is called model
How do we verify?
Finite state machines are the simplest mathematical models of computation where
we represent states and the rules to transition from one state to another.
As an illustrative example let us take a closer look into the
working of a coin operated
CC BY-SA 4.0
Author: Sebasgui Changes that have been made to the original work: removed white background and resized.
- We have a coin operated turnstile which is either in a “closed” state or “open” state.
- We have a couple of operations that we can do on such a turnstile
- We can “push” it.
- We can “insert coin” into it.
- Applying the operations on a given state allows you transition between states
or keeps you in the same state depending on which operation is applied to
The above description can be represented as a diagram and this basically
constitutes the representation of a finite state machine.
Our aim is to verify that given this state machine, the turnstile works as
expected and more importantly, it does not enter an unexpected state. For example,
we do not want the turnstile to go “unlocked” without a coin or remain
“unlocked” after a push has been made.
A very simple way of thinking about this is to consider the state machine in a
linear timeline and the actions and transitions in this linear timeline.
Our reasoning on the working of the machine and the states it takes up is not
- The turnstile is not guaranteed to be “open” at all points in time (or
“closed” for that matter)
- The statement “The turnstile is open” has truth values that would vary with
All this forms the basis of temporal
NOTE: One of the basic things to keep at the back of the
mind would be that time continues to move forward in this universe even if no
further changes occur in the state machine.
So by verifying the behavior of the state machine through time one can in turn
verify the behavior of the system and its implementation. Of course things do
get complicated when these state machines are distributed over a network and
things happen in a concurrent system.
The TLA+ logo, seen as the splash screen when the TLA+ toolbox is started. © Edward Lamport.
Let’s take a look at a very specific tool called
TLA+, which can be used to achieve state
machine verification for concurrent systems.
TLA+ has been around since 1999, and is developed by Leslie
Lamport, a Turing award winner.
In addition to writing
formal specification, it can also be used to design, model, document and verify
programs, especially concurrent systems and distributed systems. This is a good
toolkit to have since many of the systems level applications and blockchain
applications tend to have a combination of distributed and concurrent systems
TLA+ has been used to write up systems level proofs for things like Memory
Cache coherence protocols to
distributed consensus protocols like
Raft. In addition to
this, TLA+ specification is also LaTeX compatible making for an excellent way to
generate documentation of the proofs.
A simple TLA+ example
Unfortunately, TLA+ does not come with a run off the mill “Hello, World”, like
most other programming languages. And we have to take a slightly bigger example
to understand the basics of TLA+. We shall choose one of the basic introductory
examples given in the TLA+ tutorials called Die
The name of the problem comes from the movie “Die Hard with a Vengeance”, where
the two heroes of the movie have to stop a bomb from exploding by placing 4
gallons of water in a jug. The twist being that the heroes are provided only with a
3-gallon jug, 5-gallon jug and a water faucet.
We solve this problem by starting to fill 5-gallon jug. This is the
approach taken in the movie.
- Make sure both the jugs are empty at the start.
- Fill the 5-gallon jug.
- Empty the 5-gallon jug into the 3-gallon jug.
- We have 2 gallons of space in the 5-gallon jug.
- Empty 3-gallon jug on to the ground.
- Pour the remaining 2 gallons in the 5-gallon jug into the 3-gallon jug. (1
gallon space remaining in 3-gallon jug)
- Fill the 5-gallon jug.
- Pour 1 gallon out from the 5-gallon jug into the remaining space in the
By the end of this manual process we should have exactly 4 gallons of water in
the 5-gallon jug.
There is an alternative solution to this problem, by starting to fill the 3-gallon jug.
If can find it out, leave your solution in a comment.
The manual description of the solution happens in discrete steps, like following
states in a finite state machine. Each state is well defined and has a path to
the next state.
Now we are going to take a look at the solution from a TLA+ perspective.
We are not going to do any fancy stuff with TLA+ for now. But we will be
exploring the basic features of TLA+ and how it works, namely making it step
through a Finite state machine.
NOTE::TLA+ uses a breadth-first search
approach when going through a state machine
Let us have a look at the TLA+ description of the Die
- Variables - These describe specification of the variables (jugs) in our problem.
VARIABLES big, \* The number of gallons of water in the 5-gallon jug.
small \* The number of gallons of water in the 3-gallon jug.
- Invariants - These describe the invariants within our system, in our case the
values for the 3-gallon jug has to be between 0 and 3 (inclusive) and likewise
for the 5-gallon jug this has to be between 0 and 5 (inclusive). The invariants
keep a sanity check on the operations we do on the jug. This helps to keep
having operations that may result in negative values in our jugs or have
overflow conditions like having 4 gallons in the 3-gallon jug.
TypeOK == /\ small \in 0..3
/\ big \in 0..5
- Initial state - These define the entry point to state machine, and we make
sure both jugs are at 0 gallons at start.
Init == /\ big = 0
/\ small = 0
Actions - These define the various actions that are possible from a given
state in the machine that helps it to advance to the next state within the
state machine. We list the various actions possible below, actions are defined
independently for the 3-gallon jug and the 5-gallon jug.
FillSmallJug == /\ small' = 3
/\ big' = big
FillBigJug == /\ big' = 5
/\ small' = small
EmptySmallJug == /\ small' = 0
/\ big' = big
EmptyBigJug == /\ big' = 0
/\ small' = small
- Transferring the water (3 to 5 and 5 to 3)
SmallToBig == /\ big' = Min(big + small, 5)
/\ small' = small - (big' - big)
BigToSmall == /\ small' = Min(big + small, 3)
/\ big' = big - (small' - small)
You will also notice that, for each action, we also make sure the next state
of the unused jug (for example when filling or emptying), is mentioned
not to change. This is important when specifying actions to make sure that we
mention what has changed in that state and what has not changed so that the
system has no undefined state for any of the jugs at any given point in time.
- Next state - These define all possible next states in our system using both
“disjunction” (OR) and “conjunction” (AND). In the Die Hard example all
actions are possible independently of each other. Hence we use “disjunction”
to describe the “Next” state from our initial state.
Next == \/ FillSmallJug
- Specification - Finally we define our specification to be a conjunction of our
Initial state and Next state.
Spec == Init /\ [Next]_<<big, small>>
Finding the solution
We need 4 gallons of water to prevent the bomb from
exploding! This invariant will make sure all the states not leading to the 5-gallon jug
do not have 4 gallons as errors since they are not a part of the solution we seek.
For this we have a predicate called
NotSolved asserting that
big is not
equal to 4.
NotSolved == big # 4
With the above description we have managed to describe the manual process of
finding the solution to a state machine that can be run within TLA+ with well
defined actions on how to reach each of the states.
The TLA+ main window presented above shows the loaded program
The result of the model checker shows the various reachable states in our
We have also set the check on the
TypeOK invariant to make sure
the formula holds true for every reachable state. The key to finding the
solution is the
big /= 4 invariant. This way we make the model checker show us
a behavior ending with the 5-gallon jug nor equal 4 is false.
The error trace shows the case (the last entry) where the invariant becomes
false and our desired solution to the Die Hard Problem.
Each of the steps in the error trace shows how the jugs have to be filled so
that we end up with the 4 gallons of water in the 5-gallon jug.
Hence running the simulation within TLA+ has helped our heroes to come to this
solution automatically, by just defining the model in the formal language.
The bomb has been disarmed on time!
We have demonstrated how we can use TLA+ to play around with states of a
finite state machine. However, we still have not actually stepped into the powerful tools
available in TLA+ like emulating passing around messages, or declaring a theorem
for our proof.
In the next part we shall explore the details of using TLA+ to write a
verification for a specification from a description written in an RFC.
Formal Verification is gaining importance
with the ongoing trend to automate further aspects of our life.
The Paris Métro line 14 mentioned above uses another way of formal
verification called “B-Method”, which uses
abstract state machines,
set theory and
first-order predicate logic to help define systems.
This approach is different
from TLA+ as it has a tight loop between abstract machine and implementation,
and uses refinement technique to match the abstract machine with the
The approach of formally verified software in the Paris Métro gave significant returns to the city
and motivated the switch to autonomous orchestration of another Paris Métro Line 1.