Author: Igor Konnov

Recently, we have seen several interesting write-ups in the TLA+ ecosystem:

  1. Leslie Lamport posted the document on The Future of TLA+.

  2. Andrew Helwer wrote the blogpost called TLA+ is more than a DSL for breadth-first search.

  3. prestonph asked about Opinions on Quint.

These three posts made me think about the TLA+-related work I have been doing since 2016. There were numerous discussions on Hacker News, but those seem to have saturated into remarks that there are also Lean, Coq, and dependent types. With this blogpost, I would like to summarize my experience with TLA+, TLC, Apalache, and Quint. I share these learnings in the hope of sparking new ideas about improving the tooling for TLA+.

Disclaimer: All opinions are my own. I resigned from Informal Systems on December 31, 2023 and have not been receiving any funding from them since then. Even though I still have some equity there, not knowing its current value, I do not feel financially motivated to promote their work or products. I fix small issues in Quint when my customers ask me to.

1. Why I am using TLA+

I am using TLA+ in new projects for fun and profit, mainly by running the Apalache model checker. However, it is up to the customer to decide whether they want to use TLA+, another syntax like Quint, or something else. So far, I have not found a replacement for TLA+ that would let me switch between different computational models and abstractions as easily as I can in TLA+.

TLA+ is consistent with my prior knowledge of model checking. Before I started to learn TLA+, I spent some time learning model checking algorithms and writing a (very much) domain-specific model checker ByMC. So I opened Specifying Systems carrying a lot of baggage: transition systems, Kripke structures, linear-temporal logic, computational-tree logic, explicit-state model checking, symbolic model checking with BDDs and SAT/SMT, abstractions and simulations, etc. Many of these topics are covered in the Handbook of Model Checking.

As a result, I found it quite easy to grasp the concepts presented in the book. I am not a historian, but this is probably because Leslie Lamport, Amir Pnueli, Edmund Clarke, and many other researchers were actively working on these topics and integrating their ideas into distributed algorithms and computer-aided verification for a long time.

So this is it. TLA+ is pretty much what I expected from a general specification language. It was definitely much easier for me to start using TLA+ for practical purposes than, e.g., learning Coq or Isabelle.

TLA+ has well-defined semantics. It may come as a surprise to some readers, but many programming languages still do not have formal semantics. By formal semantics I do not mean formal English, but a set of consistent mathematical definitions. Usually, someone defines formal semantics for a subset of the programming language, and then they give up, because the general case is too hard. If you are writing your own tool such as a model checker, this is quite annoying, as you have to somehow define the semantics in your tool, usually using your own judgment. Easy win for TLA+ over programming languages.

TLA+ offers a convenient set of primitives, so we do not have to reinvent everything from scratch. These primitives are also practical: few engineers and protocol designers are interested in expressing their thoughts in Peano arithmetic, effectively-propositional logic, or Petri nets.
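
To give a taste of these primitives, here is a tiny sketch (with hypothetical names) that uses integer ranges, functions, and a set comprehension over tuples:

\* a hypothetical set of process identifiers
Procs ≜ 1..5
\* a function that assigns an initial balance to every process
InitBalances ≜ [p ∈ Procs ↦ 100]
\* the set of all ordered pairs of distinct processes
Pairs ≜ { <<p, q>> ∈ Procs × Procs: p ≠ q }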

The logic of TLA+ is extremely flexible. It is very easy to switch between different levels of abstraction, which is crucial when modeling distributed systems. This is probably why some people keep talking about refinement.
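
As a minimal sketch of what switching abstraction levels looks like (counter and MAX are hypothetical names here), the same step can be specified coarsely or precisely:

\* a coarse abstraction: the counter changes to an arbitrary value within bounds
NextAbstract ≜ ∃ k ∈ 0..MAX: counter' = k

\* a refined version: the counter is incremented by exactly one
NextConcrete ≜ counter' = counter + 1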

I can easily express concurrent algorithms, smart contracts, cross-blockchain protocols, and fault-tolerant consensus algorithms. Sure, it requires more effort than using a domain-specific language. However, when somebody comes to me with a new consensus algorithm or a conglomerate of smart contracts, like ChonkyBFT and ZKsync governance by Matter Labs, I know that it should be possible to express it in the logic of TLA+. At the same time, expressing distributed systems in TLA+ is not as difficult as it is in a verification language designed for sequential algorithms. It’s a bit of magic.

This is in contrast to verification tools that are tuned towards a specific programming language. There, anything above the core abstractions requires plenty of hacks.

TLA+ has a good level of automated analysis. I am mainly using the model checkers Apalache and TLC (unfortunately, TLC does not scale to my problems more often than it does), but there is also the proof system TLAPS. When people ask me about Lean and Coq, it's kind of interesting, but I have a hard time explaining to a computer why a list reversal algorithm reverses a list, or why two quorum sets have at least one element in common. I would rather have computers disprove my hypotheses.
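
For instance, the quorum hypothesis mentioned above is a one-liner that a model checker can try to disprove on small instances (Quorums is a hypothetical constant here):

\* the hypothesis: every two quorums share at least one element
QuorumIntersection ≜ ∀ Q1, Q2 ∈ Quorums: Q1 ∩ Q2 ≠ {}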

2. Lessons from Informal Systems and the Cosmos blockchains

We were actively using TLA+ and Apalache in 2020-2022. As a result, we wrote specifications of Tendermint, its light client, and IBC, see Tendermint TLA+ Spec, Light client TLA+ Spec, IBC TLA+ Specs. For more details, see Informal Q2 2020 Update. We used both TLC and Apalache. I will only highlight several improvements to Apalache, even though there were a lot of other exciting developments.

Type checker. Back then, Apalache had a lot of usability issues. For instance, its type checker was very fragile and hard to use. In the first version, we were writing type annotations as TLA+ expressions. I could not imagine that people would get so creative and start using operator definitions in the annotation expressions. We completely rewrote the type checker in 2021 and further improved it in 2022 by introducing precise type inference for records, see ADR-002. The type checker was essential for translating TLA+ to SMT, as was laid out in the OOPSLA’19 paper.

Here is a code snippet that demonstrates type annotations for constants and variables in a simple labyrinth example:

CONSTANT
    \* The maximal x-coordinate.
    \* @type: Int;
    MAX_X,
    \* The maximal y-coordinate.
    \* @type: Int;
    MAX_Y,
    \* The set of walls.
    \* @type: Set(<<Int, Int>>);
    WALLS,
    \* The goal coordinates.
    \* @type: <<Int, Int>>;
    GOAL

VARIABLES
    \* @type: Int;
    x,
    \* @type: Int;
    y

The type checker requires type annotations for constants and variables. Given those, it tries to infer types for everything else using a modified version of the type inference algorithm by Damas and Milner. In some cases, type inference cannot distinguish between functions, sequences, tuples, and records. In those cases, the type checker requires additional type annotations.
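
For example, when an operator indexes its argument with brackets, the argument could be a function, a sequence, or a tuple. Here is a minimal sketch of a hypothetical operator with an annotation that resolves the ambiguity:

\* Without the annotation, f could be a function, a sequence, or a tuple.
\* @type: (Int -> Int, Int) => Int;
ApplyTwice(f, i) ≜ f[f[i]]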

Randomized symbolic execution. At some point, we started to check properties of the specifications that were too hard for bounded model checking. Of course, one approach to the issue was to raise the level of abstraction. However, it was not always possible without losing the engineers. Hence, we have introduced the command apalache-mc simulate that randomly picks symbolic transitions instead of non-deterministically choosing from the set of all enabled transitions. This command is quite efficient for finding bugs, though it sacrifices completeness. I will write a separate blog post on comparing check vs. simulate. The command name may be misleading; it does random+symbolic (randolic?) execution.
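
As an illustration only (Spec.tla and the invariant name Agreement are hypothetical, and the exact flags may differ between Apalache versions), the two modes are invoked like this:

apalache-mc check --inv=Agreement --length=10 Spec.tla
apalache-mc simulate --inv=Agreement --length=10 Spec.tla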

As a teaser, these are statistics from finding an agreement violation with apalache-mc check on Ben-Or’s consensus for the case of too many faults:

  Time (mean ± σ):     96.277 s ± 20.296 s
  Range (min … max):   68.131 s … 136.544 s    10 runs

And these are statistics for the same specification and the same invariant using apalache-mc simulate:

  Time (mean ± σ):     163.336 s ± 184.750 s
  Range (min … max):   14.317 s … 609.343 s    10 runs

As we can see, this random+symbolic execution is not that great on average. However, there are cases where it finds a counterexample way faster than bounded model checking. Especially when we run this search on multiple CPU cores, it finds counterexamples much faster than I expected.

Fold/reduce instead of recursion. Recursive operators were introduced in TLA+ Version 2, which appeared after Specifying Systems. For example, set cardinality (for finite sets) can be defined with a recursive operator:

RECURSIVE CardinalityRec(_)
CardinalityRec(S) ≜
  IF S = {}
  THEN 0
  ELSE 1 + CardinalityRec(S \ { CHOOSE y ∈ S: TRUE })

Unfortunately, recursion (and loops) are a pain point of bounded model checking. First, a recursive operator does not have to terminate. Second, even if it does terminate, it is impossible to predict the number of its iterations in the general case. Obviously, the above operator has $|S|$ iterations. Fortunately, many programming languages support bounded iteration called reduce or fold, see the Fold page on Wikipedia. Even Java has reduce since version 8! We refactored Apalache to work with folds instead of recursive operators:

EXTENDS Apalache

CardinalityFold(S) ≜
  LET Count(n, i) ≜ n + 1 IN
  ApaFoldSet(Count, 0, S)

More on that in Folds in Apalache. A similar operator FoldSet was introduced in TLA+ community modules. Apalache supports this operator as well.

Improved stability. Perhaps the most interesting observation for me was that in 2021 we were finding issues in Apalache whenever we were writing a new specification. We stopped finding new issues in 2022. People are still submitting issues every now and then, but the current implementation is significantly more stable.

Conversations with engineers. Thanks to all that work, I had plenty of conversations with engineers at Informal Systems, as well as in the broader Interchain/Cosmos ecosystem.

There were four recurrent themes in these conversations:

  1. Every time I was showing a TLA+ specification to an engineer, they were asking about /\, \/, \E, \A, =>, and other operators (see the short glossary after this list). Back then, Unicode support in TLA+ was not even a thing. When I explained the meaning of these operators, everything was clear. However, we were losing time in a meeting with every new person. We could have used this time to discuss the specification itself. Instead, we were discussing the syntax of TLA+. Not a single engineer said that they liked this part. Interestingly, these people did not want to write a specification, they just wanted to read it.

  2. An engineer would get excited about TLA+ and literally write a program, in every single detail, following good programming practices, but completely overdoing it. We have all seen that and have all done that. Surprisingly, the mantra “TLA+ is not a programming language” did not stop them. They just treated TLA+ as a programming language with a strange syntax. There are plenty of languages with strange syntaxes around. If they liked imperative languages, they wanted assignments and returns everywhere. If they liked functional languages, they wanted to wrap everything into Either and Option. The Rust engineers… they wanted to do both of these things.

  3. Perhaps related to the previous point, everybody was asking whether it was possible to translate Rust, Golang, TypeScript, Python, whatever, to TLA+, or the other way around. Every time, I had to explain that, yes, to some extent, it should be possible, but the outcome would be completely unusable in both directions. People still keep asking these questions.

  4. Finally, every engineer wanted to connect their implementations to TLA+ specifications. To this end, we introduced machine-readable output of traces in the JSON format. Moreover, Andrey Kuprianov’s team has developed two tools for model-based testing: Modelator and Atomkraft.
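
For reference, here is a short glossary of the ASCII notation that came up over and over in these conversations (point 1 above):

  /\   ∧   conjunction (and)
  \/   ∨   disjunction (or)
  \E   ∃   existential quantifier (there exists)
  \A   ∀   universal quantifier (for all)
  =>   ⇒   implication (implies)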

3. Conceptual and mental models

On the surface, it looked like people were only asking about the syntax, but there was something deeper. I think I started understanding it a bit better after reading The Design of Everyday Things by Don Norman. Here are just two sentences from the book that introduce conceptual models (p. 25):

A conceptual model is an explanation, usually highly simplified, of how something works. It doesn’t have to be complete or even accurate as long as it is useful.

When you buy a computer, nobody gives you a book that starts with: “Welcome to the magical world of transistors!” Or, when you buy a fridge, nobody explains electricity or the Carnot cycle to you. I am afraid we are doing something like that all the time when we try to explain TLA+ to newbies. To be fair, Coq tutorials were also like that.

What are conceptual models in the world of TLA+? The canonical conceptual models are given in the book Specifying Systems and The TLA+ Video Course by Leslie Lamport. Hillel Wayne presents another conceptual model in his Learn TLA+ – though it is more focused on PlusCal – and the exercises there lead us to another concept from the design book. When readers do the exercises, they start building their own mental models (p. 26):

Mental models, as the name implies, are the conceptual models in people’s minds that represent their understanding of how things work. Different people may hold different mental models of the same item, each dealing with a different aspect of its operation: the models can even be in conflict.

I believe that these two concepts explain a lot. They explain why different people like different aspects of TLA+: Like with a good book, we interpret the message in our own way, building mental models of our own. Moreover, as Andrew Helwer noticed in TLA+ is more than a DSL for breadth-first search, many users of TLC believe that TLA+ and TLC are exactly the same thing. The explanation is very simple (not really a quote, just using the style):

The fastest way to build a solid mental model of TLA+ is by running TLC, unless you already know math and logic very well.

This is not really surprising. We all learn new topics through multiple iterations and practice. I do not really know how people learn programming languages these days. I still prefer reading a book, but I suspect that many people learn new programming languages by interaction. For instance, I was able to start reading and writing Golang after doing A Tour of Go, though my code was probably far from perfect.

I believe that every time people complained about the syntax and the tools, they were actually complaining about the lack of a fast feedback loop that would let them keep learning. With a programming language, you can just write some code and execute it. However, “executable” is a taboo word in the TLA+ community for some reason, despite a large fragment of TLA+ over finite sets being executable, even if the complexity of this execution is not great. This is actually the reason why TLC exists at all. The closest thing to such a feedback loop is TLC. There is also the TLC REPL, but it is probably not that well-known.

This also probably explains why so few people manage to pick up symbolic tools (the stability issues aside). It is much harder to build a mental model there, and there are not so many conceptual models around. On this topic, I have a new idea on how to build a mental model of symbolic model checking for TLA+. Subscribe to the blog below :blush:

To be continued

As always, this text became too long. You will find the rest of the story in Part 2.