Small scope hypothesis revisited

Tags: specification tlaplus tlc

A couple of weeks ago, I gave a talk at the internal Nvidia FM Week 2025. Many thanks to Markus Kuppe for the organization and invitation! I am going to write a longer blog post about interactive spec conformance testing with Apalache later. Today, I want to talk a bit about the question posed by Markus (to find the question, continue reading).

Let’s talk about the small scope hypothesis. As formulated by Jackson in the technical report (1997), this hypothesis reads as follows:

"...most errors can be demonstrated by counterexamples within a small scope."

As you will see below, my example fits into this hypothesis quite well. However, having spoken to many engineers over the years, I believe that there is a mismatch between what engineers understand by “small scope” and what verification engineers understand by “small scope”.

In this blog post, I’ve decided to try a new format. Since everyone is using LLMs nowadays, I will follow the protocol. I will present the example and the problem of finding a small scope. Then, it is your turn to decide how this blog post should continue. If someone gives me an interesting example or insight in a comment, I will update this blog post accordingly.

1. Example 1: Buggy circular buffer

1.1. The specification

I started the talk with a TLA⁺ specification of a buggy circular buffer. You can find the full specification, the model checking models, and the TLC configuration files here. The specification looks as follows:

----------------------------- MODULE BuggyCircularBuffer -----------------------------
(**
 * A very simple specification of a circular buffer with a bug.
 * Generated with ChatGPT and beautified by Igor Konnov, 2025.
 * ChatGPT learned abstraction so well that it omitted the actual buffer storage!
 *)
EXTENDS Integers

CONSTANTS
    \* Size of the circular buffer.
    \* @type: Int;
    BUFFER_SIZE,
    \* The set of possible buffer elements.
    \* @type: Set(Int);
    BUFFER_ELEMS

ASSUME BUFFER_SIZE > 0

VARIABLES
    \* The integer buffer of size BUFFER_SIZE.
    \* @type: Int -> Int;
    buffer,
    \* Index of the next element to POP.
    \* @type: Int;
    head,
    \* Index of the next free slot for PUSH.
    \* @type: Int;
    tail,
    \* Number of elements currently stored.
    \* @type: Int;
    count

\* Initial state
Init ==
  /\ buffer = [ i \in 0..(BUFFER_SIZE - 1) |-> 0 ]
  /\ head = 0
  /\ tail = 0
  /\ count = 0

\* Buggy PUT: Advance tail, increment count, but no fullness check!
Put(x) ==
  Put::
  LET nextTail == (tail + 1) % BUFFER_SIZE IN
  /\ buffer' = [buffer EXCEPT ![tail] = x]
  /\ head' = head
  /\ tail' = nextTail
  /\ count' = count + 1

\* GET: Only allowed when count > 0.
Get ==
  Get::
  LET nextHead == (head + 1) % BUFFER_SIZE IN
  /\ count > 0
  /\ UNCHANGED buffer
  /\ head' = nextHead
  /\ tail' = tail
  /\ count' = count - 1

\* Either Put or Get may happen in any step.
Next ==
    \/ \E x \in BUFFER_ELEMS:
        Put(x)
    \/ Get

vars == <<buffer, head, tail, count>>

\* Complete specification
Spec == Init /\ [][Next]_vars

\* Safety property we *intend* to hold, but it is violated:
\* count must never exceed the buffer capacity.
SafeInv == count <= BUFFER_SIZE

Since I wanted to experiment with different buffer sizes and potential buffer elements, I have introduced two parameters in the specification:

BUFFER_SIZE is the size of the cyclic buffer, and
BUFFER_ELEMS is the set of possible buffer elements.

Now, my previous experience with introducing TLA⁺ to engineers suggests that there are two ways to set these parameters:

The Engineer’s way: Set the parameters to relatively small yet reasonable values. For example, BUFFER_SIZE = 10 and BUFFER_ELEMS = 0..255. These are not the minimal possible values, but they kind of make sense: The buffer should hold up to 10 bytes. Obviously, BUFFER_ELEMS are set to the minimal possible type in their programming language of choice, e.g., char in C, or u8 in Rust.
The Verification Engineer’s way: Start with the smallest possible values of the parameters, e.g., BUFFER_SIZE = 2 and BUFFER_ELEMS = {0, 1}. The idea is to check the specification in the smallest possible scope first. If there are no bugs found, increase the parameters gradually until you reach the reasonable values.

1.2. Checking the specification Engineer’s way

To check the specification the Engineer’s way, I have created the TLA⁺ model MC10u8_BuggyCircularBuffer.tla with BUFFER_SIZE = 10 and BUFFER_ELEMS = 0..255. For technical reasons, we also need the TLC config file MC.cfg. Follow the links to see the details. Further, I’ve run TLC on this model to check the invariant SafeInv:

$ java -cp tla2tools.jar "-XX:+UseParallelGC" tlc2.TLC \
  -config MC.cfg MC10u8_BuggyCircularBuffer.tla

I wanted to see how far TLC could go, so I gave it a machine with 128 GB of RAM and 32 cores. TLC has explored around 3 billion states in about 40 minutes. After consuming 400 GB of disk space, it has run out of disk space and terminated. No bug was found. Is this surprising? Not really. In this configuration, TLC has to enumerate $(2^8)^{10} * 10 * 10 * 10 \approx 2^{90}$ states. (Thanks to Thomas Pani for correcting the initially wrong estimate.)

Obviously, anyone who used TLC for some time would have asked the same question as Markus did:

What about the small scope hypothesis? Can we use smaller parameters?

The answer to this question is basically the second approach, which I called the Verification Engineer’s way.

💡 Tip:

Apalache finds an invariant violation in 3 seconds, when running bounded model checking with the command check. However, I do not want to distract us from the main point of this blog post.

1.3. Checking the specification Verification Engineer’s way

This time, we use the instance MC2u1_BuggyCircularBuffer.tla that has BUFFER_SIZE = 2 and BUFFER_ELEMS = {0, 1}. Let’s run TLC on this instance to check the invariant SafeInv:

$ java -cp tla2tools.jar "-XX:+UseParallelGC" tlc2.TLC \
  -config MC.cfg MC2u1_BuggyCircularBuffer.tla
...
Error: Invariant SafeInv is violated.
...
10 states generated, 10 distinct states found, 5 states left on queue.

Yay! Just after visiting 10 states, TLC has found a violation of the invariant!

So if we pick the right small scope, exhaustive model checking with TLC finds the bug quite fast. In this example, it is hard to find a small scope that would not reveal the bug. Of course, when we know that the bug exists, it is easy to experiment with different values of the parameters and find the bug.

1.4. Checking the specification randomly

Surprisingly, if we forget about exhaustive enumeration, TLC finds an invariant violation for BUFFER_SIZE = 10 and BUFFER_ELEMS = 0..255 in less than a second. To do this, we run TLC with the option -simulate, which simply picks successor states at random:

$ java -cp tla2tools.jar "-XX:+UseParallelGC" tlc2.TLC \
  -simulate -config MC.cfg MC10u8_BuggyCircularBuffer.tla
...
Error: Invariant SafeInv is violated.
...

This effectiveness of randomized search is actually not a one-off thing. The Quint simulator finds the bug in less than a second. Similarly, the Rust property-based testing with proptest finds the bug almost immediately.

Interestingly, we did not have to tune the scope to be as tiny as possible, as we did for exhaustive model checking. Maybe this is why some engineers want to use property-based testing for every problem?

2. Thinking about the small scope hypothesis

In Example 1, we indeed found several assignments to BUFFER_SIZE and BUFFER_ELEMS that revealed an invariant violation. Actually, this bug is so simple that almost any assignment to the parameters would reveal it. We could even set BUFFER_SIZE = 1 and BUFFER_ELEMS = {0} to find the bug! If you want to push it further, think, whether BUFFER_ELEMS = {} would allow us to find an invariant violation.

In fact, if we go back to Alloy, the way Alloy restricts the scope is quite different from what we did in Example 1. Alloy limits the number of elements of each type in the specification. For example, if we had specified the circular buffer in Alloy, we could restrict the search scope as follows:

All integers, including BUFFER_SIZE and buffer indices, have the bit width of 4.
The number of unique buffer elements is $2^8$.

As a result, Alloy would consider all possible values of BUFFER_SIZE from 0 to 15, all possible values of buffer elements from 0 to 255, as well as all possible combinations of buffers of size up to 15. This is a much more flexible way to restrict the search space. In case of TLC, we did not have this flexibility: We had to give concrete values to BUFFER_SIZE and BUFFER_ELEMS.

💡 Tip:

Apalache has data generators, which are closer to the Alloy scopes in spirit, though they work slightly different from Alloy.

Hence, we have to distinguish between small scopes and small parameter assignments in TLC. After thinking about this question a bit more, I’ve asked myself:

Are there examples of specifications that have a small scope for a specific invariant violation, but it is hard to find concrete parameter assignments within this scope?

Even though my intuition says “yes”, there must be plenty of such examples, I could not come up with with non-artificial examples immediately. On top of my head, I can think of the following directions to look for such examples:

Examples from abstract interpretation. If we have non-trivial math with overflows and underflows, it might be hard to find concrete assignments that would trigger these overflows and underflows.
Examples from graph theory. For instance, non-planar graphs must contain subgraphs that are subdivisions of $K_5$ or $K_{3,3}$ (see Kuratowski’s theorem on planar graphs). So if a bug shows up only in non-planar graphs, there must be a small scope that reveals the bug. However, our concrete graph would have to contain a subdivision of $K_5$ or $K_{3,3}$, which is far from an arbitrary graph. Unfortunately, I do not know any concurrent or distributed algorithm that would have something to do with planar or non-planar graphs.

3. Your turn

It is your turn to decide how this blog post should continue. If someone gives me an interesting example or insight in a comment, I will update this blog post accordingly.

If you want to pick up my brain for a few hours, drop me an email. After that, we can switch to a messenger of your choice (e.g., Signal, Telegram, or others) or have a call. For short consultations, I accept payments via Stripe. I also consider longer-term engagements (from weeks to months), which would require a contract. You can see my portfolio at konnov.phd. I am based in Austria (CET/CEST) and have experience working with clients from the US, UK, and EU.