Thinking about a random number generator: Part 3

This is Part 3 of a three-part blog post. If you have not read Part 1 and Part 2, please do that first.

3. Testing and Checking the Properties

Now it is time to check our properties! We will use two Quint commands:

quint run that runs the randomized simulator,
quint verify that runs the model checker Apalache.

If you wish to reproduce the experiments, use the specification prng2.qnt.

3.1. Testing the Properties with the Simulator

To test both properties, we simply use quint run:

$ quint run --invariant=safety ./specs/prng2.qnt
An example execution:

[State 0] { base: 1, bound: 3724719, input: 3724719, output: 0, pc: InLoop }
[State 1] { base: 256, bound: 3724719, input: 14549, output: 110, pc: InLoop }
[State 2] { base: 65536, bound: 3724719, input: 56, output: 28236, pc: InLoop }
[State 3] { base: 65536, bound: 3724719, input: 56, output: 618060, pc: Done }

[ok] No violation found (1248ms).

$ quint run --invariant=missingOutput ./specs/prng2.qnt
An example execution:

[State 0] { base: 1, bound: 7132656, input: 7132656, output: 0, pc: InLoop }
[State 1] { base: 256, bound: 7132656, input: 27861, output: 46, pc: InLoop }
[State 2] { base: 65536, bound: 7132656, input: 108, output: 11795, pc: InLoop }
[State 3] { base: 65536, bound: 7132656, input: 108, output: 2895379, pc: Done }

[ok] No violation found (1288ms).

In both cases, the simulator has exercised 10,000 random executions. This is the default setting that we can change by setting --max=samples e.g. to 1,000,000.

Does it mean that both properties hold true? Not exactly. Similar to property-based testing, the simulator reports a property violation, if it finds one. However, if the simulator does not find a bug, all we know is that the executions tried by the simulator, e.g., 1,000,000 executions, do not violate the invariants.

Of course, we can always increase the number of executions. We have to be aware though that the simulation times are increasing linearly with the number of executions. See the plot for the both invariants below.

How many executions shall we enumerate, to be completely sure that the invariant missingOutput holds true? Since we have fixed $bound$ to 0xff0001, we know the following:

Action init produces one state under the assumption that $bound$ equals to 0xff0001.
Action loop is executed exactly two times. Each time, this action non-deterministically chooses a value in the interval $[0, \texttt{WORD_BOUND})$.
Action postLoop is executed once. It non-deterministically chooses a value in the interval $[0, \texttt{WORD_BOUND})$.

Hence, for $bound = \texttt{0xff0001}$, we have $\texttt{WORD_BOUND}^3$ executions. Under the assumption of $\texttt{WORD_BOUND}=2^8$, we have $2^{24}$ executions! This is about 17 million executions. According to our plot, enumerating all these executions would take about 1 hour, even though we would have to distinguish visited executions from unvisited executions somehow. In case of $\texttt{WORD_BOUND}=2^{32}$, we would have $2^{96}$ executions. That’s really a lot!

3.2. Checking the Properties with Apalache

It would be great, if we could check the invariants against all executions at once. This is exactly what a model checker does¹.

Checking Property (1). Let’s run the model checker:

$ quint verify --invariant=safety specs/prng2.qnt

An example execution:

[State 0] { base: 1, bound: 16777216, input: 16777216, output: 0, pc: InLoop }
[State 1] { base: 256, bound: 16777216, input: 65536, output: 255, pc: InLoop }
[State 2]
{ base: 65536, bound: 16777216, input: 256, output: 65535, pc: InLoop }
[State 3]
{ base: 16777216, bound: 16777216, input: 1, output: 16777215, pc: InLoop }
[State 4]
{ base: 16777216, bound: 16777216, input: 1, output: 16777215, pc: Done }

[violation] Found an issue (430ms).
error: reached a deadlock

The model checker comes back almost immediately by saying that it had found a deadlock. What does it mean? The model checker has found an execution that cannot be continued. If you look at the diagram and the example execution, it actually makes sense: We have reached a state where pc == Done. By default, the model checker does not assume that it has to deal with a terminating state machine. If we were specifying a concurrent system, then the inability to continue an execution could indicate a serious design flaw.

In our case, we expect executions to terminate. Hence, we should instruct the model checker to ignore deadlocks. The easiest way to do that is by passing a configuration file to Apalache. Below, I am using no-deadlocks.json:

$ quint verify --apalache-config=specs/no-deadlocks.json --invariant=safety specs/prng2.qnt
[ok] No violation found (170ms).

This time, the model checker tells us that it could not find a violation. Is it for real? By default Apalache checks all executions of length up to 10 steps. Since the action loop is executed up to two times (remember, that we have set MAX_NWORDS to 3), all executions contain up to four actions under our current parameters. In other words, all executions belong to one of the four shapes:

init.then(postLoop)
init.then(loop).then(postLoop)
init.then(loop).then(loop).then(postLoop)

The above expressions are real Quint expressions, which define Runs. You can try them in REPL.

Knowing that our runs are composed of up to four actions and Apalache checks all executions up to 10 steps, we obtain a pretty solid proof that the invariant safety holds true . We could even use Apalache to show that all executions are indeed having at most 4 steps under MAX_NWORDS=3, but this post is already too long.

Notice that Apalache has checked the invariant in 170ms. That was blazing fast in comparison to the simulator. On top of that, we have got much better guarantees! I was running quint verify with Apalache being bootstrapped already. If you simply run quint verify, it will bootstrap Apalache, which takes 3-4 seconds.

After running Apalache, we know that my initial hypothesis about safety being broken does not hold true. Does it make the generator correct? Not yet. We still have Property (2) to check.

Checking Property (2). Recall that in Section 2.4 (Part 2) we have found out that if the invariant missingOutput holds true, then Property (2) is violated. Let’s check that with Apalache:

$ quint verify --apalache-config=specs/no-deadlocks.json \
  --invariant=missingOutput specs/prng2.qnt
[ok] No violation found (163ms).

The invariant holds true . If we look at the invariant missingOutput again, what we have proved with Apalache is that whenever $bound$ is set to 0xff0001, the variable $output$ is never assigned the value 0xff0000. This definitely goes against our expectation in Property (2) .

Again, it took Apalache 163ms to show that the invariant holds true. With the simulator, we would not be able to show that at all!

4. Root Cause Analysis

Why is it impossible to generate $output = \texttt{0xff0000}$ when $bound$ is set to 0xff0001? One way is to stare at the specification and try to find the reason. Another way is to interactively play with the specification. This is what we can do with REPL. We manually initialize the state machine with $bound$ set to 0xff0001 and replay several steps in the hope of finding the root cause:

$ quint -r specs/prng2.qnt::prng2
>>> all { bound' = 0xff0001, input' = 0xff0001, output' = 0, base' = 1, pc' = InLoop }
true
>>> step
true
>>> step
true
>>> step
true
>>> pc
Done
>>> input
255

Here we are! We cannot generate 255 (or 0xff in hex) to be the most significant word, as input == 255 and we have the following line in the specification:

  output' = (generated % input) * base + output,

If we look at the TypeScript code, this is no mistake made in translation:

  output = (squares64(state) % input) * base + output

That we take modulo input is the root cause of the problem.

We could try to quickly patch this line as follows:

  output' = (generated % (input + 1)) * base + output,

However, this would break the invariant safety, which would be easy to find with the model checker. This code needs a proper fix, which I will probably discuss in another blog post .

5. Impact Analysis

Now that we know that Property (1) is satisfied, whereas Property (2) does not hold true, we have to understand the impact of Property (2) being violated. How serious is it? We have one data point, for which the PRNG never produces one output value below the bound. Once we have found the root cause of the problem, we could find more data points that confirm it.

Since the PRNG is used by the simulator, it means that the simulator never explores certain states of a state machine. As a result, for certain specifications it would never be able to reach certain states that violate a state invariant. In this context, “never” means “never”. There are no probabilities involved. This is usually called incompleteness.

As we have discussed, in contrast to the model checker, the simulator would not be able to explore all executions in practical cases. Hence, it is not a serious failure in the logic of this feature.

6. Conclusions

My first blog post happened to be so long that I broke it down into three parts. In the future, I will write shorter ones. My goal was to demonstrate you how to write specifications and use tools to quickly reason about them without going back to pen and paper, or a whiteboard. In practical terms, our specification is quite small. However, it exhibits my most favorite feature of $\tla{}$ and Quint, namely, non-determinism.

I hope you found this text entertaining. If you do, please upvote the Github discussion, leave a comment there, star the repo, etc. It is important to me to see, whether anyone is curious about this topic and I should keep writing about it. Want to chat? You can find my contact details on My webpage.

I have mentioned $\tla{}$ in this post multiple times. Showing the $\tla{}$ specification of the PRNG definitely deserves another blog post .

Do you want to receive notifications when I write something new? Subscribe to the newsletter. New blog posts are going to be announced once per week (maybe twice, if I really have something!).

Footnotes

This statement comes with a caveat: By default Apalache checks all executions of length up to 10 steps. ↩