Performance hit when using `-lncheck final` #869

jonesmartins · 2024-02-02T16:36:19Z

Description

Given two models M1 and M2 of some specification. M1 and M2 are identical except:

M1 only contains invariants to check.
M2 contains the same invariants and liveness properties to check.

I checked M1 and M2 permuting the following configurations, making it 8 different tests:

setting BAQueue=true (or default);
using OffHeapDiskFPSet (or default);
-lncheck final (or default);

Expected Behavior

When -lncheck final is set and there are liveness properties to check, I'd expect safety checking throughput to be around the same for M1 and M2.

Actual Behavior

The throughput of safety checking of model M1 is about 15 million states/min.
The throughput of safety checking with -lncheck final of model M2 is about 5 million states/min.

Model	`BAQueue`	`OffHeapDiskFPSet`	`-lncheck final`	Total time	Avg. Throughput (st/min)
M1	❌	❌	❌	05 min 11 s	23,041,404.5
M1	❌	❌	✔️	05 min 09 s	23,173,030.6
M1	❌	✔️	❌	06 min 36 s	18,904,482.6
M1	❌	✔️	✔️	06 min 45 s	18,593,484.9
M1	✔️	❌	❌	05 min 11 s	23,014,255.5
M1	✔️	❌	✔️	05 min 08 s	23,432,080.4
M1	✔️	✔️	❌	08 min 58 s	14,509,361.8
M1	✔️	✔️	✔️	09 min 15 s	13,821,033.3

Model	`BAQueue`	`OffHeapDiskFPSet`	`-lncheck final`	Total liveness check time	Total time	Total - Liveness	Avg. Throughput (st/min)*
M2	❌	❌	❌	34 min 15 s	59 min 58 s	25 min 48 s	05,491,618.0
M3	❌	❌	✔️	19 min 48 s	45 min 13 s	25 min 15 s	05,549,502.4
M2	❌	✔️	❌	34 min 09 s	60 min 13 s	26 min 04 s	05,194,411.1
M2	❌	✔️	✔️	20 min 04 s	45 min 50 s	25 min 46 s	05,267,598.0
M2	✔️	❌	❌	36 min 33 s	58 min 24 s	21 min 51 s	05,472,436.0
M2	✔️	❌	✔️	20 min 10 s	44 min 44 s	24 min 34 s	05,512,822.1
M2	✔️	✔️	❌	33 min 19 s	59 min 18 s	25 min 59 s	05,238,499.1
M2	✔️	✔️	✔️	19 min 33 s	45 min 55 s	26 min 22 s	05,147,408.5

Total liveness check time is the sum of "Finished checking temporal properties in ..." for a given model check;
Total time is the time displayed by TLC in minutes and seconds: "Finished in ... at (...)";
Throughput refers to generated states (not distinct states) displayed every minute by TLC during state space exploration: "... states generated (... s/min)". This is their average;

Steps to Reproduce

Take some specification and create two models: M1 and M2 where M1 contains safety invariants and M2 contains the same safety invariants + liveness properties;
Make sure both models are large enough so model checking takes a few minutes to ensure enough throughput samples;
Confirm or deny that the throughput difference is significant;

Environment

TLC version: rev 3ea3222
java 21.0.2 2024-01-16 LTS
CPU: Intel Core i7-7700 @ 4.2 GHz x 4
RAM: 62.7 GiB
Disk: 200 GiB of free SSD space
Operating System: Linux Mint 22.1 Cinnamon

The text was updated successfully, but these errors were encountered:

lemmy · 2024-02-02T17:56:23Z

The -lncheck final option postpones the search of Strongly Connected Components in the behavior graph until after the safety check has been completed. Nonetheless, the construction of the behavior graph occurs during the safety check, which causes the observed slow down.

jonesmartins · 2024-02-02T18:18:12Z

(update: I added benchmarks to my post.)

Ah, I see. Interesting implementation detail. What's the difference between the behavior graph and state exploration graph?

Also, it seems that when BAQueue and OffHeapDiskFPSet are used simultaneously (without liveness checking), there's another performance hit?

lemmy · 2024-02-02T18:26:02Z

The behavior graph is the cross product of the state graph and the tableau that represents the liveness property.

Can you elaborate on the offheap and banquet finding?

jonesmartins · 2024-02-02T18:34:40Z

It says here that two Java system properties might improve many-core performance, so I added them to my benchmarks when I was trying to figure out what was slowing model checking down.

lemmy · 2024-02-02T18:44:24Z

That's interesting. Can you keep digging? By the way, we unfortunately have lost our performance test infra (#864).

jonesmartins · 2024-02-02T20:08:12Z

I added another update to the benchmark table with "Total liveness check time" and "Total time - Total liveness check time". It seems there's no significant difference in Total Liveness Check Time between any of the configurations.

As to why this is the case, I think we'd need to benchmark other models, no? I could certainly rerun these again just to make sure.

#864 is very unfortunate. :-(
Couldn't the performance infra be moved to AWS, Microsoft, or Oracle, since they're sponsoring the TLA+ Foundation?

lemmy · 2024-02-02T20:58:31Z

As to why this is the case, I think we'd need to benchmark other models, no? I could certainly rerun these again just to make sure.

Did you see the models in general/performance?

#864 is very unfortunate. :-( Couldn't the performance infra be moved to AWS, Microsoft, or Oracle, since they're sponsoring the TLA+ Foundation?

Probably, are you volunteering. :-)

jonesmartins · 2024-02-03T20:54:29Z

Did you see the models in general/performance?

Not in depth. Is the benchmark CSV stored somewhere?

lemmy · 2024-02-03T21:35:22Z

It's the build artifact of https://github.com/tlaplus/tlaplus/actions/runs/7565935840

snapshot: perf-results.zip

jonesmartins · 2024-02-03T22:22:40Z

Thanks!

Correct me if I'm wrong, but it seems all tests in general/performance/measure.sh use OffHeapDiskFPSet and BAQueue simultaneously, while all tests in general/performance/measureFPSet.sh compare OffHeapDiskFPSet and MSBDiskFPSet without BAQueue.

Since my benchmarks suggest there's a slowdown when OffHeadDiskFPSet and BAQueue are used simultaneously, and these cases weren't measured by our current benchmarks, we can't be sure for now...?

lemmy · 2024-02-04T02:31:48Z

You want measure.tla :-)

lemmy added the Tools The command line tools - TLC, SANY, ... label Feb 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance hit when using `-lncheck final` #869

Performance hit when using `-lncheck final` #869

jonesmartins commented Feb 2, 2024 •

edited

lemmy commented Feb 2, 2024

jonesmartins commented Feb 2, 2024

lemmy commented Feb 2, 2024

jonesmartins commented Feb 2, 2024 •

edited

lemmy commented Feb 2, 2024

jonesmartins commented Feb 2, 2024

lemmy commented Feb 2, 2024

jonesmartins commented Feb 3, 2024

lemmy commented Feb 3, 2024 •

edited

jonesmartins commented Feb 3, 2024

lemmy commented Feb 4, 2024

Performance hit when using -lncheck final #869

Performance hit when using -lncheck final #869

Comments

jonesmartins commented Feb 2, 2024 • edited

Description

Expected Behavior

Actual Behavior

Steps to Reproduce

Environment

lemmy commented Feb 2, 2024

jonesmartins commented Feb 2, 2024

lemmy commented Feb 2, 2024

jonesmartins commented Feb 2, 2024 • edited

lemmy commented Feb 2, 2024

jonesmartins commented Feb 2, 2024

lemmy commented Feb 2, 2024

jonesmartins commented Feb 3, 2024

lemmy commented Feb 3, 2024 • edited

jonesmartins commented Feb 3, 2024

lemmy commented Feb 4, 2024

Performance hit when using `-lncheck final` #869

Performance hit when using `-lncheck final` #869

jonesmartins commented Feb 2, 2024 •

edited

jonesmartins commented Feb 2, 2024 •

edited

lemmy commented Feb 3, 2024 •

edited