Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance hit when using -lncheck final #869

Open
jonesmartins opened this issue Feb 2, 2024 · 11 comments
Open

Performance hit when using -lncheck final #869

jonesmartins opened this issue Feb 2, 2024 · 11 comments
Labels
Tools The command line tools - TLC, SANY, ...

Comments

@jonesmartins
Copy link
Contributor

jonesmartins commented Feb 2, 2024

Description

Given two models M1 and M2 of some specification. M1 and M2 are identical except:

  • M1 only contains invariants to check.
  • M2 contains the same invariants and liveness properties to check.

I checked M1 and M2 permuting the following configurations, making it 8 different tests:

  • setting BAQueue=true (or default);
  • using OffHeapDiskFPSet (or default);
  • -lncheck final (or default);

Expected Behavior

When -lncheck final is set and there are liveness properties to check, I'd expect safety checking throughput to be around the same for M1 and M2.

Actual Behavior

The throughput of safety checking of model M1 is about 15 million states/min.
The throughput of safety checking with -lncheck final of model M2 is about 5 million states/min.

Model BAQueue OffHeapDiskFPSet -lncheck final Total time Avg. Throughput (st/min)
M1 05 min 11 s 23,041,404.5
M1 ✔️ 05 min 09 s 23,173,030.6
M1 ✔️ 06 min 36 s 18,904,482.6
M1 ✔️ ✔️ 06 min 45 s 18,593,484.9
M1 ✔️ 05 min 11 s 23,014,255.5
M1 ✔️ ✔️ 05 min 08 s 23,432,080.4
M1 ✔️ ✔️ 08 min 58 s 14,509,361.8
M1 ✔️ ✔️ ✔️ 09 min 15 s 13,821,033.3
Model BAQueue OffHeapDiskFPSet -lncheck final Total liveness check time Total time Total - Liveness Avg. Throughput (st/min)*
M2 34 min 15 s 59 min 58 s 25 min 48 s 05,491,618.0
M3 ✔️ 19 min 48 s 45 min 13 s 25 min 15 s 05,549,502.4
M2 ✔️ 34 min 09 s 60 min 13 s 26 min 04 s 05,194,411.1
M2 ✔️ ✔️ 20 min 04 s 45 min 50 s 25 min 46 s 05,267,598.0
M2 ✔️ 36 min 33 s 58 min 24 s 21 min 51 s 05,472,436.0
M2 ✔️ ✔️ 20 min 10 s 44 min 44 s 24 min 34 s 05,512,822.1
M2 ✔️ ✔️ 33 min 19 s 59 min 18 s 25 min 59 s 05,238,499.1
M2 ✔️ ✔️ ✔️ 19 min 33 s 45 min 55 s 26 min 22 s 05,147,408.5

Total liveness check time is the sum of "Finished checking temporal properties in ..." for a given model check;
Total time is the time displayed by TLC in minutes and seconds: "Finished in ... at (...)";
Throughput refers to generated states (not distinct states) displayed every minute by TLC during state space exploration: "... states generated (... s/min)". This is their average;

Steps to Reproduce

  1. Take some specification and create two models: M1 and M2 where M1 contains safety invariants and M2 contains the same safety invariants + liveness properties;
  2. Make sure both models are large enough so model checking takes a few minutes to ensure enough throughput samples;
  3. Confirm or deny that the throughput difference is significant;

Environment

  • TLC version: rev 3ea3222
  • java 21.0.2 2024-01-16 LTS
  • CPU: Intel Core i7-7700 @ 4.2 GHz x 4
  • RAM: 62.7 GiB
  • Disk: 200 GiB of free SSD space
  • Operating System: Linux Mint 22.1 Cinnamon
@lemmy
Copy link
Member

lemmy commented Feb 2, 2024

The -lncheck final option postpones the search of Strongly Connected Components in the behavior graph until after the safety check has been completed. Nonetheless, the construction of the behavior graph occurs during the safety check, which causes the observed slow down.

@lemmy lemmy added the Tools The command line tools - TLC, SANY, ... label Feb 2, 2024
@jonesmartins
Copy link
Contributor Author

(update: I added benchmarks to my post.)

Ah, I see. Interesting implementation detail. What's the difference between the behavior graph and state exploration graph?

Also, it seems that when BAQueue and OffHeapDiskFPSet are used simultaneously (without liveness checking), there's another performance hit?

@lemmy
Copy link
Member

lemmy commented Feb 2, 2024

The behavior graph is the cross product of the state graph and the tableau that represents the liveness property.

Can you elaborate on the offheap and banquet finding?

@jonesmartins
Copy link
Contributor Author

jonesmartins commented Feb 2, 2024

It says here that two Java system properties might improve many-core performance, so I added them to my benchmarks when I was trying to figure out what was slowing model checking down.

@lemmy
Copy link
Member

lemmy commented Feb 2, 2024

That's interesting. Can you keep digging? By the way, we unfortunately have lost our performance test infra (#864).

@jonesmartins
Copy link
Contributor Author

I added another update to the benchmark table with "Total liveness check time" and "Total time - Total liveness check time". It seems there's no significant difference in Total Liveness Check Time between any of the configurations.

As to why this is the case, I think we'd need to benchmark other models, no? I could certainly rerun these again just to make sure.

#864 is very unfortunate. :-(
Couldn't the performance infra be moved to AWS, Microsoft, or Oracle, since they're sponsoring the TLA+ Foundation?

@lemmy
Copy link
Member

lemmy commented Feb 2, 2024

As to why this is the case, I think we'd need to benchmark other models, no? I could certainly rerun these again just to make sure.

Did you see the models in general/performance?

#864 is very unfortunate. :-( Couldn't the performance infra be moved to AWS, Microsoft, or Oracle, since they're sponsoring the TLA+ Foundation?

Probably, are you volunteering. :-)

@jonesmartins
Copy link
Contributor Author

Did you see the models in general/performance?

Not in depth. Is the benchmark CSV stored somewhere?

@lemmy
Copy link
Member

lemmy commented Feb 3, 2024

It's the build artifact of https://github.com/tlaplus/tlaplus/actions/runs/7565935840

snapshot: perf-results.zip

@jonesmartins
Copy link
Contributor Author

Thanks!

Correct me if I'm wrong, but it seems all tests in general/performance/measure.sh use OffHeapDiskFPSet and BAQueue simultaneously, while all tests in general/performance/measureFPSet.sh compare OffHeapDiskFPSet and MSBDiskFPSet without BAQueue.

Since my benchmarks suggest there's a slowdown when OffHeadDiskFPSet and BAQueue are used simultaneously, and these cases weren't measured by our current benchmarks, we can't be sure for now...?

@lemmy
Copy link
Member

lemmy commented Feb 4, 2024

You want measure.tla :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tools The command line tools - TLC, SANY, ...
Development

No branches or pull requests

2 participants