Kernel: Wake cores from idle directly rather than through a host thread #6837
Conversation
Right now when a core enters an idle state, leaving that idle state requires us to first signal the core's idle thread, which then signals the correct thread that we want to run on the core. This means that in a lot of cases, we're paying double for a thread to be woken from an idle state.

This PR moves this process to happen on the thread that is waking others out of idle, instead of an idle thread that needs to be woken first.

For compatibility the process has been kept as similar as possible - the logic from IdleThreadLoop has been migrated to TryLeaveIdle, and is gated by a condition variable that lets it run only once at a time for each core. A core is only considered for wake from idle if idle is both active and has been signalled - the signal is consumed and the active state is cleared when the core leaves idle. Maybe we could go further with this to avoid waiting on other thread signals to complete, but a port of the current behaviour is the safest improvement for now. A sketch of what this wake path looks like follows below.

Dummy threads (just the idle thread at the moment) have been changed to have no host thread, as the work is now done by threads entering idle and signalling out of it. The idle thread has been removed entirely, and idle core state now lives directly on the scheduler. This could put a bit of extra work on threads that would have triggered `_idleInterruptEvent` before, but I'd expect less time wasted than signalling all those reset events and paying the OS overhead that follows. Worst case, other threads performing these signals at the same time will have to wait for each other, but it's still going to be a very short amount of time.

Improvements are very slight, but are best seen in games with heavy (or very misguided) multithreading, such as Pokemon: Legends Arceus. Improvements are expected in Scarlet/Violet and TOTK, but are harder to measure due to GPU trouble.

Testing on Linux/macOS is still to be done. We definitely need to test more games, as this change affects all of them (obviously) and any issues might be rare to encounter.
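To make the gating concrete, here's a minimal C# sketch of what the new wake path can look like. Only `TryLeaveIdle`, `IdleThreadLoop` and `_idleInterruptEvent` are names taken from the PR; the types, fields and the `PickThreadToRun` helper are illustrative assumptions, not the actual Ryujinx code.

```csharp
using System;

// Stand-in for the kernel thread type; illustrative only.
class KThread
{
    public string Name;
    public KThread(string name) => Name = name;
}

class SchedulerSketch
{
    // Per-core idle state now lives on the scheduler itself, since the
    // idle thread no longer has a host thread of its own.
    private sealed class CoreIdleState
    {
        public readonly object Gate = new object(); // per-core wake gate
        public bool IdleActive;    // core is currently idle
        public bool IdleSignalled; // a wake has been requested
    }

    private readonly CoreIdleState[] _cores;

    public SchedulerSketch(int coreCount)
    {
        _cores = new CoreIdleState[coreCount];
        for (int i = 0; i < coreCount; i++) _cores[i] = new CoreIdleState();
    }

    // Called when a core runs out of work to do.
    public void EnterIdle(int core)
    {
        lock (_cores[core].Gate) { _cores[core].IdleActive = true; }
    }

    // Called by a thread that wants the idle core to pick up new work.
    public void SignalIdle(int core)
    {
        lock (_cores[core].Gate) { _cores[core].IdleSignalled = true; }
        TryLeaveIdle(core);
    }

    // Runs on the *waking* thread; this is the logic that used to live
    // in IdleThreadLoop behind _idleInterruptEvent.
    public void TryLeaveIdle(int core)
    {
        CoreIdleState state = _cores[core];
        lock (state.Gate) // only one waker runs this per core at a time
        {
            // A core is only considered for wake from idle if idle is
            // both active and has been signalled.
            if (!state.IdleActive || !state.IdleSignalled)
            {
                return;
            }

            // Consume the signal and clear the active state on leave.
            state.IdleSignalled = false;
            state.IdleActive = false;

            KThread next = PickThreadToRun(core);
            Console.WriteLine($"core {core} leaves idle to run {next.Name}");
        }
    }

    // Hypothetical stand-in for the real thread selection step.
    private KThread PickThreadToRun(int core) =>
        new KThread($"guest-thread-for-core-{core}");
}
```

A plain `lock`/`Monitor` gate is enough to play the role of the per-core condition variable described above: whichever waking thread takes the gate first consumes the signal, and any thread arriving while a wake is in progress briefly waits and then backs out.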
Legends: Arceus provides the best view of what the difference means for core scheduling. There's a part of the core game loop where it wastefully swaps between three threads running what is basically a sequential workload for a few milliseconds. If we zoom in here with a profiler, we can see the behaviour before and after (a simplified sketch of the old path follows below).

**Before** *(profiler screenshot)*

You can see that guest threads 50, 54 and 53 are constantly blocking each other in a clear pattern. However, when each thread suspends, it also signals the idle threads for each core (OS threads 0, 1, 2, 3). These threads then wake the next guest thread, so two OS context switches (shown by the arrows) need to be performed for the game to switch to the next thread.

**After** *(profiler screenshot)*

The threads are in a similar pattern where they signal each other sequentially, but they are waking each other directly rather than waking idle threads first. You can see this via the arrows, where it's clearer which threads are unblocking each other. This won't be perfect - an unrelated thread could still wake a thread that was unblocked by some other thread that hasn't gotten to the idle awakening step - but it's nicer for debugging and saves one OS context switch per idle wake.

It's worth noting that the profiler I'm using exaggerates the runtime of threads: it captures all context switches, but its time precision is a lot lower and it seems to round start times down and end times up. It also slows down context switches a lot more, so with the new approach the game runs notably faster under a profiler.

On my Windows desktop with a Ryzen 3900X, there is a small boost to performance (peak performance shown; the average performance difference is about the same, measured at the same location).

**Before** *(screenshot)* / **After** *(screenshot)*

I still need to see if overall CPU usage drops, and how this might impact systems with fewer cores or power saving.
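For contrast, the pre-PR shape of the idle path can be sketched as a dedicated host thread parked on an event, which is what made every wake cost two context switches. Only `IdleThreadLoop` and `_idleInterruptEvent` are names from the PR; the `AutoResetEvent` choice and the other members are assumptions for illustration.

```csharp
using System.Threading;

// Simplified sketch of the old per-core idle host thread.
class IdleThreadSketch
{
    private readonly AutoResetEvent _idleInterruptEvent = new AutoResetEvent(false);
    private volatile bool _exit;

    public Thread Start()
    {
        var thread = new Thread(IdleThreadLoop) { IsBackground = true };
        thread.Start();
        return thread;
    }

    // A suspending guest thread calls this, paying one OS wake just to
    // get the idle host thread running.
    public void Interrupt() => _idleInterruptEvent.Set();

    public void Exit()
    {
        _exit = true;
        _idleInterruptEvent.Set();
    }

    private void IdleThreadLoop()
    {
        while (!_exit)
        {
            // Context switch #1: the OS wakes this idle host thread.
            _idleInterruptEvent.WaitOne();

            // Context switch #2: the idle thread then wakes the guest
            // thread that should actually run on this core.
            WakeSelectedGuestThread();
        }
    }

    // Stub for the selection/wake logic the PR moves into TryLeaveIdle.
    private void WakeSelectedGuestThread() { /* illustrative */ }
}
```

Removing this intermediate hop is what the arrows in the "After" capture show: the suspending guest thread now performs the wake itself.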
I wonder how hard it would be to remove the idle threads entirely.
Did some testing on Steam Deck, and its performance appears to be affected a lot more. The system has 4 cores instead of the 12 on my desktop, runs Linux instead of Windows, and has aggressive power saving measures. All tests were run on battery, and screenshots were taken a few minutes after each test began so the power usage numbers could settle.

**Uncapped framerate**

Average performance greatly improves: fluctuations go from 35-36 FPS to 42-43 FPS (around 4-5 ms saved per frame). Overall power usage seems similar, but more of it goes into the GPU to reach the new higher framerate (not shown on the screenshot, but the general pattern is there when watching it). Frame times are a lot more stable.

**Before** *(screenshot)* / **After** *(screenshot)*

**Capped framerate**

When the framerate is capped, power usage greatly decreases. Focus on the wattage numbers next to "battery" and "cpu", and the clock speeds it settled on. Frametime is a lot more consistent. Fan speed and temperatures are much lower; the fan quickly becomes inaudible when the cap is turned on.

**Before** *(screenshot)* / **After** *(screenshot)*

I've always wondered why this game was underperforming on Deck. I guess now we have the answer.
lgtm, thanks. I tested a few games here on Windows and macOS and had no issues. I didn't play for long though, so it might be worth getting more extended testing from someone else. Very nice to see the idle threads gone; it should make debugging a bit simpler. I didn't know it could have such a significant impact on the Steam Deck too, so that was a nice surprise.
I tested Smash Ultimate for an extended amount of time and didn't find anything out of the ordinary. I also briefly tried a few others and got the same result.
Works great! Can't really comment on the code changes tbh. They make sense to me and I don't see any issues, but I'm very inexperienced in that area, so that's not really worth much.