-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WaveShaperNode seems to cause memory corruption while upsampling and subsequently SEGFAULTs #202
Comments
Hi, this is very helpful. I haven't seen this under MSVC, but maybe I can repro by running a bunch of times. I think I can see where the problem lies from your traces, so thanks for that legwork. I'm traveling for a week, so can't dig in right now, but will take a look when I am back. |
The problem isn't always observable under MSVC, but it's still there. I found that crashes become almost guaranteed when adding some debug output to WaveShaperNode::processCurve, for example, but that's only because the buffers have already been corrupted at that point. On Linux, this is where I don't have a proper Visual Studio setup for debugging MSVC, but if you share your suspicions I could test on Linux/macOS. |
As a first pass, could you apply this patch? This provides better guards on accessing the array's data member, and also makes the WaveShaperNode temporary buffers member objects instead of unique_ptr objects. I haven't been able to detect a deletion of the float array from process(). Does this, as you mentioned, move the problem elsewhere?
|
And here's a further patch to mutex setting the curve.
|
... I've pushed the patches to |
Tested with 9975447:
Based on the above, I assume the corruption is still happening. Would need more debugging and testing on other machines to see how the problem manifests there. I also had audio drop immediately after
Maybe that was poorly explained on my part; inside More WSL logs below. Unfortunately, they may be red herrings if the corruption results from a threading-related issue:
|
FWIW, running the example with I don't have time to look into these (or do testing on other machines) right now, but it might be worth trying tsan/usan/asan. |
Thanks, much appreciated. |
I notice that UpSampler has this code
and the destination buffers are allocated only to that maybe needs to be |
Back to testing on native Linux now. The corruption is definitely still there. Presumably, it's the same problem. Symptoms:
These are separate runs. Crashes are still sporadic, and audio remains garbled. I'll try running the sanitizers (with GCC) next... |
Is the crash at the end of the demo? I just got repro due to deleting the ShaperNode while it was still processing in the audio thread. Disconnecting the node before deleting it should prevent that:
If this works for you, I would not consider it a fix, but a workaround. Deleting running nodes should Just Work. |
When you say "audio is garbled" do you mean, all the time for all demos? Or sporadically, with regards to the WaveShaperNode demo? |
Crashes seem to appear at "random" times, so not necessarily (or at least not exclusively) at the end of the demo.
Making this change might at best reduce the frequency of SEGFAULTs, but unfortunately it doesn't eliminate them. Afterwards, I still got a crash originating from what looks to be a synchronization issue. TSAN snapshot (last 2 warnings): Running LabSoundExample with ThreadSanitizer enabled (excerpt)
This does look like the
It's always garbled (on Linux only. Windows and macOS don't have this problem). Tested with I haven't looked into this more since it may resolve itself after the crashes are gone, but there were |
I pushed a fix for m_channelCount. The full TSAN report you posted shows the creation of two pulse audio threads, and that they are frequently racing. RtAudio creates two pulse Audio threads in the case that the input device and the output device have different id's. I see this at the beginning of the log
The differing number of channels suggests to me that we are getting two threads due to this detail that two devices are involved. To test a theory that there is a problem when launching to pulse audio threads, could you modify this line in ExamplesMain, which will prevent the second thread from being launched:
|
We are using RtAudio 5.1.0. I notice the following in the 5.2.0 release notes. Perhaps we are bumping against the issues they have fixed in 5.2.0, as my current theory is that we are getting two devices when we actually only want one. I don't notice any related changes in the 6.0 release. Release 5.2.0 |
Pushed an update to 5.2.0, but not 6, as 6 has many API changes. |
I'm now able to reproduce the data race locally! This should help things along :) |
The good news is that the RtAudio upgrade seems to have resolved the "garbled audio" (Linux-only) issue, as far as I can tell. Still on Linux, there are a few remaining problems that may be due to the corruption/data race:
Finally, I noticed that the debug output seems to suggest that the same thread is playing back the (I added Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
B i -2 curve[0] -1.000000 curve[44099] 0.999955 oversample none
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
Thread ID: 139853194852096 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
B i -1 curve[0] -1.000000 curve[44099] 0.999973 oversample none
B i 0 curve[0] -1.000000 curve[44099] 0.999985 oversample none
B i 1 curve[0] -1.000000 curve[44099] 0.999994 oversample none
... This strikes me as rather peculiar to say the least. It's also visible in the other logs. Might be intentional, though? |
i don't hear the buzzing. The -1 I introduced in the "odd samples" write could be the cause... I'm working on the race first, the new info is also interesting, thanks |
It's a relief and reassuring that it was possible to deduce a possible RtAudio initialization issue, and upgrading an RtAudio that included Pulse Audio initialization fixes resolved the garbled audio. |
There is an unfortunate race in the RtAudio code, even the new one. I've corrected it using a std::atomic flag so that the RtAudio callback doesn't attempt to do processing before the RtAudio set up is complete. The upshot of this is that the temporary RtAudio context created to enumerate devices can conflict with the RtAudio device created to play back audio. Although I corrected the race, I also changed the RtAudio object itself to be a global singleton, since clearly RtAudio was not written with the idea that there might be more than one RtAudio object instantiated. I corrected all the runtime races I encountered in device set up, and in the WaveShaperNode. There is still a race when the WaveShaperNode is deleted, my workaround doesn't work, so I will continue looking at that. This much, I hope, will fix all the non-shutdown related tsan issues you are encountering, and also I hope, it will resolve the "playing twice" issue you noticed. Playing twice would definitely cause garbling or crackling in many nodes that expect that they will not be invoked twice per render quantum. Fixes to this point (not including a Shaper node deletion fix) are now pushed. |
If the fix to RtAudio improves things for you, I will send a pull request upstream to the RtAudio project (or rather, I will check if it has been fixed in RtAudio 6, and take it from there). |
pushed fix for race on wave shaper node deletion. down to one race when the destination audio node is deleted on demo shutdown, and the rtaudio callback is still running. |
pushed fix to RtAudio shutdown races, and all the ones in LabSound I could find. Next step is to investigate if RtAudio 6 has corrected the races. |
RtAudio 6 has introduced mutexes on mac which is more heavy-handed than my fixes. RtAudio 6 hasn't got all the races fixed. Not going to upgrade to 6 unless someone points to an urgent need, since I'd rather not spend time fixing 6... |
Thanks for your effort! It seems there are significantly fewer data races, though some must still exist. Remaining issues:
Here's some more logs that might help (all while on 565cfde): Firstly, this is one run where the audio dropped for the entire testNew Thread 0x7ffff31d3700 (LWP 57295)]
[New Thread 0x7ffff29d2700 (LWP 57296)]
[New Thread 0x7ffff21d1700 (LWP 57297)]
08:39:10 TRACE LabSound/src/core/AudioContext.cpp:534: Begin UpdateGraphThread
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
B i -2 curve[0] -1.000000 curve[44099] 0.999955 oversample none
08:39:10 TRACE LabSound/src/core/AudioContext.cpp:583: End UpdateGraphThread
// Short "bleeping" sound here, then audio is dead (and remains dead for the entirety of the run)
[Thread 0x7ffff21d1700 (LWP 57297) exited]
B i -1 curve[0] -1.000000 curve[44099] 0.999973 oversample none
B i 0 curve[0] -1.000000 curve[44099] 0.999985 oversample none
B i 1 curve[0] -1.000000 curve[44099] 0.999994 oversample none
B i 2 curve[0] -1.000000 curve[44099] 1.000000 oversample none For comparison, this is a "normal" run where audio playback works as expectedNew Thread 0x7ffff31d3700 (LWP 57555)]
[New Thread 0x7ffff29d2700 (LWP 57556)]
[New Thread 0x7ffff21d1700 (LWP 57557)]
11:44:39 TRACE LabSound/src/core/AudioContext.cpp:534: Begin UpdateGraphThread
B i -2 curve[0] -1.000000 curve[44099] 0.999955 oversample none
11:44:39 TRACE LabSound/src/core/AudioContext.cpp:583: End UpdateGraphThread
[Thread 0x7ffff21d1700 (LWP 57557) exited]
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:160
Scheduler: fade in
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
Thread ID: 140737263773440 - File: LabSound/src/core/AudioNode.cpp:183
Scheduler: playing
B i -1 curve[0] -1.000000 curve[44099] 0.999973 oversample none
B i 0 curve[0] -1.000000 curve[44099] 0.999985 oversample none
B i 1 curve[0] -1.000000 curve[44099] 0.999994 oversample none Lastly, here are the full TSAN logs for dropout/no dropout/the virtual method crash: f34e1fede723dc9565cf382783544775
It's a bit difficult to determine whether things improved as the crashes have been sporadic:
I'd say it definitely feels like there's a lot less actual crashing now. However, clearly there are still some concurrency issues left. |
Just observed another crash, this time with MSVC: A i 0 curve[0] -0.342995 curve[44099] 0.342989 oversample none
A i 1 curve[0] -0.344421 curve[44099] 0.344417 oversample none
A i 2 curve[0] -0.345305 curve[44099] 0.345301 oversample none
15:33:06 TRACE LabSound__LabSound\src\core\AudioContext.cpp:170: Begin AudioContext::~AudioContext()
15:33:06 TRACE LabSound__LabSound\src\core\AudioContext.cpp:262: AudioContext::uninitialize()
15:33:06 INFO LabSound__LabSound\src\core\AudioContext.cpp:195: Finish AudioContext::~AudioContext()
D:\a\_work\1\s\src\vctools\crt\github\stl\src\mutex.cpp(61): mutex destroyed while busy This happened after the example completed, with the same duplicate playback/buzzing problem as on Linux. |
Thanks for the update. Since you mention having an issue with an MSVC build, that gives me an avenue to see if I can get the buzzing repro'd. I have a bunch of windows machines here.
How are you doing this? If you have a command line or script to share, I can start doing the same thing here. |
Just running the program in a loop. The current script is here - but beware, the windows versions are completely untested. The last few SEGFAULTs appeared after 52 and 66 iterations, so it can take quite a while. Still seems to be the same problem: 23:16:11 INFO LabSound/src/backends/RtAudio/AudioDevice_RtAudio.cpp:59: using rtaudio api linux_pulse
23:16:11 INFO LabSound/include/LabSound/core/AudioDevice.h:104: AudioHardwareDeviceNode()
* Sample Rate: 48000.000000
* Input Channels: 0
* Output Channels: 2
23:16:11 INFO LabSound/src/backends/RtAudio/AudioDevice_RtAudio.cpp:128: using output device idx: 0
23:16:11 INFO LabSound/src/backends/RtAudio/AudioDevice_RtAudio.cpp:130: using output device name: Family 17h (Models 10h-1fh) HD Audio Controller Analog Stereo
23:16:11 INFO LabSound/src/backends/RtAudio/AudioDevice_RtAudio.cpp:135: using input device idx: -1
23:16:11 TRACE LabSound/src/core/AudioContext.cpp:534: Begin UpdateGraphThread
23:16:11 TRACE LabSound/src/core/AudioContext.cpp:583: End UpdateGraphThread
23:16:27 TRACE LabSound/src/core/AudioContext.cpp:170: corrupted double-linked list
Begin AudioContext::~AudioContext()
// Crashed here, although there may be buffered output that wasn't flushed |
Cool. ExamplesMain also has an iteration count for this purpose ( LabSound/examples/src/ExamplesMain.cpp Line 65 in 0bd8dfb
|
Huh... I actually didn't see that, but it wouldn't catch crashes in the setup/teardown part of the example program. Speaking of which, the remaining SEGFAULTs might be due to the "deleted node is still playing" problem you mentioned earlier. TSAN log: Use-after-free when deleting the WaveShaperNode while it's still playing
As far as I can tell, this would explain the crashes at the end (but not the dropped audio, which warrants further investigation).
What should a proper fix look like? I mean one that works for every case, not the "hack". Can they just be stopped/disconnected? |
Stopped/disconnected should fix it yes. Looking at it more deeply: Every AudioNode has the structural pattern:
The public interfaces have to live as long as the C++ object the developer is holding onto in their code Trying to synchronize those lifespans is how I've been trying to solve concurrency. The fundamental problem is that the public interfaces are conflated with the data for processing. In other words, one can set a value, but processing has to reach into the Param or Setting to get the value. I think what has to happen, structurally, across the board, is that the public interfaces need to queue operations to the processor, and the data for processing should be separably held, presumably in an object held by a shared pointer, so that the data itself is deleted when processing is over, and there is not synchronization required with the holding Node object. |
That sounds like a major rework, so unfortunately a bit above what I would be able to contribute with my limited knowledge. If you want to sketch out a prototype/demo I can probably get a better understanding of the challenges involved, but no problem if there isn't time. I'm investigating the "dropped audio" bug, which I suspect could be related to this issue, based on my observations:
I still want to do more testing, especially on macOS, to see if it's happening there. But maybe I'm on the wrong path completely. |
tsan is looking clean on mac, at least :) Interesting note on pulse from the SDL report. |
Hello! I've been trying to make this library work for quite some time, but there's a tiny issue preventing me from using it.
Symptoms
In general, the errors I've been facing appear somewhat erratic and usually end up in a crash sooner or later (...usually sooner).
LabSoundExample
crashes with a "double free detected" errornan
values to the buffer0xbadf00d
, etc.)All of this is manifesting in the
WaverShaperNode
and related classes (UpSampler
,DownSampler
,DirectConvolver
).Affected Platforms
I've observed variations of this problem on all of the platforms that I have access to:
It's curious that audio playback isn't garbled on Windows and Mac OS. I think that's because
nan
is written only on Linux.Root Cause Analysis
It looks as though the
AudioArray
buffers are freed multiple times. However, commenting out all of thefree
calls simply moves the problem around; the faults are still happening andgdb
blames theunique_ptr
trying to delete this =0x00
(i.e.,nullptr
).I noticed some
TODO: mutex
comments. However, adding mutexes to those critical sections didn't help. I've tried all sorts of other random things, like changing audio backends or replacing theAudioArray
withstd::vector
, and oversampling arrays withunique_ptr
. One area I haven't investigated much yet is the potential for multithreading issues (as they're generally painful).Logs
"Double free detected" (Kali Linux inside WSL VM)
Uninitialized heap memory (MSYS/Windows)
Potential stack corruption (also MSYS)
There's many other log dumps that I could provide, but they all show pretty much the same thing in slightly different colors.
Next Steps
I've tried poking at these crashes from many angles, but I don't know the codebase (nor audio programming) well. Any ideas?
The text was updated successfully, but these errors were encountered: