Assert failure: executionAborted in GcInfoDecoder::EnumerateLiveSlots #102370

Open
jakobbotsch opened this issue May 17, 2024 · 8 comments
Labels
- area-VM-coreclr
- blocking-clean-ci-optional (Blocking optional rolling runs)
- Known Build Error (Use this to report build issues in the .NET Helix tab)
- untriaged (New issue has not been triaged by the area owner)

Comments

jakobbotsch commented May 17, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=678333&view=results
Build error leg or test failing:
Example console log: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-102261-merge-e822cbb23a0f465186/LibraryImportGenerator.Unit.Tests/1/console.c3aa79dd.log?helixlogtype=result

/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: LibraryImportGenerator.Unit.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  LibraryImportGenerator.Unit.Tests (found 185 of 190 test cases)
  Starting:    LibraryImportGenerator.Unit.Tests (parallel test collections = on [2 threads], stop on fail = off)
    LibraryImportGenerator.UnitTests.Compiles.ValidateSnippetsWithMarshalType [SKIP]
      No current scenarios to test.

Assert failure(PID 28 [0x0000001c], Thread: 73 [0x0049]): executionAborted
    File: /__w/1/s/src/coreclr/vm/gcinfodecoder.cpp:801
    Image: /root/helix/work/correlation/dotnet

Maybe related to #101890?

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "executionAborted",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
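
As a rough illustration of what this configuration asks for, the sketch below treats `ErrorMessage` as a plain substring to look for in a log. The substring rule and the function name `matches_known_issue` are assumptions made for this example; the sketch does not reproduce the real validation service, and it says nothing about which logs that service scans.

```python
import json

# The JSON block from this issue, collapsed onto one line.
KNOWN_ISSUE = '{"ErrorMessage": "executionAborted", "BuildRetry": false, "ExcludeConsoleLog": false}'

# One line copied from the console log quoted in this issue.
LOG_EXCERPT = "Assert failure(PID 28 [0x0000001c], Thread: 73 [0x0049]): executionAborted"


def matches_known_issue(issue_json: str, log_text: str) -> bool:
    """Hypothetical matcher: True if the ErrorMessage string occurs in the log text."""
    issue = json.loads(issue_json)
    message = issue["ErrorMessage"]
    # An empty message would match everything, so treat it as "no match".
    return bool(message) and message in log_text
```

Calling `matches_known_issue(KNOWN_ISSUE, LOG_EXCERPT)` only checks the single quoted log line, not the whole build.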

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=678333
Error message validated: [executionAborted]
Result validation: ❌ Known issue did not match with the provided build.
Validation performed at: 5/17/2024 8:39:56 AM UTC

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 698579 | dotnet/runtime | LibraryImportGenerator.Unit.Tests.WorkItemExecution | #101580 |
| 2455288 | dotnet-runtime | LibraryImportGenerator.Unit.Tests.WorkItemExecution | |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 0 | 2 |

@jakobbotsch added the blocking-clean-ci-optional and Known Build Error labels on May 17, 2024
@dotnet-issue-labeler (bot) added the needs-area-label label on May 17, 2024
@dotnet-policy-service (bot) added the untriaged label on May 17, 2024
@jakobbotsch

Also cc @VSadov, I'm not familiar with the logic here, but wonder if it could be related to the new GC safe points.

@jkotas added the area-VM-coreclr label and removed the needs-area-label label on May 17, 2024

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

VSadov commented May 17, 2024

> Also cc @VSadov, I'm not familiar with the logic here, but wonder if it could be related to the new GC safe points.

With interruptible GC safe points we can stress-test each and every safe point.
In the past we would skip a good portion of safe points: JIT helpers and direct calls, for example, would not be stress-tested.

So the change to safe points could be involved here, but it could also have exposed an existing bug.
(We did not see this when the change was merged, but a nondeterministic bug is possible.)

Does this happen without the PR change?

@jakobbotsch

> Does this happen without the PR change?

I don't know, I haven't investigated.
Happened again in the recent libraries-jitstress run:
Pipeline run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=690710&view=results
Job name: net9.0-linux-Release-arm-disabler2r
Console log: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-705beed46fcf4015ad/LibraryImportGenerator.Unit.Tests/1/console.f4051e9b.log?helixlogtype=result

VSadov commented Jun 5, 2024

Always on arm32 and always in LibraryImportGenerator.Unit.Tests

The failure is strange, though. It means that we try to initiate a stack walk in a fully interruptible method, but the IP happens not to be in one of the interruptible ranges. It is hard to see how this could happen, since anything that leads to a stack walk should at some point ask "is this IP interruptible?".
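
To make the condition concrete, here is a minimal Python sketch of the check described above. It is a toy model, not the code in gcinfodecoder.cpp: the names `is_interruptible` and `enumerate_live_slots`, the half-open range representation, and the "report"/"skip" return values are all assumptions made for illustration. Tying the failure to an `execution_aborted` flag is a reading of the assert name, not something confirmed in this thread.

```python
from bisect import bisect_right
from typing import List, Tuple

# Half-open [start, stop) code offsets; assumed sorted and non-overlapping.
Range = Tuple[int, int]


def is_interruptible(ranges: List[Range], offset: int) -> bool:
    """Return True if `offset` falls inside one of the interruptible ranges."""
    starts = [start for start, _ in ranges]
    i = bisect_right(starts, offset) - 1  # last range starting at or before offset
    return i >= 0 and offset < ranges[i][1]


def enumerate_live_slots(ranges: List[Range], offset: int, execution_aborted: bool) -> str:
    """Toy stand-in for the decision described above (hypothetical, not the real decoder)."""
    if is_interruptible(ranges, offset):
        return "report"  # normal case: the IP is inside an interruptible range
    if not execution_aborted:
        # Models the failing assert: the IP is outside every interruptible
        # range and the frame was not marked as aborted.
        raise AssertionError("executionAborted")
    return "skip"
```

With ranges `[(0, 10), (20, 30)]`, offset 25 is reported, while offset 15 with `execution_aborted=False` raises, which is the situation the failure implies.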

VSadov commented Jun 5, 2024

The last failure is interesting, as it was not in a JIT stress run.
