
make test crashes when "Exclusive process" GPU nodes are occupied #1

Open
Devilbilly opened this issue Aug 6, 2020 · 0 comments

In my environment, a GPU may sometimes be occupied by other users (we share GPU resources and use a job queue to schedule them).

All my GPUs are in Exclusive Process mode, and I found that jobs very easily end with "Child aborted" while executing.

Could you take a look and suggest a solution? Thanks.

Here is the log of make test.

%:~/git/Heteroflow/build> make test
Running tests...
/usr/bin/ctest --force-new-ctest-process
Test project /nfs/peiyu/git/Heteroflow/build
Start 1: basics.static
1/17 Test #1: basics.static ....................***Exception: Child aborted 0.45 sec
Start 2: basics.host-tasks
2/17 Test #2: basics.host-tasks ................***Exception: Child aborted 0.21 sec
Start 3: basics.span
3/17 Test #3: basics.span ......................***Exception: Child aborted 0.24 sec
Start 4: basics.memset
4/17 Test #4: basics.memset ....................***Exception: Child aborted 0.21 sec
Start 5: basics.d2d
5/17 Test #5: basics.d2d .......................***Exception: Child aborted 0.20 sec
Start 6: basics.h2d
6/17 Test #6: basics.h2d .......................***Exception: Child aborted 0.46 sec
Start 7: basics.d2h
7/17 Test #7: basics.d2h .......................***Exception: Child aborted 0.56 sec
Start 8: basics.h2d2h
8/17 Test #8: basics.h2d2h .....................***Exception: Child aborted 0.80 sec
Start 9: basics.h2d2d2h
9/17 Test #9: basics.h2d2d2h ...................***Exception: Child aborted 1.21 sec
Start 10: basics.dependent-copies
10/17 Test #10: basics.dependent-copies ..........***Exception: Child aborted 0.68 sec
Start 11: basics.chained-kernels
11/17 Test #11: basics.chained-kernels ...........***Exception: Child aborted 0.55 sec
Start 12: basics.dependent-kernels
12/17 Test #12: basics.dependent-kernels .........***Exception: Child aborted 0.21 sec
Start 13: basics.statefulness
13/17 Test #13: basics.statefulness ..............***Exception: Child aborted 0.23 sec
Start 14: basics.run_n
14/17 Test #14: basics.run_n .....................***Exception: Child aborted 0.21 sec
Start 15: matrix.multiplication
15/17 Test #15: matrix.multiplication ............***Exception: Child aborted 1.32 sec
Start 16: matrix.transpose
16/17 Test #16: matrix.transpose .................***Exception: Child aborted 0.34 sec
Start 17: matrix.product
17/17 Test #17: matrix.product ...................***Exception: Child aborted 0.27 sec

0% tests passed, 17 tests failed out of 17

Total Test time (real) = 8.17 sec

The following tests FAILED:
1 - basics.static (Child aborted)
2 - basics.host-tasks (Child aborted)
3 - basics.span (Child aborted)
4 - basics.memset (Child aborted)
5 - basics.d2d (Child aborted)
6 - basics.h2d (Child aborted)
7 - basics.d2h (Child aborted)
8 - basics.h2d2h (Child aborted)
9 - basics.h2d2d2h (Child aborted)
10 - basics.dependent-copies (Child aborted)
11 - basics.chained-kernels (Child aborted)
12 - basics.dependent-kernels (Child aborted)
13 - basics.statefulness (Child aborted)
14 - basics.run_n (Child aborted)
15 - matrix.multiplication (Child aborted)
16 - matrix.transpose (Child aborted)
17 - matrix.product (Child aborted)
Errors while running CTest
Makefile:75: recipe for target 'test' failed
make: *** [test] Error 8
%:~/git/Heteroflow/build>
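
One possible workaround, sketched here purely as an illustration and not as Heteroflow's actual behavior, is to probe each GPU before using it and skip the ones that are already held by another process. The helper name `pick_free_device` and the `probe` callback are hypothetical. With the CUDA runtime, a probe could plausibly be built from `cudaSetDevice` followed by a call that forces context creation (commonly `cudaFree(0)`), on the assumption that an occupied Exclusive Process device reports an error at that point instead of aborting the process.

```cpp
#include <functional>

// Hypothetical helper (not part of Heteroflow): returns the index of the
// first device for which `probe` reports success, or -1 if every device
// is busy or device_count is not positive.
//
// Assumption: in a CUDA build, `probe` might look roughly like
//   [](int d) {
//     return cudaSetDevice(d) == cudaSuccess &&
//            cudaFree(0) == cudaSuccess;  // forces context creation
//   }
// so that an occupied Exclusive Process GPU is skipped rather than
// causing an abort later on.
int pick_free_device(int device_count,
                     const std::function<bool(int)>& probe) {
  for (int d = 0; d < device_count; ++d) {
    if (probe(d)) {
      return d;  // first device that accepted a context
    }
  }
  return -1;  // nothing available; the caller can skip or report cleanly
}
```

A test runner could call this once at startup and skip the GPU tests with a clear message when it returns -1, which would be easier to diagnose than seventeen "Child aborted" results.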
