
multi_ray_trace is not working under MPI (mpi4py | OpenMPI 3.1.4, CentOS7.6) #5946

Open
natsuwater opened this issue Apr 19, 2024 · 4 comments
Labels
bug Uh-oh! Something isn't working as expected.

Comments

@natsuwater (Contributor)

Describe the bug, what's wrong, and what you expected.

On a rather old HPC system (still running CentOS 7.6), when pyvista's multi_ray_trace is used together with mpi4py, the following message appears:

$ mpirun -np 1 python mpi4py_ml_trace.py

--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          [[63416,1],0] (PID 11717)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[0 1 2]

Note: I tried to reproduce this on Ubuntu 22.04 (WSL2); however, the same code worked fine there.

So it might be a bug specific to an old Linux system, which has actually passed its EOL. ;-<

The Ubuntu environment that worked fine was as follows:

--------------------------------------------------------------------------------
  Date: Fri Apr 19 19:11:20 2024 JST

                  OS : Linux
              CPU(s) : 16
             Machine : x86_64
        Architecture : 64bit
                 RAM : 15.0 GiB
         Environment : Python
         File system : ext4
          GPU Vendor : Microsoft Corporation
        GPU Renderer : D3D12 (AMD Radeon(TM) Graphics)
         GPU Version : 4.2 (Core Profile) Mesa 23.2.1-1ubuntu3.1~22.04.2
    MathText Support : True

  Python 3.12.3 (main, Apr 19 2024, 16:55:01) [GCC 11.4.0]

             pyvista : 0.43.5
                 vtk : 9.3.0
               numpy : 1.26.4
          matplotlib : 3.8.4
              scooby : 0.9.2
               pooch : 1.8.1
              pillow : 10.3.0
             imageio : 2.34.0
             IPython : 8.23.0
            colorcet : 3.1.0
             cmocean : 4.0.3
          ipywidgets : 8.1.2
              meshio : 5.3.5
          jupyterlab : 4.1.6
               trame : 3.6.0
        trame_client : 3.0.2
        trame_server : 3.0.0
           trame_vtk : 2.8.5
       trame_vuetify : 2.4.3
jupyter_server_proxy : 4.1.2
        nest_asyncio : 1.6.0
-------------------------------------------------------------------------------- 

Steps to reproduce the bug.

Run the following code on a CentOS 7.6 machine.
mpi4py was built against OpenMPI 3.1.4 with GCC 4.8.5.

from mpi4py import MPI       # MPI is initialized on import
from trimesh import Trimesh  # not used directly; multi_ray_trace needs trimesh (plus rtree and embree bindings) installed

import pyvista as pv

comm = MPI.COMM_WORLD
rank = comm.Get_rank()       # rank/nprocs are unused here; MPI is set up only to reproduce the issue
nprocs = comm.Get_size()


def exec():
    sphere = pv.Sphere()
    points, rays, cells = sphere.multi_ray_trace(
        [[0, 0, 0]] * 3,
        [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
        first_point=True,
    )

    print(rays)


if __name__ == "__main__":
    exec()

System Information

--------------------------------------------------------------------------------
  Date: Fri Apr 19 18:54:38 2024 JST

                OS : Linux
            CPU(s) : 32
           Machine : x86_64
      Architecture : 64bit
               RAM : 377.4 GiB
       Environment : Python
       File system : xfs
       GPU Details : error
  MathText Support : False

  Python 3.9.16 (main, May 25 2023, 12:52:30)  [GCC 4.8.5 20150623 (Red Hat
  4.8.5-44)]

           pyvista : 0.42.3
               vtk : 9.2.6
             numpy : 1.26.1
        matplotlib : 3.8.1
            scooby : 0.9.2
             pooch : 1.8.0
            pillow : 10.1.0
           IPython : 8.17.2
             scipy : 1.11.3
        jupyterlab : 4.0.8
             trame : 3.5.2
      trame_client : 2.16.2
      trame_server : 2.17.2
      nest_asyncio : 1.5.8
--------------------------------------------------------------------------------

Screenshots

No response

@Keou0007 (Contributor)

I don't have a good understanding of how embree works under the hood, but I don't believe it uses MPI.
Some questions that come to mind:

  • Is there a good reason why you need to run this ray-trace code under MPI? I assume the example is contrived, since it doesn't really have any reason to be run with MPI.
  • The code appears to work, and the warning is indeed just a warning. Can you just turn that warning off as it suggests? (See the example below.)
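
For what it's worth, the warning itself names the relevant knob: with Open MPI the MCA parameter it mentions can be set on the command line, for example (using the script name from the report):

$ mpirun --mca mpi_warn_on_fork 0 -np 1 python mpi4py_ml_trace.py

This only silences the warning; it does not address any underlying fork-related corruption.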

@natsuwater (Contributor, Author)

A little more explanation:

  • Ray tracing itself does not need to run under MPI, as you correctly pointed out.
    However, other parts of the simulation code I'm building are time-consuming, and I'm trying to apply MPI to those parts (see the sketch after this list for one possible way to keep the two separate).

  • The calculation results after this warning were quite different from those of the non-MPI, single-process calculation.
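
A minimal sketch (not the actual simulation code, and untested on the affected CentOS 7.6 system) of one way such a split might look: delay MPI initialization using mpi4py's rc settings, run the embree-backed ray trace first, and only then initialize MPI for the time-consuming parts, so that no fork() happens after MPI_Init.

import mpi4py
mpi4py.rc.initialize = False  # do not call MPI_Init automatically on import
from mpi4py import MPI

import pyvista as pv

# Do the embree-backed ray trace before MPI is initialized.
sphere = pv.Sphere()
points, rays, cells = sphere.multi_ray_trace(
    [[0, 0, 0]] * 3,
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    first_point=True,
)

MPI.Init()
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
# ... the MPI-parallel, time-consuming parts would go here, using the precomputed results ...
print(rank, rays)

Note that under mpirun every rank would repeat the ray trace; for larger ray counts the result could instead be computed once and distributed with comm.bcast after MPI.Init().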

@Keou0007 (Contributor) commented Apr 29, 2024

Ok, so if the calculation is wrong then I can only assume the warning is for a good reason and you're getting memory corruption/errors/something. I've seen warnings like this before when using multiprocessing code across both Linux and macOS (which spawn subprocesses in different ways), and the ultimate solution was for me to write my code differently. Unfortunately sometimes different libraries don't play well together and there's not much you can do about that.
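
(As an aside, and not specific to the code in this issue: the Linux/macOS difference mentioned above comes down to the default process start method, fork on Linux versus spawn on macOS. With the standard library's multiprocessing the start method can be chosen explicitly, for example:)

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter instead of fork()ing the parent,
    # which sidesteps fork-related issues with some native libraries.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))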

I don't really know anything about mpi4py, but here are some rough suggestions for you if you're feeling out of options:

  • Play around with some test code and try to isolate exactly which part of your code is throwing the error.
  • If the error is caused by embree (and if you don't actually have to do a lot of ray traces, so performance isn't an issue), then you could try doing the same thing with the VTK single-ray-trace function (available via pyvista; see the sketch after this list), or even the rtree-based ray tracing in trimesh.
  • See if you can avoid the fork(), which may not be possible if you're stuck running via mpirun.
  • Try to separate what you're doing so the ray tracing can be run separately from the MPI component.
  • You say your code is time-consuming and so you're trying to apply MPI. Without knowing anything about what you're doing, I can only suggest avoiding MPI if you're not decomposing your problem in a way that specifically requires interprocess communication. Is there a different way you could speed up your code?
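
To illustrate the single-ray fallback: pyvista's PolyData.ray_trace goes through VTK's intersection routines rather than embree, so a small number of rays can be traced in a plain Python loop. A minimal sketch (ray lengths and variable names are illustrative):

import numpy as np
import pyvista as pv

sphere = pv.Sphere()
origins = np.array([[0, 0, 0]] * 3, dtype=float)
directions = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

first_hits = []
for origin, direction in zip(origins, directions):
    # ray_trace takes a start and an end point, so extend each ray well past the mesh
    end = origin + 10.0 * direction
    point, cell = sphere.ray_trace(origin, end, first_point=True)
    first_hits.append(point)

print(np.array(first_hits))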

From what you've said, it seems to me that the error exists either in your code or in one of the dependencies, and there's probably nothing that can be changed in pyvista to fix it. Since you only seem to be able to reproduce it on an EOL OS on an old HPC, I'm not convinced it would be worth the effort to try to fix anyway. Either way, my opinion is that this issue should probably be closed.

I'm happy for you to email me if you'd like to discuss your code a bit more specifically, though I can't promise I'll be able to help.

@natsuwater (Contributor, Author)

I agree that it would not be worth the effort to fix, and that this issue should be closed.

I really thank you for your time and consideration.
