Allow reading of files from remote file stores such as S3 [willing to contribute a PR!] #6049
Comments
Hey @tkoyama010, I saw your 👍 on the issue and just wanted to check: can I take that as an endorsement of the idea, and of you being open to merging a PR that implements this? Sorry for the direct tag; I just want to be sure before I spend any time working on making this happen. Thanks!
This seems like a good idea, but my experience is that some (many?) VTK readers are not happy with non-string-based paths or with data being passed directly in binary/string form. If PyVista can transform user input into what VTK expects, it makes sense to me, particularly if we do not have to add any dependencies.
This is high up there on my wish list and I'm happy to help you make this happen in pyvista! @MatthewFlamm makes a great point that we are mostly limited by what the upstream VTK readers can handle. Some native VTK readers support reading from an input string, as the XML readers do here:

    def read_xml_from_s3(uri):
        import pyvista as pv
        import fsspec, s3fs
        from vtkmodules import vtkIOXML

        # Map file extensions to the VTK XML reader for that format
        readers = {
            "vti": vtkIOXML.vtkXMLImageDataReader,
            "vts": vtkIOXML.vtkXMLStructuredGridReader,
            "vtr": vtkIOXML.vtkXMLRectilinearGridReader,
            "vtu": vtkIOXML.vtkXMLUnstructuredGridReader,
            "vtp": vtkIOXML.vtkXMLPolyDataReader,
        }
        fs = fsspec.filesystem('s3')
        ext = uri.split('.')[-1]
        try:
            reader = readers[ext]()
        except KeyError:
            raise KeyError(f"Extension {ext} is not supported for reading from S3")
        # Tell the reader to parse an in-memory string instead of a file path
        reader.ReadFromInputStringOn()
        with fs.open(uri, 'rb') as f:
            reader.SetInputString(f.read())
        reader.Update()
        return pv.wrap(reader.GetOutput())

    import pyvista as pv
    mesh = read_xml_from_s3("s3://pyvista/examples/nefertiti.vtp")

However, we can't do this for any other VTK readers as far as I am aware, leaving us with needing to write to a temporary file for formats like OBJ. Generally in my experience this is fine (just maybe don't do this for massive datasets). So perhaps a full solution is just some sort of helper routine like the following if the data path/URI is an S3 URI:

    def read_from_s3(uri):
        """Read any mesh file from S3."""
        import os
        import pyvista as pv
        import fsspec, s3fs
        import tempfile

        fs = fsspec.filesystem('s3')
        basename = os.path.basename(uri)
        # Keep the original file name as the suffix so the extension is preserved
        with tempfile.NamedTemporaryFile(suffix=basename) as tmpf:
            with fs.open(uri, 'rb') as rf, open(tmpf.name, 'wb') as wf:
                wf.write(rf.read())
            # Read while the temporary file still exists
            return pv.read(tmpf.name)

    import pyvista as pv
    mesh = read_from_s3("s3://pyvista/examples/nefertiti.obj")
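To make the "maybe don't do this for massive datasets" caveat a little more concrete, here is a minimal sketch of the temporary-file step that copies in chunks rather than reading the whole object into memory first. The name `local_copy` and the chunk size are hypothetical, not part of pyvista or fsspec; it only uses the standard library. It also closes the temporary file before handing out its path, since a `NamedTemporaryFile` that is still open cannot be reopened by name on Windows.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def local_copy(src, suffix="", chunk_size=16 * 1024 * 1024):
    """Copy a binary file-like object to a temporary file and yield its path.

    `local_copy` is a hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Stream the data in chunks so only `chunk_size` bytes are buffered at once
        with os.fdopen(fd, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)
        # The file is closed here, so a path-based reader can open it
        yield path
    finally:
        # Always clean up the temporary file
        os.remove(path)
```

With fsspec this might be used as `with fs.open(uri, 'rb') as rf, local_copy(rf, suffix=".obj") as path: mesh = pv.read(path)`. The mesh itself still ends up in memory; the sketch only avoids holding a second full copy of the raw bytes.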
Hey @banesullivan, thank you for the detailed write-up! I’m new to pyvista and 3D data like this in general, but given I had a need to read data from S3 I thought I’d use this as an opportunity to learn more about it. I thought I’d write up a short summary of what I’ve found so far this morning, and if you have the capacity I’d love some guidance on what to look at next. I'm not trying to put any obligation on you here, so please feel free to totally ignore this comment.

Naive summary of Pyvista

Pyvista is a Pythonic interface to VTK. Under the hood it makes use of many readers written in the core VTK project, e.g. this CGNSReader class is "just" a wrapper around this class. Very few of these (as you listed) support being passed the file contents directly, and instead want a filepath that they themselves load from. Pyvista also makes use of meshio.

Approach for introducing fsspec/remote file reading

Based on the structure of When trying this diff:

     def read_meshio(filename, file_format=None):
         # ...
         try:
             import meshio
         except ImportError:  # pragma: no cover
             raise ImportError("To use this feature install meshio with:\n\npip install meshio")
    -    # Make sure relative paths will work
    -    filename = str(Path(str(filename)).expanduser().resolve())
    -    # Read mesh file
    -    mesh = meshio.read(filename, file_format)
    +    with fsspec.open(filename, 'rb') as f:
    +        mesh = meshio.read(f, filename.ext[1:] if file_format is None else file_format)
         return from_meshio(mesh)

Running

Investigating this shows that meshio's

From my uninformed perspective this looks like a bug, but I'm aware of how little context I have of this domain and use case. It also made me doubt the feasibility of me making a "simple" change that would facilitate transparent reading of

Thinking of how to continue

Given your comment about how only a subset of readers would support being passed through and your provided snippets, would you prefer:
    def read_remote_data(remote_uri):
        if remote_uri.file_extension in LIST_OF_SUPPORTED_READERS:
            ...  # fsspec.open(), reader.SetInputString() etc.
        else:
            ...  # copy file to local tmpdir and read in from there
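For what it's worth, the dispatch in that pseudocode could be sketched roughly like this. The names `STREAMABLE_EXTENSIONS` and `choose_strategy` are hypothetical, and the extension set is just the five XML formats from the `read_xml_from_s3` snippet above, not an authoritative list of what VTK can read from memory. The two readers are passed in as callables so the sketch makes no claims about pyvista's API.

```python
import os
from urllib.parse import urlparse

# Assumption: only the VTK XML formats from the earlier snippet can be
# read from an in-memory string; everything else goes via a temp file.
STREAMABLE_EXTENSIONS = {".vti", ".vts", ".vtr", ".vtu", ".vtp"}


def choose_strategy(remote_uri):
    """Return "stream" or "tempfile" based on the URI's file extension."""
    # Use only the path component so the bucket name cannot be mistaken
    # for an extension
    ext = os.path.splitext(urlparse(remote_uri).path)[1].lower()
    return "stream" if ext in STREAMABLE_EXTENSIONS else "tempfile"


def read_remote_data(remote_uri, stream_reader, tempfile_reader):
    """Dispatch to one of two caller-supplied readers (hypothetical helper)."""
    if choose_strategy(remote_uri) == "stream":
        return stream_reader(remote_uri)
    return tempfile_reader(remote_uri)
```

Using the two functions from the earlier comment, a call would look like `read_remote_data(uri, read_xml_from_s3, read_from_s3)`.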
The

    # Import intern (pip install intern)
    from intern import array

    # Save a cutout to a numpy array in ZYX order:
    channel = array("bossdb://MaherBriegel2023/Lgn200/sbem")
    data = channel[30:36, 1024:2048, 1024:2048]

See the implementation code for
Describe the feature you would like to be added.
I would like to be able to easily read files directly from a remote file store, such as S3.
Links to VTK Documentation, Examples, or Class Definitions.
Currently, the definition of read() makes strong assumptions that the objects to be loaded live on a local filesystem.

Pseudocode or Screenshots

I'd love to be able to either:
- s3:// and have pyvista know to use fsspec/s3fs
- read method

I'd be super happy to work on bringing a PR to do this, if you would be open to merging this kind of a change in.
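As a rough illustration of the first option, a reader could check the URI scheme before deciding whether to treat the input as a local path. This is only a sketch with a hypothetical helper name (`is_remote_uri`); it uses the standard library's `urlparse` and is not how pyvista currently behaves.

```python
from urllib.parse import urlparse


def is_remote_uri(path, local_schemes=("", "file")):
    """Guess whether `path` points at a remote store such as s3://.

    Hypothetical helper for illustration; not part of pyvista.
    """
    scheme = urlparse(str(path)).scheme
    # A single-letter "scheme" is almost certainly a Windows drive (C:\...)
    if len(scheme) == 1:
        return False
    return scheme not in local_schemes
```

For example, `is_remote_uri("s3://pyvista/examples/nefertiti.vtp")` is true, while a relative path like `"meshes/nefertiti.vtp"` is treated as local.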