[FR]: Use OpenCL instead privative alternatives (CUDA, Metal) #595
Comments
Read alicevision/AliceVision#439 |
I read the thread. Some of the comments are from 2018, when OpenCL 2.2 did not exist, and a lot has changed since then. CUDA is used in many applications, but so is OpenCL. Darktable, which I use regularly, is one of them. Anyway, fabiencastan wrote:
That's a pity, because a lot of users cannot try Meshroom, even though it is a great piece of software. Right now I'm on the PC with the Intel GPU, so there is no way to use Meshroom, and I tried alternatives like Metashape, which does not necessarily require an Nvidia GPU. |
The fact that @fabiencastan does not have the time to port an implementation that already works for him does not mean that others cannot implement it in their own time. The big question here is whether you would implement it in OpenCL or in something different. Some good pointers on the wiki about viable alternatives could help people who want to start on this task. |
Hi skinkie, I don't have sufficient skills to code in C/C++. I would give it a try if it were Python, PHP or even JS. My point is that "fewer users able to run an application = less interest in the application = less feedback", and in the end the great idea becomes a lost effort. It's true that the CUDA API is easier to work with, but a lot of users in this forum have reported how to migrate to OpenCL or simplify the change. That could be a good starting point. That's only my opinion, of course. |
@RafaelLinux As a user you can use Meshroom without CUDA; the only part of the application that is 'hidden' is the DepthMap stage, and even that allows a preview without CUDA. As a developer, Meshroom is Python + QML, so the entry barrier to making an impact is low. The first place CUDA acceleration is used is feature extraction. You could just try to get this to work: https://github.com/pierrepaleo/sift_pyocl Personally, my focus for Meshroom is introducing some heuristics for matching images and supervised learning, as opposed to the current brute-force approach. Not that I am a photogrammetry specialist, but I can surely try to work on this open source project. |
Maybe I'm using Meshroom incorrectly, because if I only get as far as DepthMap, I only see a point cloud, so I cannot see the model result. |
Thank you, that is a good workaround. I'll try it. Anyway, remember that users don't mind how long it takes; quality is the priority, so please don't forget this feature request ;) |
One could also use hipify from AMD to convert CUDA code to HIP, which can be built to work on either NVIDIA or AMD cards (with very nice performance; I currently use it for TensorFlow, and it works like a charm!) |
@aviallon The last time we looked (2018), HIP did not support some CUDA functions: alicevision/AliceVision#439 (comment). You are welcome to try again using hipify. |
for reference https://github.com/cpc/hipcl |
This is interesting. Has anyone tried it? |
This is simply a packaging issue, since Arch has CUDA despite not being in the list here. Have you already reported that issue to both the openSUSE packagers and the Nvidia CUDA team? You can probably repackage either the 15.0 variant of the openSUSE package or the Arch package, which uses an independent source, as you can see in the link. |
@ShalokShalom the problem with CUDA remains that older hardware simply does not work with newer CUDA versions. This causes problems with nvidia-drivers and CUDA, where one is effectively searching for the 'ideal pair' between them. I would be very interested if OpenCL could bridge this gap, even by choosing the execution pipeline of choice. |
And how does that work with HIP? Does it run on Nvidia hardware as well? I am considering using a GeForce GT 610 for CUDA. Can you tell me how to choose a suitable CUDA version? Thanks a lot |
"HIP allows developers to convert CUDA code to portable C++. The same source code can be compiled to run on NVIDIA or AMD GPUs"
On Windows, install the latest version; on Linux this might depend on your distro. The GT 610 supports CUDA compute capability 2.1; Meshroom requires 2.0 or higher. |
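The compute capability comparison in the comment above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_capability` and `meets_minimum` are hypothetical, and the default minimum of 2.0 is an assumption taken from the "requires 2+" remark, not Meshroom's actual check.

```python
def parse_capability(text):
    """Turn a compute capability string such as "2.1" into a (major, minor) tuple."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def meets_minimum(device_capability, minimum="2.0"):
    """Return True if the device capability is at least the minimum.

    The default minimum is an assumption based on the comment above,
    not a value read from Meshroom itself.
    """
    return parse_capability(device_capability) >= parse_capability(minimum)


# The GT 610 is reported above as compute capability 2.1.
print(meets_minimum("2.1"))
```

Comparing (major, minor) tuples rather than floats keeps the check exact and orders versions component by component.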
I am on Linux. What decides which version is optimal? I am on KaOS, which is a rolling distribution. So, does HIP make the version differences between CUDA and the different Nvidia hardware negligible? Could or should we replace CUDA entirely with it, or is the overhead too big? |
@ShalokShalom With HIP we can compile two versions of Meshroom: one for CUDA and one for AMD GPUs. For CUDA users nothing changes. (https://kaosx.us/docs/nvidia/ But you won't get far with a 1 GB GT 610) |
We have to wait for HIP to support cudaMemcpy2DFromArray. Then we can add AMD support for AV/MR and try HIPCL. |
It would help if Meshroom allowed parallel computation for nodes where both the CPU and the GPU could, for example, do feature extraction. Any additional computing resource could help. It depends on how much overhead the GPU adds compared to a decent (faster) CPU, but I still see potential for independent computation tasks. |
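The idea of letting the CPU and the GPU work on independent tasks at the same time can be sketched with a shared work queue. This is a minimal illustration, not Meshroom's scheduler: `run_on_workers` and the "cpu"/"gpu" workers are hypothetical stand-ins, and both workers here are plain Python functions.

```python
import queue
import threading


def run_on_workers(chunks, workers):
    """Process independent chunks with several workers pulling from one queue.

    `workers` maps a label (e.g. "cpu", "gpu") to a function that processes
    one chunk. A faster worker simply ends up taking more chunks.
    """
    pending = queue.Queue()
    for chunk in chunks:
        pending.put(chunk)

    results = {}
    lock = threading.Lock()

    def loop(label, process):
        while True:
            try:
                chunk = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            value = process(chunk)
            with lock:
                results[chunk] = (label, value)

    threads = [threading.Thread(target=loop, args=item) for item in workers.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins: both "backends" just count characters here.
workers = {"cpu": len, "gpu": len}
done = run_on_workers(["img_a", "img_b", "img_c"], workers)
print(sorted(done))
```

Which worker handles which chunk is not deterministic; the only guarantee in this sketch is that every chunk is processed exactly once.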
It looks like HIP now supports cudaMemcpy2DFromArray. Any progress on this? |
@arpu Yes, all CUDA functions are now supported by HIP, and I was able to convert the code to HIP using the conversion tool (read here for details). The only thing left is to write a new CMake file that includes HIP and supports both CUDA and AMD compilation and the different platforms. Here is the Meshroom PopSift plugin I used for testing. At the moment I don't have the time to figure out how to rewrite the CMake file, but I think @ShalokShalom wanted to look into this. |
One question is very critical, I think: Will we ship two versions? Linux distributions do their packaging themselves and we could benefit enormously by finding someone who is willing to maintain Alice for their userbase since that could result in new developers and funding. 2 versions, one for CUDA and one for HIP is something they will never do. |
@ShalokShalom From the HIP code we can compile both CUDA and AMD versions. Similar to the target platform/OS parameter in CMake, CUDA or AMD can be defined. So, depending on the compiler parameters, we can define the versions (OS + CUDA/AMD). |
Any idea how long that approximately takes? I feel like a child just before Christmas eve :D |
@PickUpYaAmmo I will take another look at this over the winter holidays. |
Hi guys, has there been any progress on this? |
As a thought experiment, if the functionality used by MeshRoom were rewritten using the CPU instead of GPU (if that is possible), how much slower would it be? My (little) understanding is that GPGPU basically lets you do massive parallel computation (and of course offload stuff from the CPU itself). If this were rewritten with say, loops, what would the slow-down be? |
My not very well-informed estimate is that the slowdown could be enormous. I'm pretty sure that many types of computations can run hundreds of times slower on a CPU, and it seems that Meshroom really makes full use of my GPU when doing what it does. I think it's pretty much the perfect kind of work to run on the GPU because it can be massively parallel, which means that losing that massive parallelism would slow it down a lot. But anybody feel free to correct me if I'm wrong; some of that is guesswork and impression. |
You could eventually use both, as Blender does. |
We are looking into an alternative to the DepthMap node that runs on the CPU (it is not yet ready to use). It is not as good as the native DepthMap node quality-wise, but better than DraftMeshing. I did some research and the best solution for porting is still HIP. I'll continue to see if I am able to build a test version, but it will definitely take time as I am learning by trial and error, and success is not guaranteed. |
This looks like an even better argument to switch to HIP: https://www.phoronix.com/scan.php?page=news_item&px=AMD-HIP-CPU-Implementation |
I think we also have to consider that AMD, with respect to compatibility, might be just as bad as Nvidia. I just noticed that ROCm started to drop support for GPUs that could still be considered very beneficial for our computation tasks. This basically means that even if we migrate to OpenCL, the GPU that you want to run it on still needs to be "very recent", otherwise you will lose compatibility. Feels sad though. |
@skinkie there is a difference between ROCm and OpenCL. If there is an OpenCL port, there won't be any such issue, since even very old devices receive updates from the opensource community. |
I just ran into a kernel panic with Tesseract using OpenCL. While I generally agree with your statement, OpenCL may fall back to a CPU implementation, but as I just noticed, that is not a given. Even if it worked gracefully, a CPU computation might render some operations useless or cost extreme amounts of time (and therefore power). |
We could change the title? |
I apologize if this is a bit out of touch with the current direction of the conversion, but wanted to share nonetheless: |
Would that allow the program to use all interfaces simultaneously? (Read: the ability to schedule the tasks over multiple targets) |
This comment was marked as off-topic.
My question was more in the direction, would the glue code take care of it ;) |
This comment was marked as off-topic.
I don't think HIP or SYCL would require that functionality on their own if the intermediate layer took care of it, like starting a new thread on the CPU or GPU, or anything else that is available. |
This comment was marked as off-topic.
I really don't think this is off-topic. Meshroom splits a huge task into many smaller steps, but is then limited to a specific backend. Anything that natively allows scheduling the tasks transparently across CPU, GPU, etc. would motivate people to integrate that technology sooner. |
Why would this be offtopic? 👀 |
I think this could be helpful: https://www.phoronix.com/news/Intel-SYCLomatic-20220829 |
SYCL or Vulkan Compute Shader can be the open solution (with a preference to SYCL). SYCL principles are pretty close to CUDA. |
A more practical suggestion here: Regard3D is another OpenMVG-based photogrammetry solution. Its author wrote a densification procedure that doesn't need CUDA and is platform-agnostic. He hasn't updated it in a while. Maybe you can add his densification module to Meshroom. It's open source. https://github.com/rhiestan/Regard3D/tree/master Also, OpenMVS has a very nice platform-independent densification module that works very well, which I've been using. |
Something to look into: https://github.com/vosen/ZLUDA
|
To those interested in the topic: could you please test the Meshroom-compatible ZLUDA version? More info here: vosen/ZLUDA#79 (comment) |
For those of you who want to test it now: You can download a ready to use ZIP here if you prefer. I put it together to simplify testing. (It includes Meshroom 2023.3.0 and AliceVision+ZLUDA as provided by vosen. I added Run-Meshroom-ZLUDA.bat that hopefully works and ZLUDA-Info.txt with some info on ZLUDA from the git) It would be great if you could do some tests with the https://github.com/alicevision/dataset_monstree dataset (mini3 and full) so we can compare the performance. |
This just loads a webpage that says "Not found", response is 404. Perhaps it's only available to you? I only have a laptop with a 780M APU + RTX 4060 (laptop part not desktop, so weaker part AFAIK?), paired with a Ryzen 7940HS (8/16 core/threads @ 4GHz) and 32GB RAM. A 780M probably isn't ideal for an AMD GPU to test with? 🤷♂️ I might find time to give it a try with the dataset if you like, although I haven't done photogrammetry in a while, I only have about 30GB of disk to spare atm, if that's sufficient I can probably tackle it by next weekend 👍 |
@polarathene sorry, my bad. Link is fixed. Just give it a test run on your machines. 30gb should be more than enough to test with the monstree dataset. |
I recently reported that it is impossible to render with Meshroom, probably because, although I have an Nvidia GPU, Nvidia does not provide any CUDA package for openSUSE 15.1. I use Blender, GIMP ... all of them use OpenCL. Meshroom is developed for Linux and Windows. OpenCL is updated continuously for both platforms. OpenCL performance is slightly below the proprietary Nvidia or AMD APIs, so why not let Meshroom use the OpenCL GPGPU API? Even Intel GPU users could use Meshroom if it used the OpenCL framework.
Please, could you consider this suggestion?
Thank you