When running WizardLM-2-8x22B, the model loads into VRAM but then freezes at 100% GPU usage while processing the KV cache (prompt ingestion). Power draw sits at 100W/300W and stays there until the server is terminated.
Oddly, Llama-3-70B works perfectly fine on the setup below, but fails on other kernels and ROCm versions.
OS: Ubuntu 22.04.4
Linux Kernel: 5.19.0-50-generic
Virtualization: Xen Hypervisor
GPU: 2x AMD MI100
ROCm: 6.0.0
llama.cpp/server version: any
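To help isolate whether the stall is in llama.cpp or lower in the stack, here is a minimal HIP sanity check (my addition, not part of the original report) that enumerates both MI100s and runs a trivial kernel on each. If this program hangs on a given kernel/ROCm combination, the problem is below llama.cpp:

```cpp
// Compile with: hipcc -o hip_check hip_check.cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void noop() {}

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        fprintf(stderr, "no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        if (hipGetDeviceProperties(&prop, i) != hipSuccess) {
            fprintf(stderr, "device %d: failed to query properties\n", i);
            return 1;
        }
        // MI100 should report gfx908 as its arch name
        printf("device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
        hipSetDevice(i);
        // launch a trivial kernel; hanging here would reproduce the stall
        hipLaunchKernelGGL(noop, dim3(1), dim3(1), 0, 0);
        if (hipDeviceSynchronize() != hipSuccess) {
            fprintf(stderr, "device %d failed to synchronize\n", i);
            return 1;
        }
    }
    printf("all %d device(s) responded\n", count);
    return 0;
}
```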
When switching to kernel 6.5 with ROCm 6.0 or 6.1, neither Llama-3-70B nor WizardLM-2-8x22B works; both trigger the same 100% stall.
Booting with iommu=pt has no effect.
Setting GPU_MAX_HW_QUEUES=1 has no effect on any ROCm version or kernel.
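Since GPU_MAX_HW_QUEUES caps the number of hardware queues the HIP runtime creates per device, a multi-stream test can show whether queue handling is the trigger. Below is a sketch (my own, not from the original report; the spin kernel and iteration count are arbitrary) that launches concurrent kernels on several streams per device, which is the path that variable throttles:

```cpp
// Compile with: hipcc -o queue_test queue_test.cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void spin(float *out, int iters) {
    float v = 0.0f;
    for (int i = 0; i < iters; ++i) v += sinf(v + i);
    out[threadIdx.x] = v;
}

int main() {
    int n_dev = 0;
    hipGetDeviceCount(&n_dev);
    for (int d = 0; d < n_dev; ++d) {
        hipSetDevice(d);
        float *buf;
        hipMalloc(&buf, 256 * sizeof(float));
        hipStream_t streams[4];
        for (auto &s : streams) hipStreamCreate(&s);
        // one kernel per stream -> multiple hardware queues per GPU
        for (auto &s : streams)
            hipLaunchKernelGGL(spin, dim3(1), dim3(256), 0, s, buf, 1 << 20);
        if (hipDeviceSynchronize() != hipSuccess) {
            fprintf(stderr, "device %d stalled\n", d);
            return 1;
        }
        for (auto &s : streams) hipStreamDestroy(s);
        hipFree(buf);
        printf("device %d ok\n", d);
    }
    return 0;
}
```

Running it with and without GPU_MAX_HW_QUEUES=1 in the environment should reveal whether queue count changes the behavior independently of llama.cpp.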