Issues: vllm-project/vllm
[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
#5199 · bug · opened Jun 2, 2024 by rikitomo
[Usage]: how to use the gpu_cache_usage_perc as a custom metric in k8s HPA?
#5195 · usage · opened Jun 2, 2024 by chakpongchung
[Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs with lora and CUDA graph.
#5193 · usage · opened Jun 2, 2024 by AlphaINF
[Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
#5186 · bug · opened Jun 1, 2024 by TikZSZ
[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only?
#5185 · feature request · opened Jun 1, 2024 by xxll88
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary asyncio.exceptions.CancelledError
#5182 · bug · opened Jun 1, 2024 by jlcmoore
[Feature]: BERT models for embeddings
#5179 · good first issue, new model · opened Jun 1, 2024 by mevince
[Bug]: Model Launch Hangs with 16+ Ranks in vLLM
#5170 · bug · opened May 31, 2024 by wushidonguc
[Performance]: What can we learn from OctoAI
#5167 · performance · opened May 31, 2024 by hmellor
[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
#5162 · bug · opened May 31, 2024 by kezouke
[Bug]: WSL2 (including Docker) 2-GPU problem with --tensor-parallel-size 2
#5161 · bug · opened May 31, 2024 by goodmaney
[Feature]: Linear adapter support for Mixtral
#5155 · feature request · opened May 31, 2024 by DhruvaBansal00
[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference.
#5154 · bug · opened May 31, 2024 by LIUKAI0815
[Bug]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
#5152 · bug · opened May 31, 2024 by khluu
[Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when handling inference requests
#5147 · bug · opened May 31, 2024 by zhaotyer
[Usage]: how should I do data parallelism using vLLM?
#5143 · usage · opened May 30, 2024 by YuWang916
[Bug]: nsys cannot track the CUDA kernels called by processes other than rank 0
#5132 · bug · opened May 30, 2024 by crazy-JiangDongHua
[Feature]: How to Enable VLLM to Work with PreTrainedModel Objects in my MOE-LoRA? THX
#5128 · feature request · opened May 30, 2024 by zhaofangtao
[Usage]: extractive question answering using VLLM
#5126 · usage · opened May 30, 2024 by suryavan11
[New Model]: LLaVA-NeXT-Video support
#5124 · new model · opened May 30, 2024 by AmazDeng