Issues: vllm-project/vllm
[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
#5199 · bug · opened Jun 2, 2024 by rikitomo
[Usage]: how to use the gpu_cache_usage_perc as a custom metric in k8s HPA?
#5195 · usage · opened Jun 2, 2024 by chakpongchung
[Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs with lora and CUDA graph.
#5193 · usage · opened Jun 2, 2024 by AlphaINF
[Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
#5186 · bug · opened Jun 1, 2024 by TikZSZ
[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only?
#5185 · feature request · opened Jun 1, 2024 by xxll88
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary asyncio.exceptions.CancelledError
#5182 · bug · opened Jun 1, 2024 by jlcmoore
[Feature]: BERT models for embeddings
#5179 · good first issue, new model · opened Jun 1, 2024 by mevince
[Bug]: Model Launch Hangs with 16+ Ranks in vLLM
#5170 · bug · opened May 31, 2024 by wushidonguc
[Performance]: What can we learn from OctoAI
#5167 · performance · opened May 31, 2024 by hmellor
[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
#5162 · bug · opened May 31, 2024 by kezouke
[Bug]: WSL2 (including Docker) 2-GPU problem with --tensor-parallel-size 2
#5161 · bug · opened May 31, 2024 by goodmaney
[Feature]: Linear adapter support for Mixtral
#5155 · feature request · opened May 31, 2024 by DhruvaBansal00
[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference.
#5154 · bug · opened May 31, 2024 by LIUKAI0815
[Bug]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
#5152 · bug · opened May 31, 2024 by khluu
[Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when handling inference requests
#5147 · bug · opened May 31, 2024 by zhaotyer
[Usage]: how should I do data parallelism using vLLM?
#5143 · usage · opened May 30, 2024 by YuWang916
[Bug]: nsys cannot track the CUDA kernels called by processes other than rank 0
#5132 · bug · opened May 30, 2024 by crazy-JiangDongHua
[Feature]: How to Enable VLLM to Work with PreTrainedModel Objects in my MOE-LoRA? THX
#5128 · feature request · opened May 30, 2024 by zhaofangtao
[Usage]: extractive question answering using VLLM
#5126 · usage · opened May 30, 2024 by suryavan11
[New Model]: LLaVA-NeXT-Video support
#5124 · new model · opened May 30, 2024 by AmazDeng