Track allocated buffers in rpc-server #7407

Closed
rgerganov opened this issue May 20, 2024 · 2 comments
@rgerganov
Collaborator

Currently the rpc-server doesn't perform any input validation, which may have security implications. Another problem is that the server may leak memory if clients do not free their allocated buffers before disconnecting (PR #7378 tries to address this).

I think we can address both issues by using std::unordered_set to track allocated buffers and perform some additional checks without any noticeable performance degradation.
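As a rough illustration of the idea above, here is a minimal sketch of set-based buffer tracking. It is not the actual rpc-server code: `fake_buffer` and `buffer_tracker` are hypothetical names, and plain `malloc`/`free` stand in for the real backend allocation calls. The set serves both purposes mentioned: a handle received from a client is accepted only if the server itself handed it out, and anything still in the set at disconnect can be released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_set>

// Hypothetical stand-in for a backend buffer; not the real ggml type.
struct fake_buffer {
    void *      data;
    std::size_t size;
};

// Remembers every buffer handed out to the client so that incoming handles
// can be validated and leftovers can be released on disconnect.
class buffer_tracker {
public:
    ~buffer_tracker() { release_all(); }

    // Allocate a buffer, record it, and return an opaque handle for the client.
    std::uint64_t alloc(std::size_t size) {
        auto * buf = new fake_buffer{std::malloc(size), size};
        buffers.insert(buf);
        return reinterpret_cast<std::uint64_t>(buf);
    }

    // A handle is valid only if this tracker produced it and it is still live.
    bool is_valid(std::uint64_t handle) const {
        return buffers.count(reinterpret_cast<fake_buffer *>(handle)) > 0;
    }

    // Free one buffer; unknown or already-freed handles are rejected
    // before the pointer is ever dereferenced.
    bool release(std::uint64_t handle) {
        auto * buf = reinterpret_cast<fake_buffer *>(handle);
        if (buffers.erase(buf) == 0) {
            return false;
        }
        std::free(buf->data);
        delete buf;
        return true;
    }

    // Intended for disconnect: free whatever the client left behind and
    // report how many buffers were reclaimed.
    std::size_t release_all() {
        const std::size_t n = buffers.size();
        for (auto * buf : buffers) {
            std::free(buf->data);
            delete buf;
        }
        buffers.clear();
        return n;
    }

private:
    std::unordered_set<fake_buffer *> buffers;
};
```

Lookups, inserts, and erases on `std::unordered_set` take constant time on average, which is consistent with the expectation that the extra checks should not cost much per request.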

@rgerganov rgerganov self-assigned this May 20, 2024
@rgerganov rgerganov changed the title Track allocated buffer in rpc-server Track allocated buffers in rpc-server May 20, 2024
@chraac

chraac commented May 20, 2024

I have another thought, possibly off topic:
Should we make a dedicated session-like structure for each client to organize the resources allocated per connection? We could also hold the ggml_backend_t there.

rgerganov added a commit to rgerganov/llama.cpp that referenced this issue May 20, 2024
@rgerganov
Collaborator Author

rgerganov commented May 20, 2024

> Should we make a dedicated session-like structure for each client to organize the resources allocated per connection?

Right now the rpc-server can serve only one client at a time (I should add this to the README). I prefer to keep it that way because it keeps the code simple: we don't have to deal with multiple threads, synchronization, etc. Users can still run multiple instances on the same backend and overcommit backend memory if that is what they want.

rgerganov added a commit that referenced this issue May 20, 2024
* rpc : track allocated buffers

ref: #7407

* rpc : pack rpc_tensor tightly