
Llama.cpp server doesn't return grammar error messages when in streaming mode #7391

Open
richardanaya opened this issue May 19, 2024 · 4 comments

@richardanaya

When you run a streaming request against the llama.cpp server, there appears to be no way to see grammar-related error messages in the HTTP response.
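
For reference, a minimal sketch of the kind of request I mean (the localhost URL and payload fields here are assumptions for illustration; the grammar is deliberately malformed):

import requests

payload = {
    "prompt": "List three colors.",
    "grammar": 'root ::= "unterminated',  # deliberately invalid GBNF (unterminated literal)
    "stream": True,
}

with requests.post("http://localhost:8080/completion", json=payload, stream=True) as resp:
    print("HTTP status:", resp.status_code)
    for line in resp.iter_lines():
        if line:
            print(line.decode())  # no grammar error shows up anywhere in the chunks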

@HanClinto
Collaborator

@richardanaya Sorry for my delay in seeing this!

Great catch, and thank you for the excellent writeup in the ollama repo! Fantastic reporting -- I particularly appreciated the annotated screenshots of what you're seeing. I haven't yet dug through that whole thread in detail, but I plan to read more later.

Over there, you wrote:

My conclusion from this, given the advice of the community, is that we do indeed have to do our own GBNF grammar validation on the Go server side to do our best to prevent passing down bad grammar.

I've spent a bit of time (#5948, #5950, #6004) improving the GBNF grammar validation within llama.cpp, and I would hate to see this work be re-implemented. Building a separate GBNF validator is non-ideal, since any changes to the base functionality (such as the grammar extensions in #6640 or #6467) would need to be duplicated in every other implementation. That's the path I originally started going down, but I ultimately chose the approach in #5948 so that we would have a validation program that is always in lockstep with master.

But that said, I would love to better support the server. As noted in this comment, at the time I started working on this I wasn't entirely sure how best to provide this error-checking to the CLI or other applications (such as the server).

In an ideal world, what would the behavior of llama.cpp (for the server and/or CLI) be in the case of an invalid grammar? I think I was the last one to touch the grammar validation code, and I'm happy to implement the changes you need; knowing what you would like to see would help me implement this. I think adding grammar support to ollama would be a huge quality-of-life improvement for a lot of people -- I want to do whatever I can to make this happen!

@HanClinto HanClinto self-assigned this Jun 6, 2024
@richardanaya
Author

richardanaya commented Jun 6, 2024

@HanClinto hey! Thanks for the response. I 100% agree with you -- I would love to just use llama.cpp too. The biggest issue seems to be short-circuiting execution of tasks (without ejecting the in-memory model) during "streaming=true" mode. Any kind of 400 error would do :)

If that's not easy, having some kind of /grammar validation endpoint I could POST to would be nice too.
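
To sketch the behavior I'm hoping for (hypothetical client code -- the endpoint, fields, and error shape here are assumed):

import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "hi", "grammar": "root ::= (", "stream": True},  # broken grammar
    stream=True,
)
if resp.status_code >= 400:
    # short-circuit immediately, before consuming any of the stream
    print("grammar rejected:", resp.json().get("error", {}).get("message"))
else:
    for line in resp.iter_lines():
        pass  # normal streaming path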

@HanClinto
Collaborator

The biggest issue seems to be short-circuiting execution of tasks (without ejecting the in-memory model) during "streaming=true" mode. Any kind of 400 error would do :)

Sounds pretty reasonable. It looks like 422 (Unprocessable Entity) might be a reasonable error to return in the case of an invalid grammar?

If that's not easy, having some kind of /grammar validation endpoint I could POST to would be nice too.

We could definitely add something like that as well. It might be nice to expose the gbnf-validator functionality via such an endpoint.

I haven't messed around with the streaming mode of the server very much -- do you happen to have a sample script handy that I can use for testing? If not, I'm happy to figure it out and write something up in Python or whatever, but I figured it was worth asking. If it's not self-contained or not easy to share, don't worry about it.

Out of curiosity, do you happen to know what the behavior is like when streaming is set to false -- is that more sane? If so, that might be able to function as your grammar-validation endpoint for the time being.
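
If it helps, something like this throwaway sketch is what I had in mind for testing (server URL and payload fields assumed -- adjust as needed):

import requests

BAD = {"prompt": "hi", "grammar": 'root ::= "oops'}  # unterminated literal

# Non-streaming: see what (if any) error comes back.
r = requests.post("http://localhost:8080/completion", json={**BAD, "stream": False})
print("non-streaming:", r.status_code, r.text[:200])

# Streaming: same bad grammar, checking whether any error surfaces at all.
r = requests.post("http://localhost:8080/completion", json={**BAD, "stream": True}, stream=True)
print("streaming:", r.status_code)
for line in r.iter_lines():
    if line:
        print(line.decode())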

@HanClinto
Collaborator

Hmm, reading through the server documentation, it appears that 400 is already reserved for an invalid-grammar error:

When the server receives invalid grammar via /completions endpoint:

{
    "error": {
        "code": 400,
        "message": "Failed to parse grammar",
        "type": "invalid_request_error"
    }
}

I'm reading through the server code and I'm not entirely sure why this 400 response code wouldn't be returned when streaming = true -- I still need an easier test setup, but I'd like to get to the bottom of this.
