
Adds streaming option to generate #1424

Closed · wants to merge 10 commits

Conversation

@rasbt (Collaborator) commented May 16, 2024

Fixes #1420
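
For context, here is a minimal sketch of the kind of streaming generation the PR title refers to: instead of returning the finished string, the function yields one decoded token at a time. All names here (generate_stream, the model and tokenizer interfaces) are illustrative and not the actual litgpt/generate/base.py code:

import torch

def generate_stream(model, tokenizer, prompt, max_new_tokens=50):
    # Illustrative sketch only: greedily sample and yield tokens as they
    # are produced, so a caller (e.g. a server) can stream them out.
    input_ids = tokenizer.encode(prompt)             # hypothetical tokenizer API
    ids = torch.tensor([input_ids])
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                # hypothetical model call
        next_id = torch.argmax(logits, dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        yield tokenizer.decode(next_id[0].tolist())  # stream token to caller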

@rasbt (Collaborator, Author) commented May 16, 2024

Is this what you were looking for, @williamFalcon?

@williamFalcon (Contributor) commented:

nice!

can you test it with this Studio?
https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-8b-api

@rasbt (Collaborator, Author) commented May 17, 2024

It should work now, @williamFalcon.

I modeled the adjusted litgpt serve, which optionally uses the stream=True option in generate, after this example in LitServe. Since I am not super familiar with LitServe, could you double-check that this is indeed the correct usage, @aniketmaurya and @lantiga?
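
Roughly, the wiring described here looks like the following sketch, which follows the LitServe LitAPI structure shown later in this thread; generate_stream below is a placeholder standing in for a generate(..., stream=True) call, not the actual litgpt/deploy/serve.py code:

import litserve as ls

class LitGPTStreamAPI(ls.LitAPI):
    def setup(self, device):
        # Placeholder for loading the LitGPT model and tokenizer; this dummy
        # "model" just echoes the prompt back word by word.
        self.generate_stream = lambda prompt: (t + " " for t in prompt.split())

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # Yield tokens as they are produced instead of the full completion.
        yield from self.generate_stream(prompt)

    def encode_response(self, output_stream):
        for token in output_stream:
            yield {"output": token}

if __name__ == "__main__":
    server = ls.LitServer(LitGPTStreamAPI(), stream=True)
    server.run(port=8000)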

@rasbt added the enhancement (New feature or request) label on May 17, 2024
@rasbt (Collaborator, Author) commented May 17, 2024

Many thanks for the feedback, @aniketmaurya! Does the current code look better?

@carmocca (Contributor) left a comment:

Why not import and use the chat.py generation implementation, which already implements streaming, instead of modifying the generate.py file?

@aniketmaurya (Contributor) left a comment:

LGTM! Added a few comments.

Also, please take the latest pull of LitServe; last week we merged a PR that affects streaming.

litgpt/generate/base.py (review comment, resolved)
litgpt/deploy/serve.py (review comment, outdated, resolved)
@rasbt (Collaborator, Author) commented May 20, 2024

@carmocca That's a good question; I'm not sure why I was so fixated on generate. Let me see (in a separate PR) if I can make this serving example work with chat; then we don't need to modify generate.

Thanks for the feedback, @aniketmaurya!

> Also, please take the latest pull of LitServe; last week we merged a PR that affects streaming.

I could try that internally for testing, but we'd probably have to wait until the next release. Could you ping me when the latest LitServe release comes out so that I can bump the version in LitGPT?

@rasbt (Collaborator, Author) commented May 20, 2024

Thanks for all your help with LitServe so far, @aniketmaurya. It's probably something stupid on my part, but with the recent LitServe version from GitHub main it's somehow still not working right.

I.e., starting the server with

litgpt serve --checkpoint_dir checkpoints/EleutherAI/pythia-1b --stream true

and sending

response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"}
)

the response.content comes out as

b'{"output": " file"}{"output": ":"}{"output": " /"}{"output": "usr"}{"output": "/"}{"output": "share"}{"output": "/"}{"output": "X"}{"output": "11"}{"output": "/"}{"output": "x"}{"output": "org"}{"output": "."}{"output": "conf"}{"output": "."}{"output": "d"}{"output": "/"}{"output": "x"}{"output": "org"}{"output": "."}{"output": "conf"}{"output": "."}{"output": "7"}{"output": "."}{"output": "idx"}{"output": "."}{"output": "gz"}{"output": ":"}{"output": " error"}{"output": " opening"}{"output": " /"}{"output": "usr"}{"output": "/"}{"output": "share"}{"output": "/"}{"output": "X"}{"output": "11"}{"output": "/"}{"output": "x"}{"output": "org"}{"output": "."}{"output": "conf"}{"output": "."}{"output": "d"}{"output": "/"}{"output": "x"}{"output": "org"}{"output": "."}{"output": "conf"}{"output": "."}'

whereas --stream false works as expected via LitServe (please ignore the bad Pythia response; I just used a small model here for demoing):

b'{"output":"Fix typos in the following sentence: Exampel input.\\n\\nThis book is now available as a Kindle app.\\n\\nPlease go to the print page to view the rest of this book.\\n\\n## **Contents**\\n\\n**1** The Problem\\n\\n**2** How We Lear"}'

I think it's something in my LitServe implementation, because generate with stream=True works as intended (e.g., via litgpt generate base --checkpoint_dir checkpoints/EleutherAI/pythia-1b --stream true).

If you have some time to take another look, that'd be really appreciated.

@aniketmaurya (Contributor) commented:

Hi @rasbt, we will have to iterate through the response in streaming mode, like:

import requests

url = "http://127.0.0.1:8000/predict"

resp = requests.post(url, json={"prompt": "Fix typos in the following sentence: Exampel input"}, headers=None,
                     stream=True)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))

When we don't iterate and instead use response.content, requests will concatenate everything into a single bytestring.
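
If just the text is wanted, each streamed {"output": ...} chunk can also be decoded as it arrives. A small sketch that tolerates both one JSON object per line and several objects concatenated on one line (the exact chunk framing depends on the LitServe version):

import json
import requests

decoder = json.JSONDecoder()

def iter_outputs(text):
    # Walk through a string containing one or more {"output": ...} objects
    # back to back and yield the "output" value of each.
    idx = 0
    while idx < len(text):
        obj, idx = decoder.raw_decode(text, idx)
        yield obj["output"]

resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        for token in iter_outputs(line.decode("utf-8")):
            print(token, end="", flush=True)
print()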

@rasbt (Collaborator, Author) commented May 20, 2024

That makes sense! Thanks for your patience here, @aniketmaurya; I know almost nothing about web stuff and serving. The code you shared does make sense, but yeah, it's still only generating a single token. Something must still be incorrect somewhere.

However, I also wonder if there's maybe a LitServe issue. I tried a simpler example:

import litserve as ls
import json


class SimpleStreamAPI(ls.LitAPI):
    def setup(self, device) -> None:
        self.model = lambda x: str(x)

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        for i in range(10):
            yield self.model(i)

    def encode_response(self, output_stream):
        for output in output_stream:
            yield {"output": output}


if __name__ == "__main__":
    server = ls.LitServer(SimpleStreamAPI(), stream=True)
    server.run(port=8000)

and then via

import requests

url = "http://127.0.0.1:8000/predict"

resp = requests.post(url, json={"input": "1, 2, 3"}, headers=None, stream=True)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))

it just returns ["{\"output\": \"0\"}"], which doesn't seem right. I would expect it to return 10 things due to the for-loop in predict, or am I misunderstanding this?

EDIT: Nevermind, I pulled the latest LitServe version and it's fixed!

@rasbt (Collaborator, Author) commented May 20, 2024

Ok great, this works flawlessly with the new LitServe version now:

import requests

url = "http://127.0.0.1:8000/predict"

resp = requests.post(url, json={"prompt": "Hello world"}, headers=None, stream=True)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
{"output": "\n"}{"output": "\n"}{"output": "It"}{"output": " seems"}{"output": " that"}{"output": " I"}{"output": " can"}{"output": "'t"}{"output": " load"}{"output": " my"}{"output": " php"}{"output": " page"}{"output": " from"}{"output": " my"}{"output": " html"}{"output": " file"}{"output": "."}{"output": "\n"}{"output": "\n"}{"output": "A"}{"output": ":"}{"output": "\n"}{"output": "\n"}{"output": "1"}{"output": ")"}{"output": " Check"}{"output": " this"}{"output": " link"}{"output": ","}{"output": " it"}{"output": " might"}{"output": " solve"}{"output": " your"}{"output": " problem"}{"output": "."}{"output": "\n"}{"output": "2"}{"output": ")"}{"output": "Check"}{"output": " this"}{"output": " article"}{"output": "."}{"output": "\n"}{"output": "3"}{"output": ")"}{"output": " check"}{"output": " your"}{"output": " php"}{"output": " file"}{"output": " path"}

@rasbt (Collaborator, Author) commented May 20, 2024

I now also got it to work with the chat version via #1426. I guess we can close this one then and leave the base generate function untouched.

Alternatively, the question is whether we should replace the generate function in chat/base.py with this one here, which can handle both streaming and non-streaming, @carmocca? In the end, it would be less code to maintain (one generate function instead of two).

@carmocca (Contributor) commented:

I didn't merge them before because the logic to check for the stop tokens is not as performant as simply generating a fixed number of tokens. They could probably still be reconciled with some if-elses, but code simplicity over deduplication was preferable before.
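
Purely for illustration, the kind of if-else reconciliation mentioned here might look roughly like the sketch below; sample_next_token and the stop-token handling are hypothetical stand-ins, not the actual LitGPT code:

def generate(model, input_ids, max_new_tokens, stop_tokens=None, stream=False):
    # Schematic sketch: one token-producing loop that serves both the
    # streaming/chat path (with stop-token checks) and the fixed-length path.
    def token_iterator():
        ids = list(input_ids)
        for _ in range(max_new_tokens):
            next_id = sample_next_token(model, ids)  # hypothetical helper
            ids.append(next_id)
            yield next_id
            # This per-step stop-token check is the part that is slower than
            # simply generating a fixed number of tokens.
            if stop_tokens and ids[-len(stop_tokens):] == list(stop_tokens):
                return

    if stream:
        return token_iterator()        # caller consumes tokens lazily
    return list(token_iterator())      # materialize the whole sequence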

@rasbt (Collaborator, Author) commented May 23, 2024

Makes sense and no worries. I'd say let's keep it as is. I got the LitServe streaming example to work after all, so mission accomplished 😊

@rasbt closed this on May 23, 2024
Labels: enhancement (New feature or request)
Projects: None yet
Successfully merging this pull request may close these issues: Stream option
4 participants