
[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. #369

Conversation

@alexsherstinsky (Contributor) commented May 17, 2024

Title: Support predibase LLM serving a base model with an optional fine-tuned adapter.

  • Brief Description of Changes
  • We add "predibase" as a provider, using the OpenAI-compatible API that Predibase offers.
  • The format for supplying a fine-tuned adapter is "<base_model>[:adapter_id]", where the adapter_id format is
    "<adapter_repository_reference>/<version_number>" (the version_number is required). Example (Python):
import os

from openai import OpenAI, Stream
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

client = OpenAI(
    api_key=os.environ["PREDIBASE_API_TOKEN"],
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        provider="predibase",
    )
)

chat_complete = client.chat.completions.create(
    user=os.environ["PREDIBASE_TENANT_ID"],
    # Base model only:
    # model=os.environ["PREDIBASE_DEPLOYMENT"],
    # Base model with the fine-tuned adapter "test-phi-3", version 4:
    model=f'{os.environ["PREDIBASE_DEPLOYMENT"]}:test-phi-3/4',
    stream=False,  # True is also supported.
    max_tokens=128,
    temperature=0.2,
    messages=[
        {
            "role": "user",
            "content": "How fast can a horse run?",
        },
    ],
)

if isinstance(chat_complete, Stream):
    completion_stream: Stream = chat_complete
    text: list[str] = []
    for message in completion_stream:
        print(message)
        delta_content: str | None = message.choices[0].delta.content
        if delta_content:
            text.append(delta_content)
    print("".join(text))
else:
    print(chat_complete.choices[0])
    print(chat_complete.choices[0].message)
    print(chat_complete.choices[0].message.content)


Supported body parameters: model, messages, max_tokens, temperature, top_p, stream, n, stop.
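For reference, here is a short TypeScript sketch that exercises all of the supported body parameters through the gateway. It is a sketch under stated assumptions, not code from this PR: it assumes the openai Node SDK and the createHeaders/PORTKEY_GATEWAY_URL helpers from the portkey-ai package, and the top_p, n, and stop values are illustrative only.

import OpenAI from 'openai';
import { createHeaders, PORTKEY_GATEWAY_URL } from 'portkey-ai';

const client = new OpenAI({
  apiKey: process.env.PREDIBASE_API_TOKEN,
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({ provider: 'predibase' }),
});

const completion = await client.chat.completions.create({
  user: process.env.PREDIBASE_TENANT_ID!, // Predibase tenant ID
  model: process.env.PREDIBASE_DEPLOYMENT!, // or '<base_model>:<adapter_repository_reference>/<version_number>'
  max_tokens: 128,
  temperature: 0.2,
  top_p: 0.9, // illustrative value
  n: 1,
  stop: ['\n\n'], // illustrative stop sequence
  stream: false,
  messages: [{ role: 'user', content: 'How fast can a horse run?' }],
});

console.log(completion.choices[0].message.content);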


Motivation: (optional)

  • Predibase becomes an option as an LLM provider for Gateway users. Predibase is the most robust platform for LLM fine-tuning and serving across a wide variety of pre-trained models, including open-source ones.


@alexsherstinsky marked this pull request as ready for review May 17, 2024 18:25
@alexsherstinsky changed the title from "[FEATURE] Support predibase LLM serving a base model (without fine-tuned adapter for now)." to "[FEATURE] Support predibase LLM serving a base model (without a fine-tuned adapter for now)." May 17, 2024
@alexsherstinsky marked this pull request as draft May 17, 2024 20:23
@alexsherstinsky changed the title from "[FEATURE] Support predibase LLM serving a base model (without a fine-tuned adapter for now)." to "[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter." May 18, 2024
@alexsherstinsky marked this pull request as ready for review May 18, 2024 15:21
@roh26it (Collaborator) commented May 20, 2024

Looks good to me and excited to merge this.

@VisargD will run some tests on this at our end and come back if there are any changes.

@VisargD (Collaborator) commented May 20, 2024

Hey @alexsherstinsky - Thanks for the PR! In the Predibase docs, I can see that streaming (/generate_stream) is also supported. Are you planning to add it to this PR as well? Here is the doc that I am referring to: https://docs.predibase.com/user-guide/inference/rest_api#notes

If it's not planned for this PR, then I can merge this and raise a new one with streaming support for Predibase.

You can also use the /generate_stream endpoint to have the tokens be streamed from the deployment. The parameters also follow the same format as the [LoRAX generate endpoints](https://github.com/predibase/lorax/tree/main/clients/python).

@alexsherstinsky (Contributor, Author) commented:

@VisargD Streaming is already supported! If you look at my example above, there is a stream flag -- I tested it, and it works! Thank you!

@VisargD (Collaborator) commented May 21, 2024

Gateway expects separate responseTransforms for stream and non-stream modes. In a provider's index.ts, we define chatComplete and stream-chatComplete responseTransforms. Just as you defined the PredibaseChatCompleteResponseTransform function, there should be one more function, like PredibaseChatCompleteStreamChunkTransform, which maps the provider's stream chunk to a Gateway-compatible stream chunk. You can check other providers like perplexity-ai and mistral-ai to get an idea of this.

The reason streaming currently works is that Gateway cannot find a stream chunk transform function, so it passes all of the chunks through as-is. Even though Predibase sends OpenAI-compatible chunks, it is preferable to at least add this function and map the chunk data explicitly, so that nothing breaks in the future if Predibase makes any changes.

Please let me know if you need any help with this. I can provide more details if required.

@VisargD
Copy link
Collaborator

VisargD commented May 21, 2024

Here is what I am suggesting:

  • Add a new stream chunk transform function in chatComplete.ts.
    Predibase does not send an id in its chunks, so you can use the fallbackId that is passed to all transform function calls:
export const PredibaseChatCompleteStreamChunkTransform: (
  response: string,
  fallbackId: string,
) => string | string[] = (responseChunk, fallbackId) => {
  let chunk = responseChunk.trim();
  chunk = chunk.replace(/^data:/, '');
  chunk = chunk.trim();

  const parsedChunk: PredibaseChatCompletionStreamChunk = JSON.parse(chunk);

  return `data: ${JSON.stringify({
      id: fallbackId,
      object: parsedChunk.object,
      created: Math.floor(Date.now() / 1000),
      model: parsedChunk.model,
      provider: PREDIBASE,
      choices: [
        {
          delta: {
            role: parsedChunk.choices[0]?.delta?.role,
            content: parsedChunk.choices[0]?.delta?.content,
          },
          index: 0,
          finish_reason: parsedChunk.choices[0]?.finish_reason,
        },
      ],
    })}` + '\n\n';
};
  • Add stream-chatComplete responseTransform in predibase index.ts:
const PredibaseConfig: ProviderConfigs = {
  chatComplete: PredibaseChatCompleteConfig,
  api: PredibaseAPIConfig,
  responseTransforms: {
    chatComplete: PredibaseChatCompleteResponseTransform,
    'stream-chatComplete': PredibaseChatCompleteStreamChunkTransform,
  },
};
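As a follow-up, here is a quick, hypothetical smoke test for the suggested transform; the raw chunk below is an assumed OpenAI-compatible payload (per the note above), not one captured from Predibase, and the model name is made up.

// Hypothetical smoke test; the chunk shape and model name are assumptions.
const rawChunk =
  'data: {"object":"chat.completion.chunk","model":"my-base-model","choices":[{"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}';

// Should print a `data: {...}` line whose id is the fallback id and whose
// provider is the PREDIBASE constant, followed by a blank line.
console.log(PredibaseChatCompleteStreamChunkTransform(rawChunk, 'fallback-id-123'));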

@alexsherstinsky (Contributor, Author) commented:

@VisargD Thank you very much for this -- it was extremely helpful! I incorporated your suggestions and looked up how perplexity-ai does it as well. Thanks to your suggestion, I already found one error (one of my tests is failing, which is a good thing, because it is happening now, while we are still developing it!). I will ping you again once I have figured it out and made the fix. Thanks again!

@alexsherstinsky (Contributor, Author) commented:

@VisargD Please re-review; I incorporated your suggestion and also added error handling. Handling errors this way enables the client to see the actual error; otherwise, the error response is lost, because the HTTP response status is 200 OK. Thank you.
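To illustrate the idea (a minimal, hypothetical sketch, not the exact code in this PR; the error field names and the helper name surfaceInBandError are assumptions): when Predibase reports a failure in the body of a 200 OK response, the chunk transform can re-emit it as an OpenAI-style error payload so the client sees the real failure.

// Hypothetical sketch only; not the PR's actual error handling, and the
// `error.message`/`error.type` field names are assumed.
interface InBandError {
  error?: { message?: string; type?: string };
}

const surfaceInBandError = (rawChunk: string, fallbackId: string): string => {
  const body = rawChunk.trim().replace(/^data:/, '').trim();
  const parsed = JSON.parse(body) as InBandError;
  if (parsed.error) {
    // Re-emit the provider error in an OpenAI-style envelope so the client
    // sees the failure instead of a silently malformed stream.
    return (
      `data: ${JSON.stringify({
        id: fallbackId,
        error: {
          message: parsed.error.message ?? 'Unknown Predibase error',
          type: parsed.error.type ?? 'provider_error',
        },
      })}` + '\n\n'
    );
  }
  return 'data: ' + body + '\n\n';
};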

@VisargD (Collaborator) commented May 22, 2024

Thanks for the quick changes. Looks good to me! I will merge this PR and make it a part of the next gateway release.

@VisargD (Collaborator) commented May 22, 2024

Closes #126

@VisargD linked issue "[Provider] Predibase" (#126) May 22, 2024 that may be closed by this pull request
@VisargD merged commit 079b463 into Portkey-AI:main May 22, 2024
1 check passed
@alexsherstinsky deleted the feature/alexsherstinsky/support_predibase_llm_serving_with_fine_tuned_adapters-2024_04_18-0 branch May 22, 2024 14:30