Add special tokens to server llama_decode() inputs
The llamafile server's /embedding endpoint was returning embeddings that
were very inconsistent with llama.cpp. This was due to upstream changes
in tokenization: the upstream project now adds special tokens, e.g.
["[CLS]", " apples", " are", " red", " .", "[SEP]"], before running the
operation. We now handle things more similarly to upstream, although our
llama.cpp server code has diverged so much since removing LLaVA support
that they're very different pieces of software at this point.

Fixes #391
jart committed May 4, 2024
1 parent 42bd9b8 commit 7900294
Showing 3 changed files with 23 additions and 10 deletions.
17 changes: 13 additions & 4 deletions llama.cpp/main/main.1
@@ -559,6 +559,19 @@ Print token count every
 tokens.
 .Pp
 Default: -1
+.It Fl Fl pooling Ar KIND
+Specifies pooling type for embeddings. This may be one of:
+.Pp
+.Bl -dash -compact
+.It
+none
+.It
+mean
+.It
+cls
+.El
+.Pp
+The model default is used if unspecified.
 .El
 .Sh CLI OPTIONS
 The following options may be specified when
@@ -741,10 +754,6 @@ Path from which to serve static files.
 .Pp
 Default:
 .Pa /zip/llama.cpp/server/public
-.It Fl Fl embedding
-Enable embedding vector output.
-.Pp
-Default: disabled
 .It Fl Fl nobrowser
 Do not attempt to open a web browser tab at startup.
 .It Fl gan Ar N , Fl Fl grp-attn-n Ar N
14 changes: 9 additions & 5 deletions llama.cpp/main/main.1.asc
@@ -537,6 +537,15 @@
 
 Default: ‐1
 
+--pooling KIND
+Specifies pooling type for embeddings. This may be one of:
+
+- none
+- mean
+- cls
+
+The model default is used if unspecified.
+
 CLI OPTIONS
 The following options may be specified when llamafile is running in
 --cli mode.
@@ -737,11 +746,6 @@
 Default: /zip/llama.cpp/server/public
---embedding
-Enable embedding vector output.
-Default: disabled
 --nobrowser
 Do not attempt to open a web browser tab at startup.
2 changes: 1 addition & 1 deletion llama.cpp/server/server.cpp
@@ -1783,7 +1783,7 @@ struct llama_server_context
             }
             else
             {
-                prompt_tokens = tokenize(slot.prompt, system_prompt.empty() && add_bos_token); // add BOS if there isn't system prompt
+                prompt_tokens = tokenize(slot.prompt, system_prompt.empty()); // add BOS if there isn't system prompt
             }
 
             slot.num_prompt_tokens = prompt_tokens.size();
