
Possible bug in the 'deepseek-coder' chat template's system message #7385

Open

jukofyork opened this issue May 19, 2024 · 1 comment

@jukofyork
Contributor

I noticed this when adding the 'alpaca' chat template (which is very similar to the 'deepseek-coder' chat template):

#7383 (comment)

----- deepseek-ai/deepseek-coder-33b-instruct -----
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|begin▁of▁sentence|>You are a helpful assistant### Instruction:
Hello
### Response:
Hi there
<|EOT|>
### Instruction:
Who are you
### Response:
   I am an assistant   
<|EOT|>
### Instruction:
Another question
### Response:

This doesn't look correct to me for deepseek-coder, as the default system message ends with a newline, like so:

----- deepseek-ai/deepseek-coder-33b-instruct -----
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|begin▁of▁sentence|>You are a helpful assistant
### Instruction:
Hello
### Response:
Hi there
<|EOT|>
### Instruction:
Who are you
### Response:
   I am an assistant   
<|EOT|>
### Instruction:
Another question
### Response:

Perhaps the Python script used to generate these reference outputs should detect this and move the newline from the default system message into the template itself?
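
For reference, a minimal sketch of how the reference output above can be reproduced with transformers (the conversation history is inferred from the printed text, so treat it as an assumption rather than the exact test script):

from transformers import AutoTokenizer

# Conversation inferred from the output above (an assumption, not the exact test script).
history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
    {"role": "user", "content": "Who are you"},
    {"role": "assistant", "content": "   I am an assistant   "},
    {"role": "user", "content": "Another question"},
]

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-33b-instruct")

# The HF chat template keeps the newline after the system message,
# which is the formatting the llama.cpp template would need to match.
print(tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True))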

@jukofyork
Contributor Author

There may be another bug in this:

output = AutoTokenizer.from_pretrained(VARIANTS_TO_TEST[0]).apply_chat_template(history, tokenize=False, add_generation_prompt=True, chat_template=GEMMA_TMLP)

Should this really be VARIANTS_TO_TEST[0], which now picks up the 'teknium/OpenHermes-2.5-Mistral-7B' data, or has the vector been rearranged at some point so that 'google/gemma-7b-it' was originally at the start?
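
If the gemma data is what was intended, a sketch of a less position-dependent version (assuming 'google/gemma-7b-it' is still present in VARIANTS_TO_TEST) would be to look the model up by name rather than by index:

# Sketch only: assumes 'google/gemma-7b-it' is still listed in VARIANTS_TO_TEST.
gemma_id = "google/gemma-7b-it"
assert gemma_id in VARIANTS_TO_TEST
output = AutoTokenizer.from_pretrained(gemma_id).apply_chat_template(
    history, tokenize=False, add_generation_prompt=True, chat_template=GEMMA_TMLP)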
