
Convert directly from llama3 #4268

Open — wants to merge 8 commits into mxyng/fix-quantize from pdevine/llama3
Conversation

@pdevine (Contributor) commented May 8, 2024

This change allows you to convert directly from a llama3-derived safetensors model into Ollama.

It is currently missing:

  • PyTorch support: conversion almost works, however the embeddings layer size is off by the eos/bos tokens.

This will work with some llama3 derivatives if they are using safetensors, including dolphin-2.9-llama3.
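For context, the intended workflow looks roughly like this (a sketch only; the directory path and model name below are placeholders, not taken from this PR):

```
# Modelfile pointing at a local directory of safetensors weights
FROM /path/to/dolphin-2.9-llama3

# build and run the converted model
ollama create dolphin-llama3 -f Modelfile
ollama run dolphin-llama3
```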

@mxyng force-pushed the pdevine/llama3 branch 8 times, most recently from 9b83ecb to 27588a7 on May 16, 2024 at 23:53
@mxyng changed the base branch from main to mxyng/cache-intermediate-layers on May 17, 2024 at 18:38
@mxyng force-pushed the mxyng/cache-intermediate-layers branch from 39efb30 to 8d807d7 on May 17, 2024 at 18:38
@mxyng force-pushed the mxyng/cache-intermediate-layers branch from 8d807d7 to 0aba2d5 on May 17, 2024 at 18:40
@mxyng changed the base branch from mxyng/cache-intermediate-layers to mxyng/fix-quantize on May 17, 2024 at 18:48
@mxyng marked this pull request as ready for review on May 18, 2024 at 07:13
@mxyng (Contributor) commented May 18, 2024

Updated the safetensors and pytorch conversion interfaces to take F32, F16, and BF16 inputs. This lets the change convert llama3 derivatives such as NVIDIA's ChatQA and NousResearch's Hermes 2 Pro.
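Not the code from this PR, but as an illustration of what accepting mixed-precision inputs involves, here is a minimal Go sketch that normalizes raw tensor bytes to float32 based on the dtype string in a safetensors/pytorch header. Function names are hypothetical and the F16 path is omitted for brevity; the key detail is that BF16 is just the upper 16 bits of an IEEE-754 float32.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// bfloat16ToFloat32 widens a bfloat16 value to float32; bfloat16 is the
// upper 16 bits of an IEEE-754 float32, so widening is a single shift.
func bfloat16ToFloat32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// tensorToFloat32 decodes a raw little-endian tensor buffer into float32
// values based on the dtype reported by the model header. F16 would need a
// full half-precision decode and is left out of this sketch.
func tensorToFloat32(dtype string, raw []byte) ([]float32, error) {
	switch dtype {
	case "F32":
		out := make([]float32, len(raw)/4)
		for i := range out {
			out[i] = math.Float32frombits(binary.LittleEndian.Uint32(raw[i*4:]))
		}
		return out, nil
	case "BF16":
		out := make([]float32, len(raw)/2)
		for i := range out {
			out[i] = bfloat16ToFloat32(binary.LittleEndian.Uint16(raw[i*2:]))
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported tensor dtype %q", dtype)
	}
}

func main() {
	// One BF16 value: 0x3F80 widens to 0x3F800000, i.e. 1.0.
	vals, _ := tensorToFloat32("BF16", []byte{0x80, 0x3F})
	fmt.Println(vals) // [1]
}
```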

Labels: none yet
Projects: none yet
Linked issues that merging may close: none yet
2 participants