Feat: Add OLLAMA_LOAD_TIMEOUT env variable #4123
}
}
print(timeout)
expiresAt := time.Now().Add(time.Duration(timeout) * time.Second) // be generous with timeout, large models can take a while to load
ticker := time.NewTicker(50 * time.Millisecond)
Should we print a "loading the model" message on each tick?
Without a message or a spinner, the process appears to be stuck for 10 minutes.
This is a killer feature for users who are trying to build things using only low-end, cheap GPUs. Please review and merge this for the next release, for the sake of ordinary users.
Thinking about this one some more... the large timeout, and making it configurable, is really more of a workaround. What we really should do is detect progress during model load and detect stalling more quickly. As long as we're making forward progress, we should let it proceed, perhaps indefinitely, but if we stall, then we should fail faster than 10m. I've started to lay some initial foundation in #4547 which I think will help us get to a better user experience overall.
Thanks for putting this together. I think we can close this now that I've merged #4157. Please give it a shot and if you see any flakiness we can adjust the stall duration.
Thanks for the heads-up, will definitely give it a try.
Closes #3940
For certain hardware setups and models, the offloading to the GPU can take a lot of time and the user can hit a timeout. This PR makes the timeout configurable via the OLLAMA_LOAD_TIMEOUT env variable, to be provided in seconds.

@dhiltgen I added a subsection in the FAQ, since I was not sure where to document the env variable. Let me know if this is the right place.