
memory leak? #83

Open
markg85 opened this issue May 8, 2024 · 7 comments

markg85 commented May 8, 2024

Hi,

I was running a chat in a loop every 5 seconds.

import ollama from 'ollama';

setInterval(async () => {
    const request = `... some long text ...`;

    const response = await ollama.chat({
        model: 'phi3:3.8b-mini-instruct-4k-q4_K_M',
        messages: [{ role: 'user', content: request }],
        options: {
            temperature: 0
        },
    });

    console.log(response.message.content);
}, 5000);

After a couple of minutes I get:

(node:38490) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 101 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit

Note the 101 in there; it increases every 5 seconds as well, so I'm fairly sure this is caused by ollama-js.

You might be wondering: why run this in a loop?
Simple really, I have some rapidly changing data that I want to ask a model about.
Am I doing something wrong here? Should I be using a different function?

@prunes-git

Does the model reliably respond in under 5 seconds? Based on your error message it looks like it may be taking longer, and thus slowly accumulating listeners until it reaches its maximum and throws those warnings.

markg85 commented May 13, 2024

It does reliably respond, yes, but that might take a little more than a couple of seconds.

While I fixed it locally by only asking the model something once a minute (I didn't need anything finer grained), it's still something that should probably be looked at.

prunes-git commented May 13, 2024

If you are making requests every 5 seconds and it takes 6 seconds to respond, you lose a second each time, so after 5 requests there are two open listeners at all times, and that slowly builds up over time. If it takes 10 seconds to respond, an additional listener stays open every 5 seconds. That is not something that can be fixed anywhere but in your code, which is why it gives you the error message.
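
For what it's worth, here is a minimal sketch of a loop that can never overlap requests, reusing the ollama.chat call from the original post; the 5-second delay only starts once the previous response has arrived:

import ollama from 'ollama';

// Non-overlapping poll: each iteration awaits the previous chat call before
// sleeping, so at most one request is pending at any time.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollModel() {
    while (true) {
        const response = await ollama.chat({
            model: 'phi3:3.8b-mini-instruct-4k-q4_K_M',
            messages: [{ role: 'user', content: `... some long text ...` }],
            options: { temperature: 0 },
        });
        console.log(response.message.content);
        await sleep(5000); // wait 5 seconds after the response, not in parallel with it
    }
}

pollModel();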

markg85 commented May 13, 2024

I'm not so sure about that.
I just checked, again with the phi3 model.

The model responds within 1 second, and I'm repeating the same question over and over again every 5 seconds.
On a cold start the model takes 2 seconds, but I'm assuming that's just because of loading it into memory.

The code I'm using is exactly as in my initial post. Why don't you try it out and see for yourself?

The issue literally can't be in my code, as I'm not creating any listeners. I'm only calling await ollama.chat, which completes within 5 seconds, so on my side there is no overlap between iterations of the 5-second loop. That means the listener remains open outside this code, which therefore must be within the ollama library.

@prunes-git

I see what's happening. I believe it is related to an upstream issue: whatwg/fetch#1287.
However, it appears not to be a major issue, as garbage collection is dealing with it. I was able to run it at 1-second intervals, and while it throws up those warnings after 100 messages, after about 200 the warnings stop; then it builds up again and repeats.
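
A rough standalone sketch of the behaviour that upstream issue describes, independent of ollama-js (the URL is just a placeholder): every fetch() that is handed an AbortSignal registers an abort listener on it, and that listener is only released once the response object is garbage collected, so reusing one signal across many requests piles up listeners until Node emits the same warning:

// Hypothetical demo of whatwg/fetch#1287, not ollama-js code.
const controller = new AbortController();

for (let i = 0; i < 150; i++) {
    // Each call adds another abort listener to the shared signal; fetch does
    // not remove it when the request completes.
    fetch('http://localhost:11434/api/tags', { signal: controller.signal })
        .then((res) => res.text())
        .catch(() => { /* errors don't matter here; the listener count does */ });
}
// After enough iterations Node prints MaxListenersExceededWarning for the
// AbortSignal, just like in the log above.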

markg85 commented May 14, 2024

That to me is a major motivation to ditch nodejs and play with Rust or even C++.

Not because of this very issue or because of using JS tech, but because it's mind-boggling to me that a fix like this has to go through the spec bandwagon before it finds its way into the implementations. You're easily talking about a decade (before all implementations are fixed). Browser land (spec-wise) moves at a snail's pace for fixes (it's faster for new stuff). Look at the bug report: it's about to hit 4 years in a couple of months, and that's just the reporting side of things; nothing has been done spec-wise to fix it yet (that I could find). In Rust or C++ land you'd just use a different networking library, or make a fix and recompile yourself.

Oh well, enough drifting. It's up to you what you do with this bug report.

eugeniosegala commented May 22, 2024

@markg85 to solve this problem you can change defaultMaxListeners (the default is 10) globally in your script:

import { EventEmitter } from "events";

// Adjust based on your needs
EventEmitter.defaultMaxListeners = 100;

// rest of the code...

The above will do the trick.

It can also be configured individually per target, but I suspect some changes to ollama-js may be required.

Finally, make sure you monitor the usage of the Node process, as changing defaultMaxListeners applies to every emitter your application creates, not just this one.

https://nodejs.org/api/events.html#eventsdefaultmaxlisteners
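
For the per-target form that the warning message itself suggests, a minimal sketch (the per-signal call is hypothetical here, since ollama-js does not currently expose the AbortSignal it uses):

import { setMaxListeners } from "node:events";

// With no targets, this raises the default limit for all newly created
// EventEmitter and EventTarget instances (AbortSignal is an EventTarget).
setMaxListeners(100);

// If the specific AbortSignal were exposed, the limit could be raised on that
// one target only:
// setMaxListeners(100, signalFromOllama);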
