
memory leak? #83

Open
markg85 opened this issue May 8, 2024 · 7 comments

markg85 commented May 8, 2024

Hi,

I was running a chat in a loop every 5 seconds.

import ollama from 'ollama';

setInterval(async () => {
    const request = `... some long text ...`;

    const response = await ollama.chat({
        model: 'phi3:3.8b-mini-instruct-4k-q4_K_M',
        messages: [{ role: 'user', content: request }],
        options: {
            temperature: 0
        },
    });

    console.log(response.message.content);
}, 5000);

After a couple of minutes I get:

(node:38490) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 101 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit

Note the 101 in there; it increases every 5 seconds as well, so I'm fairly sure this is caused by ollama-js.

You might be wondering: why run this in a loop?
Simple really, I have some rapidly changing data that I want to ask a model about.
Am I doing something wrong here? Should I be using a different function?

@prunes-git

Does the model reliably respond in under 5 seconds? Based on your error message it looks like it may be taking longer, and thus slowly accumulating listeners until it reaches its maximum and throws those warnings.

markg85 commented May 13, 2024

It does reliably respond, yes, but that might take a little more than a couple of seconds.

While I fixed it locally by only asking the model something once a minute (I didn't need anything finer grained), it's still something that should probably be looked at.

prunes-git commented May 13, 2024

If you are making requests every 5 seconds and it takes 6 seconds to respond, you lose a second each time, so after 5 requests there are two open listeners at all times, and that slowly builds up over time. If it takes 10 seconds to respond, an additional listener stays open every 5 seconds. That is not something that can be fixed anywhere but in your code, which is why it gives you the error message.
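
For what it's worth, here is a minimal sketch of a loop that can never overlap requests, reusing the ollama.chat call from the original post; the 5-second delay only starts once the previous response has arrived:

import ollama from 'ollama';

// Non-overlapping poll: each iteration awaits the previous chat call before
// sleeping, so at most one request is pending at any time.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollModel() {
    while (true) {
        const response = await ollama.chat({
            model: 'phi3:3.8b-mini-instruct-4k-q4_K_M',
            messages: [{ role: 'user', content: `... some long text ...` }],
            options: { temperature: 0 },
        });
        console.log(response.message.content);
        await sleep(5000); // wait 5 seconds after the response, not in parallel with it
    }
}

pollModel();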

markg85 commented May 13, 2024

I'm not so sure about that.
I just checked, again with the phi3 model.

The model responds within 1 second, and I'm repeating the same question over and over again every 5 seconds.
On a cold start the model takes 2 seconds, but I'm assuming that's just because of loading it into memory.

The code I'm using is exactly as in my initial post. Why don't you try it out and see for yourself?

The issue literally can't be in my code, as I'm not creating any listeners. I'm only calling await ollama.chat, which completes within 5 seconds, so on my side there is no overlap between iterations of the 5-second loop. That means the listener remains open outside this code, which therefore must be within the ollama library.

@prunes-git

I see what's happening. I believe it is related to an upstream issue: whatwg/fetch#1287.
However, it appears not to be a major issue, as garbage collection is dealing with it. I was able to run it at 1-second intervals, and while it throws up those warnings after 100 messages, after about 200 the warnings stop; then it builds up again and repeats.
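
A rough standalone sketch of the behaviour that upstream issue describes, independent of ollama-js (the URL is just a placeholder): every fetch() that is handed an AbortSignal registers an abort listener on it, and that listener is only released once the response object is garbage collected, so reusing one signal across many requests piles up listeners until Node emits the same warning:

// Hypothetical demo of whatwg/fetch#1287, not ollama-js code.
const controller = new AbortController();

for (let i = 0; i < 150; i++) {
    // Each call adds another abort listener to the shared signal; fetch does
    // not remove it when the request completes.
    fetch('http://localhost:11434/api/tags', { signal: controller.signal })
        .then((res) => res.text())
        .catch(() => { /* errors don't matter here; the listener count does */ });
}
// After enough iterations Node prints MaxListenersExceededWarning for the
// AbortSignal, just like in the log above.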

markg85 commented May 14, 2024

That to me is a major motivation to ditch nodejs and play with Rust or even C++.

Not because of this very issue or because of using JS tech, but because it's mind-boggling to me that a fix like this has to go through the spec bandwagon before it finds its way into the implementations. You're easily talking about a decade (before all implementations are fixed). Browser land (spec-wise) moves at a snail's pace for fixes (it's faster for new stuff). Look at the bug report: it's about to hit 4 years in a couple of months, and that's just the reporting side of things; nothing has been done spec-wise to fix it yet (that I could find). In Rust or C++ land you'd just use a different networking library, or make a fix and recompile yourself.

Oh well, enough drifting. It's up to you what you do with this bug report.

eugeniosegala commented May 22, 2024

@markg85 to solve this problem you can change defaultMaxListeners (the default is 10) globally in your script:

import { EventEmitter } from "events";

// Adjust based on your needs
EventEmitter.defaultMaxListeners = 100;

// rest of the code...

The above will do the trick.

It can also be configured individually per target, but I suspect some changes to ollama-js may be required.

Finally, make sure you monitor the usage of the Node process, as changing defaultMaxListeners applies to every emitter your application creates, not just this one.

https://nodejs.org/api/events.html#eventsdefaultmaxlisteners
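
For the per-target form that the warning message itself suggests, a minimal sketch (the per-signal call is hypothetical here, since ollama-js does not currently expose the AbortSignal it uses):

import { setMaxListeners } from "node:events";

// With no targets, this raises the default limit for all newly created
// EventEmitter and EventTarget instances (AbortSignal is an EventTarget).
setMaxListeners(100);

// If the specific AbortSignal were exposed, the limit could be raised on that
// one target only:
// setMaxListeners(100, signalFromOllama);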
