Fix memory leak while stopping an unknown partition #4669

Open
wants to merge 1 commit into base: master

Conversation

@emasab (Collaborator) commented Apr 3, 2024

When stopping a partition that isn't known to the client, a reference to that partition was kept on its own fetch queue, leading to a memory leak.
Solved by purging the fetch and ops queues of the unknown partition when it is stopped.
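
As a rough illustration (this is not the actual diff in this PR), purging the unknown partition's queues amounts to something like the sketch below. The queue fields and the purge call are librdkafka internals as I understand them; the wrapper function and its call site are hypothetical.

```c
/* Hypothetical helper, for illustration only: when a fetch-stop op targets
 * a partition the consumer no longer tracks, discard whatever is still
 * queued on that partition so those ops release their references and the
 * partition's refcount can drop to zero. */
static void drop_unknown_partition_queue_refs(rd_kafka_toppar_t *rktp) {
        rd_kafka_q_purge(rktp->rktp_fetchq); /* pending fetched messages/ops */
        rd_kafka_q_purge(rktp->rktp_ops);    /* pending per-partition ops */
}
```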

@Quuxplusone (Contributor) commented Apr 4, 2024

In #4486 I wrote:

My employer has also tentatively bisected an issue to 8e20e1e. We see that if we have a groupconsumer, and the broker goes down (that is, we shut down the Kafka broker), and then comes back up, and then we try to shut down the groupconsumer, it simply hangs forever. This doesn't reproduce in 1.9.3rc2 or earlier, but it does reproduce in 2.0.0rc1 through 2.3.0. In fact, it appears not to reproduce in a83cadf but to start reproducing with 8e20e1e.

@emasab wrote:

@Quuxplusone in case it causes a hang, please try this fix and in case it's still happening tell me how to reproduce it.

Okay, I just tried it. My employer's test case is red with librdkafka v2.3.0, green with librdkafka v2.3.0-plus-#4667, but unfortunately remains red with librdkafka v2.3.0-plus-#4669.

That's not to say that #4669 is useless or fails to fix a real bug; it just doesn't fix our bug.

OTOH I also don't claim that #4669 is bug-free. Its new code that takes two mutex locks at once is clearly at risk of deadlock. It would benefit from a code comment explaining (or linking to) the project's lock-ordering policy.

Unfortunately I don't have a reproducer using only librdkafka code; we build several C++ layers on top of librdkafka.

@emasab (Collaborator, Author) commented Apr 5, 2024

@Quuxplusone thanks for testing it. The code respects the lock ordering policy, that is:

  1. rk main lock
  2. rkb lock
  3. rkt lock
  4. rktp lock
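
For illustration only (this snippet is not part of the patch): under that policy, nested acquisition always proceeds rk, then rkb, then rkt, then rktp, and releases in reverse order, so two threads can never take the same pair of locks in opposite orders. The macro names below are librdkafka's internal locking helpers as I understand them, and the call site itself is hypothetical.

```c
/* Hypothetical call site showing the documented lock order. */
static void locked_section_example(rd_kafka_t *rk, rd_kafka_broker_t *rkb,
                                   rd_kafka_topic_t *rkt,
                                   rd_kafka_toppar_t *rktp) {
        rd_kafka_rdlock(rk);        /* 1. rk main lock */
        rd_kafka_broker_lock(rkb);  /* 2. rkb lock     */
        rd_kafka_topic_rdlock(rkt); /* 3. rkt lock     */
        rd_kafka_toppar_lock(rktp); /* 4. rktp lock    */

        /* ... work that needs all four objects ... */

        rd_kafka_toppar_unlock(rktp);
        rd_kafka_topic_rdunlock(rkt);
        rd_kafka_broker_unlock(rkb);
        rd_kafka_rdunlock(rk);
}
```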

Is your failing test doing something specific, like deleting a topic or using the cooperative-sticky assignor?
