High memory usage in the ingress gateway #51150

Open · AleksanderBrzozowski opened this issue May 20, 2024 · 0 comments

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

We use Istio for both service mesh and as an ingress gateway.

We have a Gatling load test that uses an open workload model. The load test sends requests towards one of the services through the ingress gateway:

load test --> ingress gateway --> upstream service

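For illustration, the injection profile is shaped roughly like the sketch below (Gatling Scala DSL; the hostname, path, and rates are placeholders, not our real configuration):

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class IngressGatewayLoadTest extends Simulation {

  // Placeholder base URL: requests go to the Istio ingress gateway, not the service directly.
  val httpProtocol = http.baseUrl("https://ingress.example.com")

  val scn = scenario("upstream-service")
    .exec(http("get resource").get("/api/resource"))

  // Open workload model: the arrival rate is fixed and independent of response
  // times, so requests keep arriving even while the upstream is failing.
  setUp(
    scn.inject(
      rampUsersPerSec(10).to(200).during(5.minutes),
      constantUsersPerSec(200).during(30.minutes)
    )
  ).protocols(httpProtocol)
}
```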
During the test, the upstream service fails and the ingress gateway starts returning 503s, which is correct behavior. What isn't correct, though, is that the ingress gateway's memory starts to increase while the upstream service is down.

I would like to know the root cause of this issue and why the memory increases. It doesn't seem to be a memory leak, because once the test finishes, memory drops significantly.

Below you can find graphs:

  • the first one shows the number of requests that ended with either a 200 or a 503 code, based on istio_requests_total
  • the second one shows memory reported by the pod, based on container_memory_working_set_bytes
  • the third one shows p95 latency measured by the ingress gateway pod, using the istio_request_duration_milliseconds_bucket metric, broken down by response code (a sketch of how the p95 is derived from the buckets follows the graphs)

[Graphs: request counts by response code, pod working-set memory, and p95 latency by response code]
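Since the p95 comes from Prometheus histogram buckets, here is a minimal sketch of how such a quantile is estimated from cumulative bucket counts (the bucket bounds and counts are made-up example values, not our data):

```scala
// Minimal sketch: estimating a p95 from cumulative histogram buckets, the way a
// Prometheus-style quantile over istio_request_duration_milliseconds_bucket works.
// Bucket upper bounds (ms) and cumulative counts are made-up example values.
object P95FromBuckets {
  val buckets: Seq[(Double, Double)] = Seq(
    (25.0, 120.0), (50.0, 480.0), (100.0, 900.0), (250.0, 980.0),
    (500.0, 995.0), (1000.0, 999.0), (Double.PositiveInfinity, 1000.0)
  )

  def quantile(q: Double, buckets: Seq[(Double, Double)]): Double = {
    val total = buckets.last._2
    val rank  = q * total                          // e.g. 0.95 * 1000 = 950
    val idx   = buckets.indexWhere(_._2 >= rank)   // first bucket reaching the rank
    val (upper, cumUpper) = buckets(idx)
    val (lower, cumLower) = if (idx == 0) (0.0, 0.0) else buckets(idx - 1)
    if (upper.isPosInfinity) lower                 // cannot interpolate into +Inf
    else lower + (upper - lower) * (rank - cumLower) / (cumUpper - cumLower)
  }

  def main(args: Array[String]): Unit =
    println(f"p95 ≈ ${quantile(0.95, buckets)}%.1f ms")
}
```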

And lastly, a heap dump from the gateway, taken while memory usage was very high (a rough sketch of how such a profile can be captured follows the dump):

File: envoy
Type: inuse_space
Showing nodes accounting for 1432273.49kB, 97.00% of 1476618.51kB total
Dropped 660 nodes (cum <= 7383.09kB)
      flat  flat%   sum%        cum   cum%
 1157888kB 78.41% 78.41% 1168576.19kB 79.14%  Envoy::Network::ListenerFilterBufferImpl::ListenerFilterBufferImpl
148193.22kB 10.04% 88.45% 148193.22kB 10.04%  OPENSSL_malloc
43156.56kB  2.92% 91.37% 101338.88kB  6.86%  std::__1::make_unique
21332.78kB  1.44% 92.82% 169701.08kB 11.49%  <unknown>
14603.37kB  0.99% 93.81% 18612.02kB  1.26%  Envoy::Server::ActiveTcpSocket::ActiveTcpSocket
10901.97kB  0.74% 94.55% 10901.97kB  0.74%  OPENSSL_realloc
 7567.22kB  0.51% 95.06% 20746.36kB  1.40%  Envoy::Event::DispatcherImpl::createFileEvent
 6796.94kB  0.46% 95.52% 1454476.76kB 98.50%  std::__1::__function::__func::operator()
 6175.88kB  0.42% 95.94% 58708.02kB  3.98%  Envoy::Event::DispatcherImpl::createServerConnection
 5947.56kB   0.4% 96.34%  8650.22kB  0.59%  Envoy::Event::LibeventScheduler::createSchedulableCallback
 2803.44kB  0.19% 96.53% 1171379.62kB 79.33%  Envoy::Server::ActiveTcpSocket::createListenerFilterBuffer
 2407.06kB  0.16% 96.69% 1279723.11kB 86.67%  Envoy::Network::TcpListenerImpl::onSocketEvent
    2261kB  0.15% 96.85%  1267517kB 85.84%  Envoy::Server::ActiveTcpListener::onAcceptWorker
 2194.38kB  0.15% 96.99% 12588.70kB  0.85%  Envoy::Network::FilterManagerImpl::addReadFilter
   44.03kB 0.003% 97.00%  9990.62kB  0.68%  Envoy::Http::ConnectionManagerImpl::initializeReadFilterCallbacks
    0.08kB 5.3e-06% 97.00% 22016.08kB  1.49%  std::__1::__tree::__emplace_unique_key_args
         0     0% 97.00% 10901.66kB  0.74%  BUF_MEM_append
         0     0% 97.00%  9660.41kB  0.65%  EVP_parse_public_key
         0     0% 97.00% 41347.50kB  2.80%  Envoy::Buffer::WatermarkBufferFactory::createBuffer
         0     0% 97.00%  8650.22kB  0.59%  Envoy::Event::DispatcherImpl::createSchedulableCallback
         0     0% 97.00% 13179.14kB  0.89%  Envoy::Event::FileEventImpl::FileEventImpl
         0     0% 97.00% 1454410.04kB 98.50%  Envoy::Event::FileEventImpl::assignEvents()::$_1::__invoke
         0     0% 97.00% 73219.06kB  4.96%  Envoy::Extensions::ListenerFilters::TlsInspector::Filter::Filter
         0     0% 97.00% 51274.20kB  3.47%  Envoy::Extensions::TransportSockets::Tls::ContextImpl::newSsl
         0     0% 97.00% 59987.92kB  4.06%  Envoy::Extensions::TransportSockets::Tls::ServerSslSocketFactory::createDownstreamTransportSocket
         0     0% 97.00% 33432.48kB  2.26%  Envoy::Extensions::TransportSockets::Tls::SslHandshakerImpl::doHandshake
         0     0% 97.00% 58182.08kB  3.94%  Envoy::Extensions::TransportSockets::Tls::SslSocket::SslSocket
         0     0% 97.00% 33432.48kB  2.26%  Envoy::Extensions::TransportSockets::Tls::SslSocket::doWrite
         0     0% 97.00% 52544.65kB  3.56%  Envoy::Network::ConnectionImpl::ConnectionImpl
         0     0% 97.00% 36072.94kB  2.44%  Envoy::Network::ConnectionImpl::onFileEvent
         0     0% 97.00% 33480.48kB  2.27%  Envoy::Network::ConnectionImpl::onWriteReady
         0     0% 97.00% 20745.81kB  1.40%  Envoy::Network::IoSocketHandleImpl::initializeFileEvent
         0     0% 97.00% 138676.42kB  9.39%  Envoy::Network::ListenerFilterBufferImpl::onFileEvent
         0     0% 97.00% 52532.15kB  3.56%  Envoy::Network::ServerConnectionImpl::ServerConnectionImpl
         0     0% 97.00% 138955.81kB  9.41%  Envoy::Server::ActiveStreamListenerBase::newConnection
         0     0% 97.00% 1246645.39kB 84.43%  Envoy::Server::ActiveStreamListenerBase::onSocketAccepted
         0     0% 97.00% 1310334.91kB 88.74%  Envoy::Server::ActiveTcpSocket::continueFilterChain
         0     0% 97.00% 138955.29kB  9.41%  Envoy::Server::ActiveTcpSocket::newConnection
         0     0% 97.00% 93104.89kB  6.31%  Envoy::Server::Configuration::FilterChainUtility::buildFilterChain
         0     0% 97.00% 74211.46kB  5.03%  Envoy::Server::ListenerImpl::createListenerFilterChain
         0     0% 97.00% 1452654.44kB 98.38%  Envoy::Server::WorkerImpl::threadRoutine
         0     0% 97.00% 1452647.28kB 98.38%  Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda(void*)#1}::__invoke
         0     0% 97.00% 1118756.32kB 75.76%  MallocHook::InvokeNewHookSlow
         0     0% 97.00% 1108645.34kB 75.08%  NewHook
         0     0% 97.00%  8466.34kB  0.57%  ProfilerFree
         0     0% 97.00%  9660.16kB  0.65%  RSA_parse_public_key
         0     0% 97.00% 34206.51kB  2.32%  SSL_do_handshake
         0     0% 97.00% 123069.77kB  8.33%  SSL_new
         0     0% 97.00% 1453960.66kB 98.47%  [libc.so.6]
         0     0% 97.00% 22616.88kB  1.53%  bssl::SSLAEADContext::CreateNullCipher
         0     0% 97.00%  7632.47kB  0.52%  bssl::ssl_cert_dup
         0     0% 97.00%  7483.81kB  0.51%  bssl::ssl_get_new_session
         0     0% 97.00% 55694.03kB  3.77%  bssl::ssl_handshake_new
         0     0% 97.00%  9659.91kB  0.65%  bssl::ssl_on_certificate_selected
         0     0% 97.00% 11416.88kB  0.77%  bssl::ssl_open_handshake
         0     0% 97.00% 34206.51kB  2.32%  bssl::ssl_run_handshake
         0     0% 97.00% 22787.55kB  1.54%  bssl::ssl_server_handshake
         0     0% 97.00% 11095.49kB  0.75%  bssl::tls13_server_handshake
         0     0% 97.00% 98860.59kB  6.70%  bssl::tls_new
         0     0% 97.00% 11416.88kB  0.77%  bssl::tls_open_handshake
         0     0% 97.00% 1454402.94kB 98.50%  event_base_loop
         0     0% 97.00% 1454419.80kB 98.50%  event_process_active_single_queue
         0     0% 97.00%  9660.16kB  0.65%  rsa_pub_decode
         0     0% 97.00%  7434.80kB   0.5%  tcmalloc::allocate_full_cpp_throw_oom
         0     0% 97.00% 12572.04kB  0.85%  virtual thunk to Envoy::Network::ConnectionImpl::addReadFilter(std::__1::shared_ptr)

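For reference, a rough sketch of how such a profile can be toggled via Envoy's admin interface before pulling the heap files into pprof (the admin port 15000 is the default proxy admin port; this is an illustration of the approach, not a verbatim record of the exact commands we ran):

```scala
// Rough sketch: toggling Envoy's admin heap profiler while port-forwarded to the
// gateway pod. The admin port (15000) and sleep duration are assumptions; the
// resulting heap files are then copied out of the pod and inspected with pprof.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object HeapProfilerToggle {
  private val client = HttpClient.newHttpClient()

  private def toggle(enable: Boolean): Unit = {
    val flag = if (enable) "y" else "n"
    val req = HttpRequest.newBuilder()
      .uri(URI.create(s"http://localhost:15000/heapprofiler?enable=$flag"))
      .POST(HttpRequest.BodyPublishers.noBody())
      .build()
    val resp = client.send(req, HttpResponse.BodyHandlers.ofString())
    println(s"enable=$flag -> ${resp.statusCode()}")
  }

  def main(args: Array[String]): Unit = {
    toggle(enable = true)   // start recording heap allocations
    Thread.sleep(60000)     // keep the gateway under load for a while
    toggle(enable = false)  // stop; heap files can now be analyzed with pprof
  }
}
```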
There are two questions that I would like to answer:

  • first, why do we see increased memory? Is it due to a growing number of connections?
  • second, why is latency so high for calls that end with a 503?

Version

We are running Istio `1.19.7`, installed in a Kubernetes cluster managed by AWS (EKS). The Kubernetes version is `1.27`.

Additional Information

No response
