Unavailable Splunk endpoint returning 504 breaks all other sinks and the whole Vector deployment #20496

Closed
Alan18081 opened this issue May 14, 2024 · 1 comment
Labels
type: bug A code related bug.

Comments

@Alan18081

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We encountered a problem with the Splunk HEC sink. When the Splunk endpoint returns 504 Gateway Timeout, the other sinks that send to Datadog destinations also go into a broken state. Eventually this causes the whole Vector Pod to fail and we have to restart it.

Configuration

sink_dd_external_splunk_${customer_account_name}_logs_${splunk}:
  type: splunk_hec_logs
  endpoint: ${splunk-endpoint}
  default_token: ${splunk-token}
  compression: gzip
  encoding:
    codec: json
  inputs:
  - tenants_external_logs.${customer_account_name}
  request:
    retry_attempts: 20
  buffer:
    type: disk
    max_size: 500000000
    when_full: block

Version

0.32.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@Alan18081 Alan18081 added the type: bug A code related bug. label May 14, 2024
@jszwedko
Member

Hi @Alan18081 !

It is expected behavior that a failing sink will apply back-pressure to other sinks connected to the same inputs. The way to avoid this is to configure a buffer on the sink whose downtime you want to tolerate, which I see you did. If the buffer fills up, back-pressure will still be applied while you have when_full: block, so you may want to increase the size of your buffer to tolerate a longer period of downtime. Alternatively, you can configure the buffer to drop data rather than block with when_full: drop_newest. The concept of backpressure in Vector is described in more detail here: https://vector.dev/docs/about/concepts/#backpressure
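
For illustration, a minimal sketch of the two adjustments described above, applied to the sink from the report (the 2 GB size is an arbitrary example, and the drop variant of when_full is spelled drop_newest in the Vector buffer options):

sink_dd_external_splunk_${customer_account_name}_logs_${splunk}:
  # ... type, endpoint, default_token, encoding, inputs, request as in the original config ...
  buffer:
    type: disk
    # Option 1: a larger buffer absorbs a longer Splunk outage before
    # back-pressure reaches the shared inputs (2 GB here is illustrative).
    max_size: 2000000000
    # Option 2: drop new events instead of blocking upstream components
    # once the buffer is full.
    when_full: drop_newest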

I'll close this out since it appears to have worked as designed, but let me know if you have any additional questions about this!

@jszwedko jszwedko closed this as not planned May 20, 2024