Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the metrics cardinality #160

Open
ShoshinNikita opened this issue Jun 14, 2023 · 3 comments
Open

Reduce the metrics cardinality #160

ShoshinNikita opened this issue Jun 14, 2023 · 3 comments

Comments

@ShoshinNikita
Copy link
Contributor

Right now the metric http_response_time_seconds has one label - path, and r.URL.Path is used as its value. This can be an issue when a service behind Reproxy uses dynamic urls (for example, /api/v1/users/{user_id}). In this case, the response size of /metrics may become too large to process by Prometheus (especially when resources are limited).

We can't just drop this label because of the backward compatibility promise. However, we can make it optional. It would be enabled by default - again, to comply with the backward compatibility promise.


I also have a few more suggestions (more → less important):

  1. Add the label server to all metrics. It feels wrong that only http_requests_total has this label. For example, it prevents me from creating an alert for 500+ statuses for specific upstreams.
  2. Add metrics http_request_size_bytes{server} and http_response_size_bytes{server} that would allow users to monitor the size of requests and responses.
  3. Add new buckets for http_response_time_seconds - for example, 10 and 30. The current "highest" bucket is 5s, but it can take much longer to process some types of requests. At the same time, I understand that it's hard to find a good and fitting maximum value. Another option it to make the buckets configurable - but I am not sure it would be practical.
@mikelorant
Copy link

This is a major issue when you identify your /metrics endpoint has a payload over 34MB.

@umputun This makes the metrics endpoint unusable for large production systems.

@umputun
Copy link
Owner

umputun commented May 25, 2024

Can you provide a PR for this change?

@mikelorant
Copy link

I have been doing this type of work already for an issue with fastly/fastly-exporter#152 which has resulted in the solution fastly/fastly-exporter#153.

@umputun Let me know which direction you'd like to go in based on the feedback provided by @ShoshinNikita.

My question is mostly around do we drop the field or keep the field with the value of aggregated? The reasons for keeping the field is it wouldn't break any dashboards that rely on this field.

Can you also provide your preferred name for the CLI argument to toggle this feature?

I would only plan to add this specific feature and not the other recommendations from @ShoshinNikita.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants