What did you do?
I'm using remote write to a receiver, which can be temporarily down while it is being updated.
What did you expect to see?
All samples should eventually be written once the receiver is running again.
What did you see instead? Under which circumstances?
After upgrading from 2.50.1 to 2.51.2, samples from the periods when the receiver was not running are missing on the receiver. I checked this by querying samples of some metrics in both Prometheus and the receiver. I can see a drop in prometheus_remote_storage_samples_total but nothing in the prometheus_remote_storage_samples_dropped_total or prometheus_remote_storage_samples_failed_total metrics. So the samples were probably never attempted and were simply skipped.
I suspect the changes made in #13583 and the tail bool parameter shared between two places in prometheus/tsdb/wlog/watcher.go (line 393 and line 533 in 3b8b577).
In the first place it is used to tail the WAL, and in the second to skip reading samples from the checkpoint. Prior to the change it stayed true once set. Now it can revert to false when processing of samples is paused for some time and then resumed.
Reverting back to 2.50.1 fixed this.
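To illustrate the suspected mechanism, here is a toy model of it. This is not the actual watcher.go code: the segment type, readSegment, replay, and the lastSegment slice are all hypothetical names made up for this sketch, and it only assumes what is described above (samples are forwarded only while tail is true, and tail is either sticky or recomputed for every segment).

```go
package main

import "fmt"

// segment is a hypothetical stand-in for a WAL segment holding some samples.
type segment struct {
	samples int
}

// readSegment models the behaviour described in this report: samples are
// forwarded only when tail is true; otherwise they are skipped silently.
func readSegment(s segment, tail bool) int {
	if !tail {
		return 0
	}
	return s.samples
}

// replay reads the segments in order and returns how many samples were sent.
// lastSegment[i] is the newest segment index known when segment i is read;
// it grows while the reader is paused and new segments keep being written.
// With sticky set, tail stays true once set (assumed old behaviour);
// otherwise it is recomputed for every segment (assumed new behaviour).
func replay(segs []segment, lastSegment []int, sticky bool) int {
	sent := 0
	tail := false
	for i, s := range segs {
		caughtUp := i >= lastSegment[i]
		if sticky {
			tail = tail || caughtUp
		} else {
			tail = caughtUp
		}
		sent += readSegment(s, tail)
	}
	return sent
}

func main() {
	segs := []segment{{10}, {10}, {10}, {10}}
	// Segment 0 is replayed at startup, the reader catches up at segment 1,
	// then it is paused while segments 2 and 3 are written.
	last := []int{1, 1, 3, 3}
	fmt.Println("sticky tail:    ", replay(segs, last, true))
	fmt.Println("recomputed tail:", replay(segs, last, false))
}
```

In this model the recomputed variant never sends the samples of segment 2, and nothing is counted as dropped or failed, which matches what I see in the metrics.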
System information
No response
Prometheus version
2.51.2
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response