scheduler_perf: define thresholds per test case and set up alerts for results #124774

sanposhiho · 2024-05-09T14:51:56Z

/kind feature
/sig scheduling

Discussion with sig-scalability: https://kubernetes.slack.com/archives/C09QZTRH7/p1715262959575039

What

We have scheduler_perf, and it'd be great if we could have some form of alerting based on its results.

Based on the discussion with sig-scalability, the easiest way is to change scheduler_perf so that it fails if the results show degradation, and then monitor and alert on the failures via testgrid.

To decide whether "the results show degradation", we probably have to define reasonable thresholds per test case.
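As a rough illustration of the idea (not an actual scheduler_perf change), a per-test-case threshold check could look something like the sketch below. The test case names, the threshold values, and the `checkThroughput` helper are all hypothetical, and the sketch assumes the metric being compared is average scheduling throughput in pods/s.

```go
package main

import "fmt"

// throughputThresholds maps a test case name to the minimum acceptable
// average scheduling throughput in pods/s.
// The names and values here are hypothetical placeholders.
var throughputThresholds = map[string]float64{
	"ExampleCaseA": 200,
	"ExampleCaseB": 50,
}

// checkThroughput returns an error when the measured average throughput is
// below the threshold defined for the test case. Test cases without a
// defined threshold are not checked and always pass.
func checkThroughput(testCase string, measured float64) error {
	threshold, ok := throughputThresholds[testCase]
	if !ok {
		return nil
	}
	if measured < threshold {
		return fmt.Errorf("%s: average throughput %.1f pods/s is below the threshold %.1f pods/s",
			testCase, measured, threshold)
	}
	return nil
}

func main() {
	// Hypothetical measured results, in a fixed order.
	results := []struct {
		name     string
		measured float64
	}{
		{"ExampleCaseA", 250},
		{"ExampleCaseB", 40},
	}
	for _, r := range results {
		if err := checkThroughput(r.name, r.measured); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("PASS:", r.name)
	}
}
```

In a real test, the error would be reported through the test framework so that the job fails and the failure shows up in testgrid.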

Why

The current pain point is that perf-dash visualizes the results, but no one actually pays much attention to it, and consequently we've overlooked degradation several times.

k8s-ci-robot · 2024-05-09T14:52:04Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sanposhiho · 2024-05-09T14:52:45Z

@kubernetes/sig-scheduling-misc any feedback on the direction proposed above?

alculquicondor · 2024-05-09T17:54:29Z

+1 from me.

Also, the dashboard doesn't load for me (unless the link is wrong?)

alculquicondor · 2024-05-09T17:54:41Z

Nvm, it loads :)

sanposhiho · 2024-05-10T05:02:14Z

/assign

I just assigned it to myself so that it remains on my todo list, but it might take some time for me to come back here because of other prioritized tickets. So, if anyone wants to, feel free to take over (I can help with reviews either way).

utam0k · 2024-05-12T06:34:23Z

Can I help you?

sanposhiho · 2024-05-12T06:49:11Z

Yes,
/assign @utam0k

k8s-ci-robot added the kind/feature, sig/scheduling, and needs-triage labels May 9, 2024

sanposhiho mentioned this issue May 9, 2024

add benchmark test for ScheduleOne #124728

Open

k8s-ci-robot assigned sanposhiho May 10, 2024

k8s-ci-robot assigned utam0k May 12, 2024
