
Throughput degradation scheduling daemonset pods #124709

Closed

alculquicondor opened this issue May 6, 2024 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@alculquicondor
Member

alculquicondor commented May 6, 2024

What happened?

#119779 added some map queries and creations that add non-negligible latency.

The pprof reveals the following lines inside findNodesThatFitPod as too expensive:

[pprof screenshot showing the expensive lines inside findNodesThatFitPod]

What did you expect to happen?

The scheduler to keep a throughput of 300 pods/s.

How can we reproduce it (as minimally and precisely as possible)?

Schedule 5k daemonset pods while giving the scheduler 300 qps.

Anything else we need to know?

No response

Kubernetes version


v1.30

Cloud provider

GKE

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@alculquicondor alculquicondor added the kind/bug Categorizes issue or PR as related to a bug. label May 6, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 6, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@alculquicondor
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2024
@alculquicondor
Member Author

Before #119779, there was this line:
https://github.com/sanposhiho/kubernetes/blob/5ab23179473b621e911511b03693234b60a38033/pkg/scheduler/schedule_one.go#L489

which also had a map access. So we can quite confidently say that the culprit is just this map insert:

diagnosis.NodeToStatusMap[n.Node().Name] = framework.NewStatus(framework.UnschedulableAndUnresolvable, "node is filtered out by the prefilter result")
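
A simplified paraphrase (not the exact upstream code) of the shape of this hot path, assuming the 1.30 findNodesThatFitPod flow, where preRes is the PreFilterResult and allNodes is the snapshot's node list:

    for _, n := range allNodes {
        if !preRes.AllNodes() && !preRes.NodeNames.Has(n.Node().Name) {
            // When a PreFilter plugin narrows a daemonset pod down to a single
            // node, every other node still gets an explicit status inserted,
            // so each pod pays for roughly one map insert per node in the cluster.
            diagnosis.NodeToStatusMap[n.Node().Name] = framework.NewStatus(
                framework.UnschedulableAndUnresolvable,
                "node is filtered out by the prefilter result")
            continue
        }
        nodes = append(nodes, n)
    }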

@alculquicondor
Member Author

@sanposhiho, @AxeZhan I have two alternatives:

  1. The obvious one is to preallocate the map to the number of nodes (see the sketch after this list). I suspect this might be enough, but it's still somewhat wasteful.
  2. Change the preemption logic to assume that if a Node is not in the status map, it's UnschedulableAndUnresolvable.
    if m[name].Code() == framework.UnschedulableAndUnresolvable {
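
A minimal sketch of option (1), assuming the map is sized where the Diagnosis is built; newDiagnosis is a hypothetical helper, not the actual upstream change:

    import "k8s.io/kubernetes/pkg/scheduler/framework"

    // newDiagnosis preallocates NodeToStatusMap for the number of nodes so the
    // per-node status inserts in findNodesThatFitPod never force the map to
    // grow and rehash while thousands of pods are being scheduled.
    func newDiagnosis(numNodes int) framework.Diagnosis {
        return framework.Diagnosis{
            NodeToStatusMap: make(framework.NodeToStatusMap, numNodes),
        }
    }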

Option 2 also solves #124705. However, external PostFilter plugins would need to be aware of the change; I'm not aware of any in the wild that implement behavior similar to preemption, and we can leave an ACTION REQUIRED note. But was this path working in 1.28 and older?
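
As an illustration of option (2), a hedged sketch of how a consumer of the status map (such as preemption's PostFilter) could treat a node that is simply absent; statusFor is a hypothetical helper name:

    // statusFor treats a node missing from the map as UnschedulableAndUnresolvable,
    // so findNodesThatFitPod no longer needs to insert an explicit status for every
    // node filtered out by the PreFilter result.
    func statusFor(m framework.NodeToStatusMap, nodeName string) *framework.Status {
        if s, ok := m[nodeName]; ok {
            return s
        }
        return framework.NewStatus(framework.UnschedulableAndUnresolvable,
            "node is absent from the status map")
    }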

Regardless of the decision, I think we should accompany the fix with an integration test for preemption by daemonset pods.

@sanposhiho
Member

Can we try (1) first, ask Google for a scalability test again, and then consider (2) if (1) doesn't improve the throughput enough?

(2) could break existing PostFilter plugins. Given that PostFilter is not only for preemption, there may be custom PostFilter plugins in the wild that have to rely on the status map containing UnschedulableAndUnresolvable nodes. So it'd be best if we could avoid this breaking change.

@AxeZhan
Member

AxeZhan commented May 7, 2024

Since (1) is a really small fix, I'm also thinking we can run the tests after #124714, and then consider (2) based on the test results.

Regardless of the decision, I think we should accompany the fix with an integration test for preemption by daemonset pods.

Is this "integration test" for performance or correctness?

@alculquicondor
Member Author

I was asking for a correctness test.

@chengjoey
Contributor

I wrote a benchmark test called Benchmark_findNodesThatFitPod.

The performance without case (1) is:

4927125125 ~ 5118241875 ns/op

The performance with case (1) is:

3865283834 ~ 3972690125 ns/op

So (1) may improve the throughput.
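
For reference, a standalone micro-benchmark, not the actual Benchmark_findNodesThatFitPod, that isolates the map behavior being compared: inserting one entry per node into a map that starts empty versus one preallocated for 5000 nodes:

    package scheduler_test

    import (
        "fmt"
        "testing"
    )

    const numNodes = 5000

    func nodeNames() []string {
        names := make([]string, numNodes)
        for i := range names {
            names[i] = fmt.Sprintf("node-%d", i)
        }
        return names
    }

    // BenchmarkStatusMapGrow starts from an empty map, so it grows and rehashes
    // repeatedly while entries are added, as in the current code path.
    func BenchmarkStatusMapGrow(b *testing.B) {
        names := nodeNames()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            m := map[string]string{}
            for _, n := range names {
                m[n] = "UnschedulableAndUnresolvable"
            }
        }
    }

    // BenchmarkStatusMapPrealloc sizes the map up front, as proposed in option (1).
    func BenchmarkStatusMapPrealloc(b *testing.B) {
        names := nodeNames()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            m := make(map[string]string, numNodes)
            for _, n := range names {
                m[n] = "UnschedulableAndUnresolvable"
            }
        }
    }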

@alculquicondor
Member Author

I guess we can have a different tracking issue for an integration benchmark.

This issue itself is fixed.

/close

@k8s-ci-robot
Contributor

@alculquicondor: Closing this issue.

In response to this:

I guess we can have a different tracking issue for an integration benchmark.

This issue itself is fixed.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
