Pod phase transition is slower when EventedPLEG is enabled #124704
Comments
/sig node
/assign
/cc @pacoxu
Sure. I created another PR (#125018) for testing both PRs.
What happened?
This issue presupposes that PR #124297 is merged. I believe this fix is necessary for the Evented PLEG.
When I ran a pod on my laptop using a locally built Kubernetes with #124297 applied, pod phase transitions were slower with EventedPLEG enabled, especially at deletion.
EventedPLEG is enabled:
EventedPLEG is disabled:
As described here, a pod worker is blocked at `cache.GetNewerThan()` when it is woken up by an update that is not accompanied by a cache update in the PLEG:
kubernetes/pkg/kubelet/pod_workers.go, lines 1244 to 1253 in ade0d21
The worker is unblocked when another event is delivered or when the PLEG calls `cache.UpdateTime()`. In the latter case, the Generic PLEG calls `cache.UpdateTime()` every second along with `Relist()`, whereas the Evented PLEG calls `cache.UpdateTime()` only every five seconds. Because of this difference, the Evented PLEG takes longer to move pods into another phase.
kubernetes/pkg/kubelet/pleg/evented.go, lines 34 to 37 in ade0d21
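To make the effect of the update period concrete, here is a minimal Python sketch of the timing argument above. It is a standalone model, not kubelet code: the function names are hypothetical, and it assumes that the global cache timestamp is refreshed at exact multiples of the period and that no other event or per-pod cache update arrives while the worker is blocked.

```python
def unblock_delay_ms(wake_ms, period_ms):
    """Time a blocked worker waits for the next global timestamp refresh.

    Assumption: the PLEG refreshes the global cache timestamp at
    t = period, 2 * period, ... and nothing else unblocks the worker.
    """
    # The worker needs a timestamp strictly newer than its wake-up time.
    next_refresh_ms = (wake_ms // period_ms + 1) * period_ms
    return next_refresh_ms - wake_ms


def worst_and_mean_delay_ms(period_ms):
    """Worst-case and mean delay over wake-up times spread across one period."""
    delays = [unblock_delay_ms(t, period_ms) for t in range(period_ms)]
    return max(delays), sum(delays) / len(delays)


if __name__ == "__main__":
    for name, period_ms in (("1 s refresh", 1000), ("5 s refresh", 5000)):
        worst, mean = worst_and_mean_delay_ms(period_ms)
        print(f"{name}: worst {worst} ms, mean {mean} ms")
```

Under these assumptions the expected extra wait per blocked sync scales linearly with the refresh period, which is consistent with the slower transitions reported here for the five-second period.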
Even if there is a cache update, a worker can still be blocked when the cache is updated by an asynchronous event before the worker finishes `SyncPod()`. For instance, when the runtime starts a container, the PLEG receives an event and caches the container status (`running`). If this event is received after the pod worker finishes the `SyncPod()` call that starts the container, the worker gets the new status from `GetNewerThan()` immediately and runs `SyncPod()` again to update the pod phase to `running`. However, if the event arrives before the worker finishes that `SyncPod()` call, the worker is blocked at `GetNewerThan()` because the cached status is older than `lastSyncTime`.
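The ordering problem described above can be sketched with a small Python model. This is an illustration under stated assumptions, not the kubelet implementation: `PodStatusCache`, `try_get_newer_than`, and `scenario` are hypothetical names, time is an integer in milliseconds, and instead of blocking the lookup returns a flag that says whether a real `GetNewerThan()` call would have returned.

```python
class PodStatusCache:
    """Simplified single-pod cache (assumed semantics, for illustration only)."""

    def __init__(self):
        self.status = None
        self.modified_ms = None       # when the pod status was last cached
        self.global_timestamp_ms = 0  # advanced by the periodic refresh

    def set(self, status, now_ms):
        self.status = status
        self.modified_ms = now_ms

    def update_time(self, now_ms):
        self.global_timestamp_ms = now_ms

    def try_get_newer_than(self, min_time_ms):
        """Return (True, status) if the lookup would return, else (False, None)."""
        entry_is_newer = self.modified_ms is not None and self.modified_ms > min_time_ms
        global_is_newer = self.global_timestamp_ms > min_time_ms
        if entry_is_newer or global_is_newer:
            return True, self.status
        return False, None


def scenario(event_ms, sync_done_ms):
    """Cache a 'running' status at event_ms, then look it up after a sync ends."""
    cache = PodStatusCache()
    cache.set("running", event_ms)
    last_sync_time_ms = sync_done_ms  # recorded when the sync call finishes
    return cache, cache.try_get_newer_than(last_sync_time_ms)


if __name__ == "__main__":
    _, late = scenario(event_ms=120, sync_done_ms=100)
    print("event after sync finished:", late)
    cache, early = scenario(event_ms=80, sync_done_ms=100)
    print("event before sync finished:", early)
    cache.update_time(5000)  # the next periodic refresh releases the worker
    print("after refresh:", cache.try_get_newer_than(100))
```

In this model the status cached at 80 ms is already stale relative to a sync that finished at 100 ms, so the lookup only succeeds once the global timestamp is advanced.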
What did you expect to happen?
Pod phase transitions should be as fast as with the Generic PLEG. It would be better to set `globalCacheUpdatePeriod` to 1 second.
How can we reproduce it (as minimally and precisely as possible)?
Build Kubernetes locally with PR #124297 applied and run it with the `EventedPLEG` feature gate enabled.
Use this simple-pod.yaml:
Run the command:
This is the result of running the command ten times:
This is the result when `EventedPLEG` is disabled:
Anything else we need to know?
No response
Kubernetes version
master + PR #124297
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)