Potentially lengthy I/O by kubelet during pod startup needs to be visible to the user #124762
Labels: kind/feature, needs-triage, sig/node
What happened?
For example, we had a pod which had:
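(The original manifest isn't reproduced here; the following is an assumed, minimal sketch of the relevant shape: a pod with a `securityContext.fsGroup` mounting a large persistent volume. All names and values are illustrative.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example                 # hypothetical name
spec:
  securityContext:
    fsGroup: 2000               # the GID the kubelet will apply to volume contents
  containers:
    - name: app
      image: registry.example.com/app:latest   # hypothetical image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: big-volume   # ~2.1M files on it in our case
```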
This (unbeknownst to us for quite some time) led to the pod taking hours to start, with it otherwise apparently stuck in `Pending`.

Now, in this particular example, the "root" cause is that the `fsGroup` there, combined with `fsGroupChangePolicy: Always` (which is the default, and unspecified by us here), causes the kubelet, prior to starting the pod's containers, to recursively `chown` the Pod's volumes to the GID in `fsGroup`. This is a potentially enormous amount of I/O, as it was in our case, where it took hours. (The volume in question had 2.1M files on it!)
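For reference, the policy controlling that recursive pass sits next to `fsGroup` in the pod-level `securityContext`. A sketch of setting it explicitly (values illustrative); the non-default `OnRootMismatch` value skips the recursive walk when the volume root already has the expected ownership and permissions:

```yaml
# Pod-level securityContext fragment (illustrative values):
securityContext:
  fsGroup: 2000
  # "Always" (the default) recursively chowns/chmods volume contents on every mount;
  # "OnRootMismatch" only does so when the volume root doesn't already match.
  fsGroupChangePolicy: OnRootMismatch
```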
What did you expect to happen?
A clear way to tell why a pod is not transitioning out of `Pending` in a timely manner. A simple event in the `describe pod` output for the pod would have done it.
How can we reproduce it (as minimally and precisely as possible)?
Have a volume with ~2M files on it, and an `fsGroup`; for example, along the lines of the sketch below.
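A rough sketch of that reproduction, assuming a pre-provisioned PVC named `big-volume`; pod names, images, and the exact file count are illustrative:

```yaml
# Pod 1: populate the volume with ~2M small files, then exit.
apiVersion: v1
kind: Pod
metadata:
  name: populate
spec:
  restartPolicy: Never
  containers:
    - name: populate
      image: busybox
      command: ["sh", "-c", "cd /data && seq 1 2000000 | xargs touch"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: big-volume
---
# Pod 2 (create after pod 1 finishes and is deleted): mounts the same volume
# with an fsGroup. With the default fsGroupChangePolicy (Always), the kubelet
# recursively chowns/chmods everything under /data before starting the
# container, so the pod sits in Pending for as long as that walk takes.
apiVersion: v1
kind: Pod
metadata:
  name: slow-start
spec:
  securityContext:
    fsGroup: 2000
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: big-volume
```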
Anything else we need to know?
The same thing goes for pod termination, too. For example, see this bug, which is full of people who cannot determine why a pod won't terminate. Sadly, that bug was never addressed, and was dismissed with:
The same pod (large volume + `fsGroup`) is also failing to terminate cleanly; hence I stumbled into that bug. And similarly, there is no output in `describe pod` as to why the pod isn't making progress. Users of Kubernetes need to be able to tell which operations the kubelet is still working on against a pod, i.e., the ones preventing it from transitioning to, e.g., `Terminated` or `CreatingContainer`.

Cf. Azure/AKS#3865
Cf. Azure/AKS#3681
We didn't have anyone on hand who knew this particular facet of k8s by heart, and thus spent years dealing with this. Azure's support spent years on it too, similarly making little to no progress, until issue 3865 above finally cracked the case. That is what makes a useful diagnostic all the more important: it clues the user in on a default behavior that is probably the right thing for most use cases, but that the user never asked for, and thus is never going to guess.
Kubernetes version
Cloud provider
Azure (AKS), GCP (GKE)
OS version
GKE:
(We've since left Azure/AKS, mostly over this. But given that we have replicated the behavior on GKE, I'm pretty sure the linked bug is correct that this is base k8s behavior, not specific to any vendor's implementation.)
Install tools
Container runtime (CRI) and version (if applicable)
containerd://1.7.10
Related plugins (CNI, CSI, ...) and versions (if applicable)