
Potentially lengthy I/O by kubelet during pod startup needs to be visible to the user #124762

Open
roy-work opened this issue May 8, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.


roy-work commented May 8, 2024

What happened?

For example, we had a pod which had:

 securityContext:
   fsGroup: 1001
   runAsGroup: 1001
   runAsUser: 1001

This — unbeknownst to us for quite some time — led to the pod taking hours to start, while appearing to be simply stuck in Pending.

Now, in this particular example, the "root" cause is that the fsGroup there, combined with fsGroupChangePolicy: Always (the default, and unspecified by us here), causes the kubelet, prior to starting the pod's containers, to recursively chown the pod's volumes to the GID in fsGroup. That is a potentially enormous amount of I/O; in our case, it took hours. (The volume in question had 2.1M files on it!)
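
For anyone else who hits this: the recursive walk can be skipped by setting fsGroupChangePolicy: OnRootMismatch, which only re-applies ownership and permissions when the volume's root directory doesn't already match. A minimal sketch of the relevant part of a pod spec (not our actual manifest):

 spec:
   securityContext:
     fsGroup: 1001
     runAsGroup: 1001
     runAsUser: 1001
     # Skip the recursive chown/chmod when the root of the volume already
     # has the expected group and permissions.
     fsGroupChangePolicy: "OnRootMismatch"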

What did you expect to happen?

A clear way to tell why a pod is not transitioning out of Pending in a timely manner. A simple event in the kubectl describe pod output would have done it.
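
Something along these lines in the Events section of kubectl describe pod would have pointed us at the problem immediately (purely hypothetical output; the event reason and message are made up for illustration, no such event exists today):

 Events:
   Type    Reason            Age   From     Message
   ----    ------            ----  ----     -------
   Normal  ApplyingFSGroup   5m    kubelet  Recursively applying fsGroup 1001 to volume "data"; this may take a long time on volumes with many files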

How can we reproduce it (as minimally and precisely as possible)?

Have a volume with ~2M files, and set an fsGroup on the pod.
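
A sketch of a repro, assuming a pre-provisioned PVC named big-volume (all names here are illustrative and untested):

 # Step 1: fill the volume with ~2M small files.
 apiVersion: v1
 kind: Pod
 metadata:
   name: make-files
 spec:
   restartPolicy: Never
   containers:
   - name: fill
     image: busybox
     command: ["sh", "-c", "mkdir -p /data/files && cd /data/files && seq 1 2000000 | xargs touch"]
     volumeMounts:
     - name: data
       mountPath: /data
   volumes:
   - name: data
     persistentVolumeClaim:
       claimName: big-volume
 ---
 # Step 2: start a pod that mounts the same PVC with an fsGroup.
 # fsGroupChangePolicy defaults to Always, so the kubelet recursively
 # chowns every file before the container starts, and the pod sits in
 # Pending with no indication of why.
 apiVersion: v1
 kind: Pod
 metadata:
   name: slow-start
 spec:
   securityContext:
     fsGroup: 1001
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 3600"]
     volumeMounts:
     - name: data
       mountPath: /data
   volumes:
   - name: data
     persistentVolumeClaim:
       claimName: big-volume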

Anything else we need to know?

The same thing goes for pod termination, too. For example, see this bug, which is full of people who cannot figure out why a pod won't terminate. Sadly, that bug was never addressed, and was dismissed with:

there are a lot of different issues being discussed in this thread. There are many possible reasons why a Pod could get stuck in terminating. We know this is a common issue with many possible root causes!

The same pod (large volume + fsGroup) is also failing to terminate cleanly; hence I stumbled into that bug. And similarly, there is no output in describe pod as to why the pod isn't making progress. Users of Kubernetes need to be able to see which operations the kubelet is still working on against a pod, since those are what prevent it from transitioning to, e.g., ContainerCreating or Terminated.

Cf. Azure/AKS#3865

Cf. Azure/AKS#3681

We didn't have someone on hand who knew this particular facet of k8s by heart, and thus spent years dealing with this. Azure's support spent years on it too, similarly making little to no progress. (Until issue 3865 above finally cracked the case.) That is what makes a useful diagnostic all the more important: to clue the user in on this default behavior, which is probably the right thing for most use cases, but not something the user ever asked for, and thus not something they're going to guess.

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

Azure (AKS), GCP (GKE)

OS version

GKE:

(We've since left Azure/AKS for this, mostly. But given that we have replicated the behavior on GKE, I'm pretty sure the linked bug is correct in that this is base k8s behavior, not specific to a vendor's implementation.)

# cat /etc/lsb-release
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
CHROMEOS_BOARD_APPID={76E245CF-C0D0-444D-BA50-36739C18EB00}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_APPID={76E245CF-C0D0-444D-BA50-36739C18EB00}
CHROMEOS_RELEASE_BOARD=lakitu
CHROMEOS_RELEASE_BRANCH_NUMBER=66
CHROMEOS_RELEASE_BUILD_NUMBER=17800
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_CHROME_MILESTONE=109
CHROMEOS_RELEASE_DESCRIPTION=17800.66.78 (Official Build) stable-channel lakitu
CHROMEOS_RELEASE_KEYSET=v10
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_PATCH_NUMBER=78
CHROMEOS_RELEASE_TRACK=stable-channel
CHROMEOS_RELEASE_VERSION=17800.66.78
DEVICETYPE=OTHER
GOOGLE_RELEASE=17800.66.78
HWID_OVERRIDE=LAKITU DEFAULT

Install tools

Container runtime (CRI) and version (if applicable)

containerd://1.7.10

Related plugins (CNI, CSI, ...) and versions (if applicable)

@roy-work roy-work added the kind/bug Categorizes issue or PR as related to a bug. label May 8, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 8, 2024
k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@roy-work roy-work changed the title Potentially length I/O by kubelet during pod startup needs to be visible to the user Potentially lengthy I/O by kubelet during pod startup needs to be visible to the user May 8, 2024
neolit123 (Member) commented:

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 16, 2024
@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs May 22, 2024
haircommander (Contributor) commented:

CRI-O users hit this issue, and CRI-O emits a metric when pod or container creation is stuck on a specific step for a while. There's currently no better way for the CRI implementation to tell the kubelet what stage a creation request is at. Having an event would definitely be a better UX.

I would say this is (unfortunately) working as intended, as the CRI was not initially designed to give more insight into the process. That certainly can be redesigned, but it would need a champion to take on the KEP process (and reviewer/approver bandwidth to merge it).

/kind feature
/remove-kind bug

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 22, 2024
@haircommander haircommander moved this from Triage to Triaged in SIG Node Bugs May 29, 2024