
Potentially lengthy I/O by kubelet during pod startup needs to be visible to the user #124762

Open
roy-work opened this issue May 8, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.


roy-work commented May 8, 2024

What happened?

For example, we had a pod which had:

 securityContext:
   fsGroup: 1001
   runAsGroup: 1001
   runAsUser: 1001

This — unbeknownst to us for quite some time — led to the pod taking hours to start, while appearing to be simply stuck in Pending.

Now, in this particular example, the "root" cause is that the fsGroup there, combined with fsGroupChangePolicy: Always (the default, and unspecified by us here), causes the kubelet, prior to starting the pod's containers, to recursively chown the pod's volumes to the GID in fsGroup. That is a potentially enormous amount of I/O; in our case, it took hours. (The volume in question had 2.1M files on it!)
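
For anyone else who hits this: the recursive walk can be skipped by setting fsGroupChangePolicy: OnRootMismatch, which only re-applies ownership and permissions when the volume's root directory doesn't already match. A minimal sketch of the relevant part of a pod spec (not our actual manifest):

 spec:
   securityContext:
     fsGroup: 1001
     runAsGroup: 1001
     runAsUser: 1001
     # Skip the recursive chown/chmod when the root of the volume already
     # has the expected group and permissions.
     fsGroupChangePolicy: "OnRootMismatch"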

What did you expect to happen?

A clear way to tell why a pod is not transitioning out of Pending in a timely manner. A simple event in the kubectl describe pod output would have done it.
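
Something along these lines in the Events section of kubectl describe pod would have pointed us at the problem immediately (purely hypothetical output; the event reason and message are made up for illustration, no such event exists today):

 Events:
   Type    Reason            Age   From     Message
   ----    ------            ----  ----     -------
   Normal  ApplyingFSGroup   5m    kubelet  Recursively applying fsGroup 1001 to volume "data"; this may take a long time on volumes with many files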

How can we reproduce it (as minimally and precisely as possible)?

Have a volume with ~2M files, and set an fsGroup on the pod.
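
A sketch of a repro, assuming a pre-provisioned PVC named big-volume (all names here are illustrative and untested):

 # Step 1: fill the volume with ~2M small files.
 apiVersion: v1
 kind: Pod
 metadata:
   name: make-files
 spec:
   restartPolicy: Never
   containers:
   - name: fill
     image: busybox
     command: ["sh", "-c", "mkdir -p /data/files && cd /data/files && seq 1 2000000 | xargs touch"]
     volumeMounts:
     - name: data
       mountPath: /data
   volumes:
   - name: data
     persistentVolumeClaim:
       claimName: big-volume
 ---
 # Step 2: start a pod that mounts the same PVC with an fsGroup.
 # fsGroupChangePolicy defaults to Always, so the kubelet recursively
 # chowns every file before the container starts, and the pod sits in
 # Pending with no indication of why.
 apiVersion: v1
 kind: Pod
 metadata:
   name: slow-start
 spec:
   securityContext:
     fsGroup: 1001
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 3600"]
     volumeMounts:
     - name: data
       mountPath: /data
   volumes:
   - name: data
     persistentVolumeClaim:
       claimName: big-volume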

Anything else we need to know?

The same thing goes for pod termination, too. For example, see this bug, which is full of people who cannot figure out why a pod won't terminate. Sadly, that bug was never addressed, and was dismissed with:

there are a lot of different issues being discussed in this thread. There are many possible reasons why a Pod could get stuck in terminating. We know this is a common issue with many possible root causes!

The same pod (large volume + fsGroup) is also failing to terminate cleanly; hence I stumbled into that bug. And similarly, there is no output in describe pod as to why the pod isn't making progress. Users of Kubernetes need to be able to see which operations the kubelet is still working on against a pod, since those are what prevent it from transitioning to, e.g., ContainerCreating or Terminated.

Cf. Azure/AKS#3865

Cf. Azure/AKS#3681

We didn't have someone on hand who knew this particular facet of k8s by heart, and thus spent years dealing with this. Azure's support spent years on it too, similarly making little to no progress. (Until issue 3865 above finally cracked the case.) That is what makes a useful diagnostic all the more important: to clue the user in on this default behavior, which is probably the right thing for most use cases, but not something the user ever asked for, and thus not something they're going to guess.

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

Azure (AKS), GCP (GKE)

OS version

GKE:

(We've since left Azure/AKS for this, mostly. But given that we have replicated the behavior on GKE, I'm pretty sure the linked bug is correct in that this is base k8s behavior, not specific to a vendor's implementation.)

# cat /etc/lsb-release
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
CHROMEOS_BOARD_APPID={76E245CF-C0D0-444D-BA50-36739C18EB00}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_APPID={76E245CF-C0D0-444D-BA50-36739C18EB00}
CHROMEOS_RELEASE_BOARD=lakitu
CHROMEOS_RELEASE_BRANCH_NUMBER=66
CHROMEOS_RELEASE_BUILD_NUMBER=17800
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_CHROME_MILESTONE=109
CHROMEOS_RELEASE_DESCRIPTION=17800.66.78 (Official Build) stable-channel lakitu
CHROMEOS_RELEASE_KEYSET=v10
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_PATCH_NUMBER=78
CHROMEOS_RELEASE_TRACK=stable-channel
CHROMEOS_RELEASE_VERSION=17800.66.78
DEVICETYPE=OTHER
GOOGLE_RELEASE=17800.66.78
HWID_OVERRIDE=LAKITU DEFAULT

Install tools

Container runtime (CRI) and version (if applicable)

containerd://1.7.10

Related plugins (CNI, CSI, ...) and versions (if applicable)

@roy-work roy-work added the kind/bug Categorizes issue or PR as related to a bug. label May 8, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 8, 2024
k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@roy-work roy-work changed the title Potentially length I/O by kubelet during pod startup needs to be visible to the user Potentially lengthy I/O by kubelet during pod startup needs to be visible to the user May 8, 2024
neolit123 (Member) commented:

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 16, 2024
@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs May 22, 2024
haircommander (Contributor) commented:

CRI-O users hit this issue, and CRI-O emits a metric when pod or container creation is stuck on a specific step for a while. There's currently no better way for the CRI implementation to tell the kubelet what stage a creation request is at. Having an event would definitely be a better UX.

I would say this is (unfortunately) working as intended, as the CRI was not initially designed to give more insight into the process. That certainly can be redesigned, but it would need a champion to take on the KEP process (and reviewer/approver bandwidth to merge it).

/kind feature
/remove-kind bug

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 22, 2024
@haircommander haircommander moved this from Triage to Triaged in SIG Node Bugs May 29, 2024