
kubelet should always get latest secret/configmap resource in pod add event #124701

Open
V0idk opened this issue May 6, 2024 · 17 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@V0idk

V0idk commented May 6, 2024

What happened?

The sequence is as follows:

  1. A secret is updated manually.
  2. The kubelet observes the update through its watch and refreshes its watch cache.
  3. A pod is started and the volume backed by that secret is populated from the cache.

Probabilistically, however, especially under heavy load, the third step runs before the second. As a result, the old secret content is mounted when the pod starts.


https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

What did you expect to happen?


Solution: the kubelet should always fetch the latest secret/configmap resource on a pod add event instead of using the cache.

How can we reproduce it (as minimally and precisely as possible)?

See "What happened?" above.

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@V0idk V0idk added the kind/bug Categorizes issue or PR as related to a bug. label May 6, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 6, 2024
@V0idk
Author

V0idk commented May 6, 2024

/sig area/kubelet

@k8s-ci-robot
Contributor

@V0idk: The label(s) sig/area/kubelet cannot be applied, because the repository doesn't have them.

In response to this:

/sig area/kubelet

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@V0idk V0idk changed the title from "secret/configmap should always get latest resource in pod add event" to "kubelet should always get latest secret/configmap resource in pod add event" May 6, 2024
@vaibhav2107
Member

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2024
@pacoxu pacoxu added this to Triage in SIG Node Bugs May 14, 2024
@haircommander
Contributor

/assign @harche

@haircommander
Contributor

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label May 22, 2024
@liggitt
Member

liggitt commented May 22, 2024

hrmm... when a new pod is registered, the cache manager implementation invalidates the secret / configmap. I expected the watch one to do the same, but it apparently does not.

@wojtek-t, this means that updating a secret and deployment does not guarantee the newly started pods use the new secret content. That seems... bad.
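For context, here is a minimal, self-contained sketch of the invalidate-on-register behavior described above. This is illustrative only and not the kubelet's actual types; staleOnRegisterCache, RegisterPod, and Get are hypothetical stand-ins for the cache-based manager's object store:

    package main

    import (
        "fmt"
        "sync"
    )

    // staleOnRegisterCache mimics the idea behind the cache-based manager:
    // registering a pod drops any cached copy of the referenced object, so the
    // next Get has to fetch a fresh one.
    type staleOnRegisterCache struct {
        mu   sync.Mutex
        data map[string]string // "namespace/name" -> cached object payload
    }

    // RegisterPod invalidates the cached entry for the referenced object.
    func (c *staleOnRegisterCache) RegisterPod(key string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        delete(c.data, key)
    }

    // Get returns the cached value, calling fetch again if the entry was
    // invalidated by RegisterPod.
    func (c *staleOnRegisterCache) Get(key string, fetch func() string) string {
        c.mu.Lock()
        defer c.mu.Unlock()
        if v, ok := c.data[key]; ok {
            return v
        }
        v := fetch()
        c.data[key] = v
        return v
    }

    func main() {
        c := &staleOnRegisterCache{data: map[string]string{"ns/app-secret": "old value"}}
        c.RegisterPod("ns/app-secret") // registering the pod drops the possibly stale entry
        fmt.Println(c.Get("ns/app-secret", func() string { return "fresh value from the apiserver" }))
    }

The watch-based manager, by contrast, keeps serving whatever its per-object watch has observed so far, which is where the ordering gap comes from.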

@harche
Contributor

harche commented May 23, 2024

This is where kubelet sets the strategy for fetching the secrets -

switch kubeCfg.ConfigMapAndSecretChangeDetectionStrategy {
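For readers following along, here is a tiny self-contained sketch of how that switch maps each strategy to a different manager. This is not the actual kubelet code; managerFor, newSimpleManager, newCachingManager, and newWatchingManager are hypothetical stand-ins, while the ResourceChangeDetectionStrategy constants come from k8s.io/kubelet/config/v1beta1:

    package main

    import (
        "fmt"

        kubeletconfigv1beta1 "k8s.io/kubelet/config/v1beta1"
    )

    type manager string

    // Hypothetical stand-ins for the constructors in pkg/kubelet/secret and
    // pkg/kubelet/configmap (simple, caching, and watch-based managers).
    func newSimpleManager() manager   { return "Get: fetch from the apiserver on every access" }
    func newCachingManager() manager  { return "Cache: fetch through a TTL cache" }
    func newWatchingManager() manager { return "Watch: serve from a per-object watch cache" }

    // managerFor mirrors the shape of the switch referenced above.
    func managerFor(s kubeletconfigv1beta1.ResourceChangeDetectionStrategy) (manager, error) {
        switch s {
        case kubeletconfigv1beta1.GetChangeDetectionStrategy:
            return newSimpleManager(), nil
        case kubeletconfigv1beta1.TTLCacheChangeDetectionStrategy:
            return newCachingManager(), nil
        case kubeletconfigv1beta1.WatchChangeDetectionStrategy:
            return newWatchingManager(), nil
        default:
            return "", fmt.Errorf("unknown configmap and secret manager mode: %v", s)
        }
    }

    func main() {
        m, err := managerFor(kubeletconfigv1beta1.WatchChangeDetectionStrategy)
        if err != nil {
            panic(err)
        }
        fmt.Println(m)
    }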

@harche
Contributor

harche commented May 23, 2024

	// GetChangeDetectionStrategy is a mode in which kubelet fetches
	// necessary objects directly from apiserver.
	GetChangeDetectionStrategy ResourceChangeDetectionStrategy = "Get"
	// TTLCacheChangeDetectionStrategy is a mode in which kubelet uses
	// ttl cache for object directly fetched from apiserver.
	TTLCacheChangeDetectionStrategy ResourceChangeDetectionStrategy = "Cache"
	// WatchChangeDetectionStrategy is a mode in which kubelet uses
	// watches to observe changes to objects that are in its interest.
	WatchChangeDetectionStrategy ResourceChangeDetectionStrategy = "Watch"

and we default to WatchChangeDetectionStrategy here

obj.ConfigMapAndSecretChangeDetectionStrategy = kubeletconfigv1beta1.WatchChangeDetectionStrategy

Under resource contention, WatchChangeDetectionStrategy may not detect changes to the objects of interest and fetch the updated secrets fast enough. However, IMO, this strategy works great under normal operational circumstances: it avoids unnecessary trips to the API server and hence is a bit more efficient.

Hence, instead of changing the default from WatchChangeDetectionStrategy to GetChangeDetectionStrategy for everyone (moving away from the existing, efficient default secret-fetching strategy and surprising users with higher resource usage on their API server), we could modify our documentation to add a footnote, something like:

configMapAndSecretChangeDetectionStrategy ([ResourceChangeDetectionStrategy](https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-ResourceChangeDetectionStrategy))
configMapAndSecretChangeDetectionStrategy is a mode in which ConfigMap and Secret managers are running. Valid values include:

Get: kubelet fetches necessary objects directly from the API server;
Cache: kubelet uses TTL cache for object fetched from the API server;
Watch: kubelet uses watches to observe changes to objects that are in its interest.

Default: "Watch"

Note: for cluster nodes that constantly experience high resource consumption, GetChangeDetectionStrategy is the recommended strategy.

@liggitt @rphillips @haircommander @V0idk What do you think?

@liggitt
Member

liggitt commented May 23, 2024

"this works well until it doesn't under load" is a really unactionable thing to document... I think there has to be a "happens before" ordering people can reason about and depend on ... if I update a secret, then create a new pod, and my updated secret doesn't get used, that's ~unusable in my opinion. I didn't know we gave up that ordering with the watch strategy.

@V0idk
Author

V0idk commented May 24, 2024

My idea is:

  1. Keep the watch strategy to reduce apiserver pressure when the kubelet hot-updates volumes.
  2. When creating a pod, however, query the apiserver to obtain the latest data.

This ensures the ordering without losing too much performance.
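A minimal sketch of that hybrid, assuming a client-go Clientset is available; getSecretOnPodAdd is a hypothetical helper, not an existing kubelet function. The watch-based cache would still serve subsequent volume updates, but the initial read on pod add goes straight to the apiserver:

    package podadd

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // getSecretOnPodAdd reads the Secret directly from the apiserver instead of
    // the kubelet's watch cache, so a write committed before the pod was created
    // cannot be missed on the initial mount.
    func getSecretOnPodAdd(ctx context.Context, client kubernetes.Interface, namespace, name string) (*corev1.Secret, error) {
        return client.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
    }

With this shape, the extra apiserver traffic is proportional to pod creations rather than to every secret read.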

@pisto

pisto commented May 27, 2024

I believe this issue is particularly evident in kubernetes/website#42359 (comment) (immutable: true on the configmap/secret resource exacerbates it). IMO a unified solution that ensures reasonable assumptions about time ordering is essential. In other words, I expect the surefire process for getting pods with up-to-date values to be:

  • modify a secret/configmap
  • roll out the workloads that use it.

If that were assured, it would become a matter of synchronization in whatever instrument is used to create/update resources on Kubernetes (Helm hooks, Argo sync waves, ...). Otherwise, automated rollouts can get stuck: the new pod may not get up-to-date values for e.g. environment variables, and since container crashes may not trigger container recreation, the invalid value stays and the rollout cannot advance.

@liggitt
Member

liggitt commented May 27, 2024

immutable: true on the configmap/secret resource exacerbates this issue

to be fair, changing a resource marked immutable is not how that feature is expected to be used

@wojtek-t
Member

Catching up....

It has been years since we did that, and I clearly missed the invalidation aspect. I agree this is a bug, but at least not a regression.

The question is how/when we should fix it. Given that we're using a reflector, which is "eventually consistent" by definition and starts at an arbitrary point in time anyway, forcing it to start from a consistent list now would hurt scalability in many places, so I would be afraid of breaking things.

Especially since we have a solution around the corner: streaming LISTs will actually allow us to address this cleanly:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/3157-watch-list/README.md

With streaming lists, we can change the reflector to always start from a consistent list, because it will be served from the watch cache. That is what would effectively address this.

So what I would do is:

  • change the code now to force the reflector to restart [we can do that now]
  • wait until streaming lists land [at that point we will also change the reflector to always start with a consistent streaming list], which will fix the issue
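As background for the consistency point above, here is a small client-go sketch (not the kubelet manager code): a list request with ResourceVersion "0" may be served from the apiserver watch cache and can be stale, whereas an empty ResourceVersion forces a consistent (quorum) read. The streaming-list KEP linked above is what would let a restarted reflector get that consistent view cheaply from the watch cache:

    package listsketch

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // consistentList leaves ResourceVersion empty, so the apiserver performs a
    // quorum read and returns the most recent data.
    func consistentList(ctx context.Context, client kubernetes.Interface, namespace string) (*corev1.SecretList, error) {
        return client.CoreV1().Secrets(namespace).List(ctx, metav1.ListOptions{ResourceVersion: ""})
    }

    // cachedList asks for data at any resource version, which the apiserver may
    // answer from its watch cache, i.e. it is only eventually consistent.
    func cachedList(ctx context.Context, client kubernetes.Interface, namespace string) (*corev1.SecretList, error) {
        return client.CoreV1().Secrets(namespace).List(ctx, metav1.ListOptions{ResourceVersion: "0"})
    }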

@pisto

pisto commented May 28, 2024

Can you clarify the "eventually consistent" bit? We have observed a rather prolonged period (at least one day, but we suspect much longer) during which a kubelet was not updating its cached data for a secret.

@liggitt
Member

liggitt commented May 28, 2024

I agree this is a bug, but at least not a regression.

It seems like it is... before the watch strategy, there was a happens-before relationship between secret/configmap writes and pod creation.

@wojtek-t
Member

wojtek-t commented May 29, 2024

It seems like it is... before the watch strategy, there was a happens-before relationship between secret/configmap writes and pod creation.

Well, technically true, but we enabled that in 1.14 (so over 5y ago) and we didn't realize until now (and unless I'm missing something we haven't heard any complaints about it). So it's not something that we found because someone upgraded their cluster recently from a version that behaved differently.

f2a0c03

@haircommander haircommander moved this from Triage to Triaged in SIG Node Bugs May 29, 2024
@haircommander
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 29, 2024