storage/cacher: dispatchEvents use progressRequester #124754

Open
wants to merge 1 commit into master from upstream-cacher-dispatchevents-progress-requester
Conversation

p0lyn0mial
Contributor

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 8, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label May 8, 2024
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 8, 2024
// TODO(p0lyn0mial): adapt the following logic once
// https://github.com/kubernetes/kubernetes/pull/124612 merges
progressRequesterCleanUpOnceFn := func() { /*no-op*/ }
if utilfeature.DefaultFeatureGate.Enabled(features.ConsistentListFromCache) && utilfeature.DefaultFeatureGate.Enabled(features.WatchList) {
Contributor Author

maybe we should actually gate the progressRequester only when the etcd version "matches"?

Member

Yes - we should gate on DefaultFeatureSupportsChecker once that merges.
The problem there is that it may not yet be initialized... and we need to handle that case too, so it's a bit more tricky (because we are not able to differentiate between not-initialized and not-supported, really).
@serathius - FYI

On a related note, we should probably reject streaming list requests if progress-requester is not supported.
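
Roughly what I have in mind (just a sketch - the checker's name and its Supports() API are placeholders until #124612 merges):

```go
// Sketch only: in addition to the feature gates, gate on runtime etcd support.
// etcdFeatureSupport and requestWatchProgressFeature are placeholders for
// whatever #124612 ends up exposing; note that Supports() returning false would
// cover both "not supported" and "not probed yet", which is the ambiguity above.
progressRequesterCleanUpOnceFn := func() { /*no-op*/ }
if utilfeature.DefaultFeatureGate.Enabled(features.ConsistentListFromCache) &&
	utilfeature.DefaultFeatureGate.Enabled(features.WatchList) &&
	etcdFeatureSupport.Supports(requestWatchProgressFeature) {
	progressRequester.Add()
	progressRequesterCleanUpOnceFn = sync.OnceFunc(progressRequester.Remove)
}
```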

Contributor Author

because we are not able to differentiate between not-initialized and not-supports really

why? It is not supported when the version of etcd doesn't match.

On a related note, we should probably reject streaming list requests if progress-requester is not supported.

yeah, ideally we could add an etcdVersionChecker to

func ValidateListOptions(options *internalversion.ListOptions, isWatchListFeatureEnabled bool) field.ErrorList {

could the etcdVersionChecker gate the server readiness until it initialises?
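
e.g. something like this hypothetical extension (the extra supportsWatchProgress parameter is invented here, just to illustrate the idea):

```go
// Hypothetical sketch: reject streaming list requests when etcd cannot deliver
// watch progress notifications. supportsWatchProgress is an invented parameter;
// the real plumbing would depend on whichever version/feature checker we end up with.
func ValidateListOptions(options *internalversion.ListOptions, isWatchListFeatureEnabled, supportsWatchProgress bool) field.ErrorList {
	allErrs := field.ErrorList{}
	if isWatchListFeatureEnabled && options.SendInitialEvents != nil && !supportsWatchProgress {
		allErrs = append(allErrs, field.Forbidden(
			field.NewPath("sendInitialEvents"),
			"streaming lists require watch progress notifications, which the backing etcd does not provide"))
	}
	// ...the existing validation would follow unchanged.
	return allErrs
}
```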

Member

why? It is not supported when the version of etcd doesn't match.

etcd can be started after kube-apiserver, so we default to false and let initialization switch it

could the etcdVersionChecker gate the server readiness until it initialises?

no - because we don't really know when it is fully initialized...

Contributor Author

etcd can be started after kube-apiserver, so we default to false and let initialization switch it

In that case the server won't be ready until newETCD3Check turns green. We could create something similar for the version checker.
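
e.g. a rough sketch of such a readiness check (versionChecker and its Initialized() method are made up here):

```go
// Rough sketch: a readyz-style check, similar in spirit to the etcd readiness
// check, that stays red until the hypothetical version checker has completed
// its first probe of etcd.
versionReadyCheck := healthz.NamedCheck("etcd-version-checker", func(_ *http.Request) error {
	if !versionChecker.Initialized() {
		return fmt.Errorf("etcd version check has not completed yet")
	}
	return nil
})
// ...and plug versionReadyCheck into the server's readyz checks alongside the etcd check.
```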

@@ -2470,6 +2470,72 @@ func TestWatchStreamSeparation(t *testing.T) {
}
}

func TestDispatchEventsUseProgressRequester(t *testing.T) {
Contributor Author

The test works but can start flaking on "timing" issues. I like it because it tests the entire watch request.

require.NoError(t, err, "failed to create watch: %v")
testCheckNoEvents(t, w)
w.Stop()
storeWatchProgressCounterValueAfterFirstWatch := backingStorage.getRequestWatchProgressCounter()
Contributor Author

maybe before stopping the watch we should loop until the counter is > 0? That could deflake the test.
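
e.g. something along these lines (a sketch using the test's existing backingStorage counter and k8s.io/apimachinery/pkg/util/wait):

```go
// Sketch of the deflaking idea: before stopping the watch, poll until at least
// one progress request has been observed, instead of relying on timing.
if err := wait.PollImmediate(10*time.Millisecond, wait.ForeverTestTimeout, func() (bool, error) {
	return backingStorage.getRequestWatchProgressCounter() > 0, nil
}); err != nil {
	t.Fatalf("timed out waiting for the first RequestWatchProgress call: %v", err)
}
w.Stop()
```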

@p0lyn0mial
Contributor Author

/assign @wojtek-t

// https://github.com/kubernetes/kubernetes/pull/124612 merges
progressRequesterCleanUpOnceFn := func() { /*no-op*/ }
if utilfeature.DefaultFeatureGate.Enabled(features.ConsistentListFromCache) && utilfeature.DefaultFeatureGate.Enabled(features.WatchList) {
progressRequester.Add()
Contributor Author

I have just also realised that we need to slightly change the progressRequester so that it is able to send periodic progress updates. This will unblock watchers initialised from the global RV against resources that haven't received any changes/updates.

I can try changing the progressRequester for that purpose.
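
The shape of the change I have in mind is roughly this (an illustrative sketch only, with made-up names rather than the current progressRequester API):

```go
// Illustrative sketch: keep asking etcd for watch progress at a fixed interval
// while anyone is waiting (Add()/Remove() tracked by a counter), so watchers
// started from a global RV on quiet resources get unblocked even without writes.
type periodicProgressRequester struct {
	requestWatchProgress func(ctx context.Context) error
	waiting              func() int // consumers added via Add() and not yet Remove()d
}

func (pr *periodicProgressRequester) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if pr.waiting() == 0 {
				continue
			}
			// Best-effort: an error here just means the next tick retries.
			_ = pr.requestWatchProgress(ctx)
		}
	}
}
```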

Contributor Author

The question is what should we do when etcd was started with the progress notification flag?

Contributor Author

actually, no, this will be handled by

c.watchCache.waitingUntilFresh.Add()

progressRequester.Add()
progressRequesterCleanUpOnceFn = sync.OnceFunc(progressRequester.Remove)
}

Member

So with all the discussion on the other PRs, I think it will actually be much easier to proceed with a different approach (the one that I thought about originally).

Basically, instead of even requesting bookmarks, ensure that we initialize lastProcessedResourceVersion correctly from the beginning.
So the flow we want to achieve is that lastProcessedResourceVersion will actually be set when the first List call is done.

Now, synchronizing that correctly in an arbitrary way is a bit tricky, but we have a pretty simple path to achieve it.

What you need to do is a very simple thing, just add this here:

lastProcessedResourceVersion := uint64(0)
wait.PollUntil(10*time.Millisecond, func() (bool, error) {
  if rv := c.watchCache.GetResourceVersion(); rv != 0 {
    lastProcessedResourceVersion = rv
    return true, nil
  }
  return false, nil
}, c.stopCh)

That solves the whole problem - you don't need any other changes in this PR.

Contributor Author

ok, i like the idea, it will be detached from the progressRequester. thanks!

@p0lyn0mial p0lyn0mial force-pushed the upstream-cacher-dispatchevents-progress-requester branch 3 times, most recently from fc80b5e to 064fe38 Compare May 14, 2024 10:19
@fedebongio
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels May 14, 2024
// The cache must wait until this first sync is completed to be deemed ready.
// Since we cannot send a bookmark when the lastProcessedResourceVersion is 0,
// we poll aggressively for the first RV before entering the dispatch loop.
if err := c.ready.wait(wait.ContextForChannel(c.stopCh)); err != nil {
Member

Why do we need it? I suggest removing it.

Contributor Author

We have an extremely tight loop below.

Before hammering the CPU, we should check if the cacher has been synchronised so that we can read the current value of the resource version.

Member

the calls in that function (GetResourceVersion) are pretty cheap, so I wouldn't unnecessarily complicate it

Contributor Author

What is your concern here? Does calling a well-known function really make this code more complicated? :)
There is no point in calling getResourceVersion before the cacher synchronises.

Member

Not-readiness can happen later too; if the cache unsynchronizes later, it can block again.
But primarily - complexity is my concern. This code is already super complicated and we need to find ways of making it simpler.

Calling getResourceVersion() is really cheap and I'm not bothered by calling it every 10ms until it initializes - the cost of doing that is negligible compared to the initialization itself anyway.

Contributor Author

Not-readiness can happen later too, if the cache unsynchronizes later it can block again.

we call it only once and then we enter the for loop from which we never exit (unless the stopCh is closed)

Contributor Author

But primarily - complexity is my concern. This code is already super complicated and we need to find ways for making it simpler.

OK, pushed, PTAL.

@@ -641,6 +641,12 @@ func (w *watchCache) Resync() error {
return nil
}

func (w *watchCache) GetResourceVersion() uint64 {
Member

nit: make it a private function

}
return false, nil
}); err != nil {
return /*since it can only happen when the stopCh is closed*/
Member

nit:

// Given the function above never returns an error, this can happen only when stopCh is closed

@p0lyn0mial p0lyn0mial force-pushed the upstream-cacher-dispatchevents-progress-requester branch from 064fe38 to 783cf82 Compare May 20, 2024 10:59
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 20, 2024
@p0lyn0mial p0lyn0mial force-pushed the upstream-cacher-dispatchevents-progress-requester branch from 783cf82 to 33f81ee Compare May 20, 2024 12:18
@wojtek-t
Member

/kind feature

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 20, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a2c5781dceeeb6ff2dfb380c19bc8908d7fbf891

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2024
@k8s-ci-robot
Contributor

@p0lyn0mial: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-e2e-kind-ipv6
Commit: 33f81ee
Required: true
Rerun command: /test pull-kubernetes-e2e-kind-ipv6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@p0lyn0mial
Contributor Author

/test pull-kubernetes-e2e-kind-ipv6
