Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add randomness in robustness cluster process version to test mixed version scenarios. #17923

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

siyuanfoundation
Copy link
Contributor

@siyuanfoundation siyuanfoundation commented May 1, 2024

#17097

tested with

GO_TEST_FLAGS='--timeout=500m --count=200 --failfast' time make test-robustness
GO_TEST_FLAGS='--timeout=500m --count=200 --failfast' time make test-robustness-release-3.6
GO_TEST_FLAGS='--timeout=500m --count=200 --failfast' time make test-robustness-release-3.5

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@henrybear327
Copy link
Contributor

/cc @henrybear327

@k8s-ci-robot
Copy link

@henrybear327: GitHub didn't allow me to request PR reviews from the following users: henrybear327.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @henrybear327

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@siyuanfoundation siyuanfoundation changed the title [WIP] Add randomness in robustness cluster process version to test mixed version scenarios. Add randomness in robustness cluster process version to test mixed version scenarios. May 17, 2024
@siyuanfoundation siyuanfoundation marked this pull request as ready for review May 17, 2024 15:17
@@ -74,13 +74,6 @@ func TestEtcdServerProcessConfig(t *testing.T) {
"--experimental-snapshot-catchup-entries=100",
},
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this? Can we adapt the test?

@@ -508,3 +508,18 @@ func GetVersionFromBinary(binaryPath string) (*semver.Version, error) {

return nil, fmt.Errorf("could not find version in binary output of %s, lines outputted were %v", binaryPath, lines)
}

func CouldSetSnapshotCatchupEntries(execPath string) bool {
if !fileutil.Exist(execPath) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mock it in unit test?

@@ -59,7 +59,7 @@ func (tb triggerBlackhole) Trigger(ctx context.Context, t *testing.T, member e2e
func (tb triggerBlackhole) Available(config e2e.EtcdProcessClusterConfig, process e2e.EtcdProcess) bool {
// Avoid triggering failpoint if waiting for failpoint would take too long to fit into timeout.
// Number of required entries for snapshot depends on etcd configuration.
if tb.waitTillSnapshot && entriesToGuaranteeSnapshot(config) > 200 {
if tb.waitTillSnapshot && (entriesToGuaranteeSnapshot(config) > 200 || !e2e.CouldSetSnapshotCatchupEntries(process.Config().ExecPath)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't like why this is needed. With your change allowing random member versions, snapshot-catchup-entries is configured on cluster, but not nesseserly every member. Only on members that version supports it. We test this edge case, however I don't like the fact it allows snapshot-catchup-entries to be now silently ignored.

Still, it's ok for now as I don't know a nice solution for this.

Copy link
Member

@serathius serathius May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed to add a validation in other comment. Please take a look.

@@ -106,6 +120,7 @@ $(GOPATH)/bin/gofail: tools/mod/go.mod tools/mod/go.sum
go get go.etcd.io/gofail@${GOFAIL_VERSION}; \
FAILPOINTS=true ./build;

.PHONY: /tmp/etcd-release-3.4-failpoints/bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a PHONY target anymore.

@@ -613,6 +603,27 @@ func (cfg *EtcdProcessClusterConfig) EtcdServerProcessConfig(tb testing.TB, i in
panic(fmt.Sprintf("Unknown cluster version %v", cfg.Version))
}

defaultValues := values(*embed.NewConfig())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a validation for the edge case that non of version is supports overriding "experimental-snapshot-catchup-entries" but "experimental-snapshot-catchup-entries" is set?

@serathius
Copy link
Member

This change is really really good, makes me excited that we are able to test mixed version clusters now, great job!

@@ -74,16 +69,29 @@ func exploratoryScenarios(t *testing.T) []testScenario {
options.ClusterOptions{options.WithTickMs(100), options.WithElectionMs(2000)}),
}

// 66% current version, 33% MinorityLastVersion and QuorumLastVersion
mixedVersionOption := options.WithClusterOptionGroups(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a followup, it would be nice to have a weighted option, like in traffic.

@@ -74,16 +69,29 @@ func exploratoryScenarios(t *testing.T) []testScenario {
options.ClusterOptions{options.WithTickMs(100), options.WithElectionMs(2000)}),
}

// 66% current version, 33% MinorityLastVersion and QuorumLastVersion
mixedVersionOption := options.WithClusterOptionGroups(
// 60% with all members of current version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a followup, it would be nice to have a weighted option, like in traffic.

Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants