New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add randomness in robustness cluster process version to test mixed version scenarios. #17923
base: main
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
f18395e
to
d66961f
Compare
247e432
to
d2af147
Compare
/cc @henrybear327 |
@henrybear327: GitHub didn't allow me to request PR reviews from the following users: henrybear327. Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
e7098de
to
62da593
Compare
bf78d07
to
01d793d
Compare
@@ -74,13 +74,6 @@ func TestEtcdServerProcessConfig(t *testing.T) { | |||
"--experimental-snapshot-catchup-entries=100", | |||
}, | |||
}, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this? Can we adapt the test?
@@ -508,3 +508,18 @@ func GetVersionFromBinary(binaryPath string) (*semver.Version, error) { | |||
|
|||
return nil, fmt.Errorf("could not find version in binary output of %s, lines outputted were %v", binaryPath, lines) | |||
} | |||
|
|||
func CouldSetSnapshotCatchupEntries(execPath string) bool { | |||
if !fileutil.Exist(execPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mock it in unit test?
@@ -59,7 +59,7 @@ func (tb triggerBlackhole) Trigger(ctx context.Context, t *testing.T, member e2e | |||
func (tb triggerBlackhole) Available(config e2e.EtcdProcessClusterConfig, process e2e.EtcdProcess) bool { | |||
// Avoid triggering failpoint if waiting for failpoint would take too long to fit into timeout. | |||
// Number of required entries for snapshot depends on etcd configuration. | |||
if tb.waitTillSnapshot && entriesToGuaranteeSnapshot(config) > 200 { | |||
if tb.waitTillSnapshot && (entriesToGuaranteeSnapshot(config) > 200 || !e2e.CouldSetSnapshotCatchupEntries(process.Config().ExecPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't like why this is needed. With your change allowing random member versions, snapshot-catchup-entries is configured on cluster, but not nesseserly every member. Only on members that version supports it. We test this edge case, however I don't like the fact it allows snapshot-catchup-entries
to be now silently ignored.
Still, it's ok for now as I don't know a nice solution for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed to add a validation in other comment. Please take a look.
@@ -106,6 +120,7 @@ $(GOPATH)/bin/gofail: tools/mod/go.mod tools/mod/go.sum | |||
go get go.etcd.io/gofail@${GOFAIL_VERSION}; \ | |||
FAILPOINTS=true ./build; | |||
|
|||
.PHONY: /tmp/etcd-release-3.4-failpoints/bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a PHONY target anymore.
@@ -613,6 +603,27 @@ func (cfg *EtcdProcessClusterConfig) EtcdServerProcessConfig(tb testing.TB, i in | |||
panic(fmt.Sprintf("Unknown cluster version %v", cfg.Version)) | |||
} | |||
|
|||
defaultValues := values(*embed.NewConfig()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a validation for the edge case that non of version is supports overriding "experimental-snapshot-catchup-entries" but "experimental-snapshot-catchup-entries" is set?
This change is really really good, makes me excited that we are able to test mixed version clusters now, great job! |
@@ -74,16 +69,29 @@ func exploratoryScenarios(t *testing.T) []testScenario { | |||
options.ClusterOptions{options.WithTickMs(100), options.WithElectionMs(2000)}), | |||
} | |||
|
|||
// 66% current version, 33% MinorityLastVersion and QuorumLastVersion | |||
mixedVersionOption := options.WithClusterOptionGroups( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a followup, it would be nice to have a weighted option, like in traffic.
@@ -74,16 +69,29 @@ func exploratoryScenarios(t *testing.T) []testScenario { | |||
options.ClusterOptions{options.WithTickMs(100), options.WithElectionMs(2000)}), | |||
} | |||
|
|||
// 66% current version, 33% MinorityLastVersion and QuorumLastVersion | |||
mixedVersionOption := options.WithClusterOptionGroups( | |||
// 60% with all members of current version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a followup, it would be nice to have a weighted option, like in traffic.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
#17097
tested with
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.