Add sufficient deadlines and countermeasures to handle hung node scenario #19688
peersLogOnceIf(context.Background(), err, nodeName)
if xnet.IsNetworkOrHostDown(err, false) {
	network[nodeName] = string(madmin.ItemOffline)
} else if xnet.IsNetworkOrHostDown(err, true) {
	network[nodeName] = "connection attempt timedout"
}
To differentiate between a node that is actually "offline" and one whose connection attempt "timed out".
cli.DurationFlag{
	Name:   "conn-client-read-deadline",
	Usage:  "custom connection READ deadline for incoming requests",
	Hidden: true,
	EnvVar: "MINIO_CONN_CLIENT_READ_DEADLINE",
},
cli.DurationFlag{
	Name:   "conn-client-write-deadline",
	Usage:  "custom connection WRITE deadline for outgoing requests",
	Hidden: true,
	EnvVar: "MINIO_CONN_CLIENT_WRITE_DEADLINE",
},
Not needed; remove it.
gridLogIf(ctx, fmt.Errorf("ws write: %w", err))
if !xnet.IsNetworkOrHostDown(err, true) {
	gridLogIf(ctx, fmt.Errorf("ws write: %w", err))
}
return
}
Apply valid deadlines and log only unexpected errors, not repeated network errors.
…ario Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>
err = conn.SetWriteDeadline(time.Now().Add(connWriteTimeout))
if err != nil {
	return err
}
Why is this removed?
Community Contribution License
All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.
Description
Add sufficient deadlines and countermeasures to handle the hung node scenario
Motivation and Context
This PR tries to address the hung node scenario by adding sufficient
deadlines and countermeasures for such an eventuality.
How to test this PR?
This PR already adds the relevant tests; you may reproduce them
locally as needed by following the mint automation piece.