scanner: Scan buckets asynchronously #18107
base: master
3e203cd to 0fc853d
Do you have numbers to share on how much this improves things? @vadmeste
@vadmeste Could you describe the changes in more detail?
I tried to keep the change as minimal as possible, and I did not need to change any data format either. Everything is used as before: the total usage, the data cache per erasure set, and the data cache per bucket per erasure set. This should not be a problem even when buckets are scanned in parallel, because we still scan only one bucket per erasure set at a time, so the usage calculation is not affected. The differences from the old code:
Let's chat while you review this PR if you want.
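The argument above (erasure sets run in parallel, but each set scans one bucket at a time) can be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `scanAllSets`, `totalUsage`, and `fakeScan` are hypothetical names, and the fake scan just returns a made-up size.

```go
package main

import (
	"fmt"
	"sync"
)

// scanAllSets starts one goroutine per erasure set. Inside a set, buckets are
// scanned strictly one after another, so each per-set usage map has a single
// writer and needs no locking. All names here are hypothetical.
func scanAllSets(numSets int, buckets []string, scanBucket func(set int, bucket string) uint64) []map[string]uint64 {
	perSet := make([]map[string]uint64, numSets)
	var wg sync.WaitGroup
	for set := 0; set < numSets; set++ {
		wg.Add(1)
		go func(set int) {
			defer wg.Done()
			usage := make(map[string]uint64, len(buckets))
			for _, b := range buckets { // one bucket at a time within this set
				usage[b] = scanBucket(set, b)
			}
			perSet[set] = usage // each goroutine writes only its own slot
		}(set)
	}
	wg.Wait()
	return perSet
}

// totalUsage sums the per-set, per-bucket usage into a single total.
func totalUsage(perSet []map[string]uint64) uint64 {
	var total uint64
	for _, usage := range perSet {
		for _, size := range usage {
			total += size
		}
	}
	return total
}

// fakeScan is a stand-in for a real bucket scan; it returns an invented size.
func fakeScan(set int, bucket string) uint64 {
	return uint64(len(bucket)) * uint64(set+1)
}

func main() {
	perSet := scanAllSets(3, []string{"photos", "logs"}, fakeScan)
	fmt.Println("total usage:", totalUsage(perSet))
}
```

Because every erasure set keeps its own per-bucket map, the total is just a sum over sets and buckets, and running sets concurrently does not change the result.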
0fc853d to 904bdd7
3ef62f4 to bdeb510
2bd854f to 6219622
6219622 to e577c3e
I have rebased and pushed the changes @vadmeste
5f20b78 to 28d3a8e
looks good to me
	return nextBucketName
}

// Mark a bucket as done in a specific erasure set - returns true if successful,
%s/Mark a bucket as done in a specific erasure set/Mark a bucket as scan started in a specific erasure set/g
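To make the wording suggested above concrete, here is a minimal Go sketch of a helper that marks a bucket as scan started in a specific erasure set and returns true if successful. The function from the PR is not shown in full here, so everything below is an assumption: the `scanTracker` type, the method names, and the rule that a set refuses a new scan while another bucket is still in progress are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// scanTracker remembers which bucket each erasure set is currently scanning.
// Hypothetical type, not taken from the PR.
type scanTracker struct {
	mu         sync.Mutex
	inProgress map[int]string // erasure set index -> bucket being scanned
}

func newScanTracker() *scanTracker {
	return &scanTracker{inProgress: make(map[int]string)}
}

// markScanStarted marks a bucket as scan started in a specific erasure set.
// It returns true if successful, and false if that set is already scanning
// a bucket (assumed rule: one bucket per erasure set at a time).
func (t *scanTracker) markScanStarted(set int, bucket string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, busy := t.inProgress[set]; busy {
		return false
	}
	t.inProgress[set] = bucket
	return true
}

// markScanDone clears the in-progress marker. It returns false if the given
// bucket is not the one currently being scanned in that set.
func (t *scanTracker) markScanDone(set int, bucket string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if current, ok := t.inProgress[set]; !ok || current != bucket {
		return false
	}
	delete(t.inProgress, set)
	return true
}

func main() {
	t := newScanTracker()
	fmt.Println(t.markScanStarted(0, "photos")) // set 0 is free
	fmt.Println(t.markScanStarted(0, "logs"))   // set 0 is busy with "photos"
	fmt.Println(t.markScanDone(0, "photos"))
}
```

Separating "scan started" from "scan done" in the names, as the review comment asks, makes it clear which state transition a true return value refers to.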
@vadmeste please rebase this and let the team know what is waiting on us for merging this PR.
793a11c to 936363e
PTAL @klauspost, need your review here. Is there any breakage in mc admin scanner status, @vadmeste?
@vadmeste read the code again. Why do we need a change in the data structure, and why does madmin-go need to break here?
Basically, once asynchronous bucket scanning is implemented, a cluster-wide scan cycle loses its meaning. The cycle will be per bucket per erasure set instead:
By the way, madmin-go will still work with older MinIO deployments; later, mc will show the appropriate UI. If the cluster cycle information is available, then we know this is an old version and
- Scan buckets in all erasure sets asynchronously
- No data format is changed
- Cycle concept moved to be bucket-centric: the cycle is incremented each time a bucket is successfully or unsuccessfully scanned
- Next bucket in each erasure set is chosen based on the oldest last scan timestamp
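The last two points of the commit message can be illustrated with a small Go sketch. It is not the PR's implementation: `setScanTracker`, `bucketScanState`, `nextBucket`, `finishScan`, and `simulate` are hypothetical names, and breaking timestamp ties by bucket name is an assumption added only to keep the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// bucketScanState is a hypothetical per-bucket, per-erasure-set record.
type bucketScanState struct {
	lastScan time.Time // zero value means the bucket was never scanned
	cycle    uint64    // incremented after every scan attempt
}

// setScanTracker holds the scan state of every bucket in one erasure set.
type setScanTracker struct {
	buckets map[string]*bucketScanState
}

func newTracker(names ...string) *setScanTracker {
	t := &setScanTracker{buckets: make(map[string]*bucketScanState, len(names))}
	for _, n := range names {
		t.buckets[n] = &bucketScanState{}
	}
	return t
}

// nextBucket returns the bucket with the oldest last scan timestamp.
// Never-scanned buckets come first because their timestamp is the zero time.
// Ties are broken by name (an assumption, for determinism).
func (t *setScanTracker) nextBucket() string {
	next, found := "", false
	var oldest time.Time
	for name, st := range t.buckets {
		older := st.lastScan.Before(oldest)
		tie := st.lastScan.Equal(oldest) && name < next
		if !found || older || tie {
			next, oldest, found = name, st.lastScan, true
		}
	}
	return next
}

// finishScan records a scan attempt. The cycle is incremented whether or not
// the scan succeeded, as the commit message describes.
func (t *setScanTracker) finishScan(bucket string, at time.Time) {
	if st, ok := t.buckets[bucket]; ok {
		st.lastScan = at
		st.cycle++
	}
}

// simulate runs the given number of scan rounds with a fake clock that
// advances one minute per round, and returns the order of picked buckets.
func simulate(t *setScanTracker, rounds int) []string {
	clock := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	order := make([]string, 0, rounds)
	for i := 0; i < rounds; i++ {
		b := t.nextBucket()
		t.finishScan(b, clock)
		order = append(order, b)
		clock = clock.Add(time.Minute)
	}
	return order
}

func main() {
	t := newTracker("photos", "logs", "backups")
	fmt.Println(simulate(t, 4))
}
```

With this selection rule each bucket carries its own cycle counter, which is why a single cluster-wide cycle number no longer describes the scanner's progress.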
@@ -0,0 +1,262 @@
// Copyright (c) 2015-2023 MinIO, Inc.
// Copyright (c) 2015-2023 MinIO, Inc.
// Copyright (c) 2015-2024 MinIO, Inc.
Community Contribution License
All community contributions in this pull request are licensed to the project maintainers
under the terms of the [Apache 2 license](https://www.apache.org/licenses/LICENSE-2.0).
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.
Description
Motivation and Context
How to test this PR?
Types of changes
Checklist: