scanner: Scan buckets asynchronously #18107

Open
wants to merge 1 commit into
base: master
from

Conversation

vadmeste
Member

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the [Apache 2 license](https://www.apache.org/licenses/LICENSE-2.0).
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

  • Scan buckets asynchronously: multiple erasure sets scan in parallel, and multiple disks in the same erasure set scan different buckets in parallel
  • No data format is changed
  • The cycle concept is now per bucket: the cycle is incremented each time a bucket is scanned, whether the scan succeeds or fails
  • The next bucket to scan is the one with the lowest cycle number
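
For readers following the description, the selection rule in the last two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `bucketCycles`, `nextBucket`, and `markScanned` are hypothetical, and breaking ties by bucket name is an assumption made only to keep the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketCycles maps a bucket name to its scan cycle within one erasure set.
// Hypothetical type for illustration; it is not taken from the PR.
type bucketCycles map[string]uint64

// nextBucket returns the bucket with the lowest cycle number.
// Ties are broken by name (an assumption, for determinism).
// It returns "" when there are no buckets.
func nextBucket(cycles bucketCycles) string {
	names := make([]string, 0, len(cycles))
	for name := range cycles {
		names = append(names, name)
	}
	sort.Strings(names)

	next, found := "", false
	for _, name := range names {
		if !found || cycles[name] < cycles[next] {
			next, found = name, true
		}
	}
	return next
}

// markScanned increments the cycle of a bucket. Per the description, this
// happens whether the scan succeeded or not.
func markScanned(cycles bucketCycles, bucket string) {
	cycles[bucket]++
}

func main() {
	cycles := bucketCycles{"logs": 3, "photos": 1, "backups": 1}
	for i := 0; i < 4; i++ {
		b := nextBucket(cycles)
		fmt.Printf("scanning %s (cycle %d)\n", b, cycles[b])
		markScanned(cycles, b)
	}
}
```

Because the cycle is incremented even on failure, a bucket that keeps failing does not starve the others in this sketch; it simply moves to the back of the queue like any other scanned bucket.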

Motivation and Context

How to test this PR?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Unit tests added/updated
  • Internal documentation updated
  • Create a documentation update request here

Member

@harshavardhana left a comment

Do you have any numbers to share on how much this improves scanning? @vadmeste

cmd/erasure.go (outdated, resolved review thread)
@klauspost
Contributor

klauspost commented Sep 27, 2023

@vadmeste Could you describe the changes in more detail?

@vadmeste
Member Author

> @vadmeste Could you describe the changes in more detail?

Basically, I tried to keep the change as minimal as possible, and it does not require any data format change.

Everything else is used as before: the total usage, the data cache per erasure set, and the data cache per bucket per erasure set. This should not be a problem even if we scan buckets in parallel because, after all, we still scan one bucket per erasure set at a time, so this should not cause any confusion in the calculations.

The differences from the old code:

  • Each bucket in each erasure set has its own scan cycle
  • A bucket scan manager decides which bucket to scan next for a given erasure set
  • The erasure-server-pools.NSScanner call (and the calls below it) no longer returns; it continuously scans buckets
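
To make the role of the bucket scan manager more concrete, here is a rough Go sketch of one way such a manager could hand out buckets to several disks of the same erasure set. It is an assumption-laden illustration, not the PR's implementation: `scanManager`, `start`, and `done` are hypothetical names, the in-progress bookkeeping is assumed, and the worker loop is bounded here whereas the real scanner is described as running continuously.

```go
package main

import (
	"fmt"
	"sync"
)

// scanManager is a hypothetical, simplified bucket scan manager for a single
// erasure set. It is not the type used in the PR.
type scanManager struct {
	mu       sync.Mutex
	cycles   map[string]uint64 // bucket name -> scan cycle
	scanning map[string]bool   // buckets currently being scanned
}

func newScanManager(buckets []string) *scanManager {
	m := &scanManager{cycles: map[string]uint64{}, scanning: map[string]bool{}}
	for _, b := range buckets {
		m.cycles[b] = 0
	}
	return m
}

// start picks the idle bucket with the lowest cycle (ties broken by name),
// marks it as started, and returns it. ok is false when every bucket is busy.
func (m *scanManager) start() (bucket string, ok bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for name, cycle := range m.cycles {
		if m.scanning[name] {
			continue
		}
		if !ok || cycle < m.cycles[bucket] || (cycle == m.cycles[bucket] && name < bucket) {
			bucket, ok = name, true
		}
	}
	if ok {
		m.scanning[bucket] = true
	}
	return bucket, ok
}

// done releases the bucket and increments its cycle, regardless of whether
// the scan succeeded.
func (m *scanManager) done(bucket string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.scanning, bucket)
	m.cycles[bucket]++
}

func main() {
	m := newScanManager([]string{"a", "b", "c"})
	var wg sync.WaitGroup
	// Two workers stand in for two disks of the same erasure set.
	for disk := 0; disk < 2; disk++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 3; i++ { // the real scanner would loop until shutdown
				b, ok := m.start()
				if !ok {
					continue
				}
				// A real scanner would walk the bucket on disk here.
				m.done(b)
			}
		}()
	}
	wg.Wait()
	fmt.Println(m.cycles)
}
```

Tracking which buckets are in progress is what keeps two disks from picking the same bucket at the same time in this sketch; whether the PR enforces that in the same way is not stated in this conversation.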

Let's chat while you review this PR if you want

@harshavardhana
Member

I have rebased and pushed the changes @vadmeste

@harshavardhana added the new-feature, next-release, priority: high, and needs-docs labels on Dec 30, 2023
cmd/data-scanner-metric.go (three outdated, resolved review threads)
@vadmeste force-pushed the scanner-v2 branch 2 times, most recently from 5f20b78 to 28d3a8e, on January 3, 2024 04:19
Contributor

@shtripat left a comment

looks good to me

return nextBucketName
}

// Mark a bucket as done in a specific erasure set - returns true if successful,
Contributor

%s/Mark a bucket as done in a specific erasure set/Mark a bucket as scan started in a specific erasure set/g

@harshavardhana added the priority: medium and needs-review labels and removed the priority: high, needs-docs, and new-feature labels on Jan 18, 2024
@harshavardhana removed the next-release label on Feb 6, 2024
@harshavardhana
Member

@vadmeste please rebase this and let the team know what is still needed from us to merge this PR.

@vadmeste force-pushed the scanner-v2 branch 3 times, most recently from 793a11c to 936363e, on March 11, 2024 11:05
@harshavardhana
Member

harshavardhana commented Mar 11, 2024

PTAL @klauspost need your review here

Is there any breakage in `mc admin scanner status`, @vadmeste?

@harshavardhana
Member

@vadmeste read the code again. Why do we need a change in the data structure, and why does madmin-go need to break here?

@vadmeste
Member Author

> @vadmeste read the code again. Why do we need a change in the data structure, and why does madmin-go need to break here?

Basically, once bucket scanning is asynchronous, a cluster-wide scan cycle loses its meaning. The cycle is tracked per bucket per erasure set instead:

           (pool, set, bucket-name) => cycle
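
As an illustration of that mapping only, it could be modeled in Go with a composite struct key. This is a sketch under assumptions: `scanKey`, `cycleTracker`, and `bucketMinCycle` are hypothetical names, and neither the PR nor madmin-go is claimed to expose these types.

```go
package main

import "fmt"

// scanKey identifies one bucket within one erasure set of one pool.
// Hypothetical type for illustration.
type scanKey struct {
	Pool   int
	Set    int
	Bucket string
}

// cycleTracker holds the (pool, set, bucket-name) => cycle mapping.
type cycleTracker map[scanKey]uint64

// bucketMinCycle returns the lowest cycle recorded for a bucket across all
// pools and erasure sets. The boolean is false if the bucket was never seen.
func (t cycleTracker) bucketMinCycle(bucket string) (uint64, bool) {
	var lowest uint64
	found := false
	for key, cycle := range t {
		if key.Bucket != bucket {
			continue
		}
		if !found || cycle < lowest {
			lowest, found = cycle, true
		}
	}
	return lowest, found
}

func main() {
	tracker := cycleTracker{
		{Pool: 0, Set: 0, Bucket: "photos"}: 4,
		{Pool: 0, Set: 1, Bucket: "photos"}: 2,
		{Pool: 1, Set: 0, Bucket: "logs"}:   7,
	}
	if cycle, ok := tracker.bucketMinCycle("photos"); ok {
		fmt.Println("lowest cycle for photos:", cycle)
	}
}
```

With per-bucket, per-set cycles there is no single cluster-wide number to report, so any summary (such as the per-bucket minimum above) has to be derived from the individual entries.
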

By the way, madmin-go will still work with older MinIO deployments; later, mc will show the appropriate UI. If the cluster cycle information is available, then we know this is an old version and

- Scan buckets in all erasure sets asynchronously
- No data format is changed
- Cycle concept moved to be bucket centric, the cycle is
  incremented each time a bucket is successfully or unsuccessfully
  scanned
- Next bucket in each erasure set is chosen based on the oldest
  last scan timestamp
@@ -0,0 +1,262 @@
// Copyright (c) 2015-2023 MinIO, Inc.
Contributor

Suggested change
- // Copyright (c) 2015-2023 MinIO, Inc.
+ // Copyright (c) 2015-2024 MinIO, Inc.


5 participants