scanner: Scan buckets asynchronously #18107

vadmeste · 2023-09-25T23:07:36Z

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the [Apache 2 license](https://www.apache.org/licenses/LICENSE-2.0).
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

- Scan buckets asynchronously: multiple erasure sets scan in parallel, and multiple disks in the same erasure set scan different buckets in parallel
- No data format is changed
- The cycle concept is now per bucket: the cycle is incremented each time a bucket is scanned, successfully or not
- The next bucket to scan is the one with the lowest cycle number
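The selection rule in the last item can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `pickLowestCycle` and the tie-break by bucket name are assumptions made here so the result is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// pickLowestCycle returns the bucket with the lowest scan cycle.
// Ties are broken by bucket name (an assumption of this sketch).
// It returns false when there are no buckets.
func pickLowestCycle(cycles map[string]uint64) (string, bool) {
	names := make([]string, 0, len(cycles))
	for name := range cycles {
		names = append(names, name)
	}
	sort.Strings(names)

	best, found := "", false
	for _, name := range names {
		if !found || cycles[name] < cycles[best] {
			best, found = name, true
		}
	}
	return best, found
}

func main() {
	cycles := map[string]uint64{"photos": 3, "logs": 2, "backups": 2}
	for i := 0; i < 4; i++ {
		bucket, ok := pickLowestCycle(cycles)
		if !ok {
			return
		}
		fmt.Printf("scan %s (cycle %d)\n", bucket, cycles[bucket])
		// The cycle advances whether or not the scan succeeded.
		cycles[bucket]++
	}
}
```

Because the cycle is bumped after every attempt, a bucket that keeps failing cannot starve the others: it moves to the back of the order like any other bucket.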

Motivation and Context

How to test this PR?

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Optimization (provides speedup with no functional changes)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Fixes a regression (If yes, please add commit-id or PR # here)
Unit tests added/updated
Internal documentation updated
Create a documentation update request here

harshavardhana

Do you have numbers to share on how much this improves things? @vadmeste

cmd/erasure.go

klauspost · 2023-09-27T10:44:12Z

@vadmeste Could you describe the changes in more detail?

vadmeste · 2023-09-30T23:33:12Z

> @vadmeste Could you describe the changes in more detail?

I tried to keep the change as minimal as possible, and it did not require any data format change.

Everything else is used as before: the total usage, the data cache per erasure set, and the data cache per bucket per erasure set. Scanning buckets in parallel should not be a problem, because we still scan only one bucket per erasure set at a time, so the usage calculation should not be affected.

The differences from the old code:

- Each bucket in each erasure set has its own scan cycle
- A bucket scan manager decides which bucket to scan next for a given erasure set
- The erasure-server-pools.NSScanner call (and the calls below it) no longer returns; it scans buckets continuously
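The three differences above can be sketched together as one scan loop per erasure set. This is a rough sketch under stated assumptions, not the code in cmd/buckets-scan-mgr.go: `scanManager`, `next`, `finish`, and `scanSet` are hypothetical names, and the scan itself is replaced by a short sleep.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// scanManager is a hypothetical stand-in for the bucket scan manager.
// It keeps one cycle counter per bucket for each erasure set.
type scanManager struct {
	mu     sync.Mutex
	cycles map[int]map[string]uint64 // set index -> bucket -> cycle
}

func newScanManager(sets int, buckets []string) *scanManager {
	m := &scanManager{cycles: make(map[int]map[string]uint64)}
	for set := 0; set < sets; set++ {
		m.cycles[set] = make(map[string]uint64)
		for _, b := range buckets {
			m.cycles[set][b] = 0
		}
	}
	return m
}

// next returns the bucket with the lowest cycle in the given set,
// breaking ties by name. It returns false if the set has no buckets.
func (m *scanManager) next(set int) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	best, found := "", false
	for b, c := range m.cycles[set] {
		bc := m.cycles[set][best]
		if !found || c < bc || (c == bc && b < best) {
			best, found = b, true
		}
	}
	return best, found
}

// finish bumps the bucket's cycle, whether the scan succeeded or not.
func (m *scanManager) finish(set int, bucket string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.cycles[set]; ok {
		c[bucket]++
	}
}

// scanSet does not return until ctx is cancelled. It scans one bucket
// at a time for its erasure set, always asking the manager what is next.
func scanSet(ctx context.Context, m *scanManager, set int, scan func(set int, bucket string) error) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		bucket, ok := m.next(set)
		if !ok {
			// Nothing to scan yet; wait briefly before asking again.
			select {
			case <-ctx.Done():
				return
			case <-time.After(10 * time.Millisecond):
			}
			continue
		}
		_ = scan(set, bucket) // errors are ignored in this sketch
		m.finish(set, bucket)
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	m := newScanManager(2, []string{"logs", "photos"})
	var wg sync.WaitGroup
	for set := 0; set < 2; set++ {
		wg.Add(1)
		go func(set int) {
			defer wg.Done()
			scanSet(ctx, m, set, func(int, string) error {
				time.Sleep(5 * time.Millisecond) // pretend to scan
				return nil
			})
		}(set)
	}
	wg.Wait()
	fmt.Println("set 0:", m.cycles[0])
	fmt.Println("set 1:", m.cycles[1])
}
```

Each erasure set runs its own loop, so sets progress independently, while within a set only one bucket is in flight at a time, which matches the constraint described earlier in this comment.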

Let's chat while you review this PR if you want

harshavardhana · 2023-12-30T06:57:31Z

I have rebased and pushed the changes @vadmeste

cmd/data-scanner-metric.go

shtripat

looks good to me

shtripat · 2024-01-03T05:47:03Z

cmd/buckets-scan-mgr.go

+	return nextBucketName
+}
+
+// Mark a bucket as done in a specific erasure set - returns true if successful,


%s/Mark a bucket as done in a specific erasure set/Mark a bucket as scan started in a specific erasure set/g

harshavardhana · 2024-02-29T00:46:24Z

@vadmeste please rebase this and let the team know what is waiting on us for merging this PR.

harshavardhana · 2024-03-11T12:12:27Z

PTAL @klauspost need your review here

Is there any breakage in mc admin scanner status @vadmeste ?

harshavardhana · 2024-03-24T10:04:28Z

@vadmeste read the code again. Why do we need a change in the data structure, and why does madmin-go need to break here?

vadmeste · 2024-03-25T13:08:05Z

> @vadmeste read the code again. Why do we need a change in the data structure, and why does madmin-go need to break here?

Basically after implementing the asynchronous bucket scanning, a cluster scan cycle will lose its meaning. The cycle will be per bucket per erasure set instead:

           (pool, set, bucket-name) => cycle

By the way, madmin-go will still work with older MinIO deployments; later, mc will show the appropriate UI. If the cluster cycle information is available, then we know this is an old version and

- Scan buckets in all erasure sets asynchronously
- No data format is changed
- The cycle concept is now bucket-centric: the cycle is incremented each time a bucket is scanned, successfully or not
- The next bucket in each erasure set is chosen based on the oldest last-scan timestamp
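The timestamp-based selection in the last item can be sketched like this. It is a minimal example under assumptions: `pickOldestScan` and `exampleLastScans` are hypothetical helpers, and a zero `time.Time` is taken to mean "never scanned", which sorts before any real timestamp.

```go
package main

import (
	"fmt"
	"time"
)

// pickOldestScan returns the bucket whose last scan is the oldest.
// A zero time.Time means "never scanned" and therefore wins.
// Ties are broken by bucket name (an assumption of this sketch).
func pickOldestScan(lastScan map[string]time.Time) (string, bool) {
	best, found := "", false
	for name, ts := range lastScan {
		bt := lastScan[best]
		if !found || ts.Before(bt) || (ts.Equal(bt) && name < best) {
			best, found = name, true
		}
	}
	return best, found
}

// exampleLastScans builds a small, fixed data set for the demo.
func exampleLastScans() map[string]time.Time {
	base := time.Date(2024, 5, 15, 12, 0, 0, 0, time.UTC)
	return map[string]time.Time{
		"logs":    base.Add(-2 * time.Hour),
		"photos":  base.Add(-30 * time.Minute),
		"backups": {}, // never scanned
	}
}

func main() {
	scans := exampleLastScans()
	bucket, _ := pickOldestScan(scans)
	fmt.Println("next:", bucket)

	// After scanning, record the time so the bucket moves to the back.
	scans[bucket] = time.Date(2024, 5, 15, 12, 0, 0, 0, time.UTC)
	bucket, _ = pickOldestScan(scans)
	fmt.Println("next:", bucket)
}
```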

poornas · 2024-05-16T19:29:05Z

cmd/buckets-scan-mgr.go

@@ -0,0 +1,262 @@
+// Copyright (c) 2015-2023 MinIO, Inc.


Suggested change

// Copyright (c) 2015-2023 MinIO, Inc.

// Copyright (c) 2015-2024 MinIO, Inc.

vadmeste force-pushed the scanner-v2 branch from 3e203cd to 0fc853d · September 25, 2023 23:09

harshavardhana reviewed · Sep 25, 2023 (one thread on cmd/erasure.go, now outdated)

vadmeste mentioned this pull request · Sep 30, 2023: scanner: Allow full throttle if there is no parallel disk ops #18109 (merged)

vadmeste force-pushed the scanner-v2 branch from 0fc853d to 904bdd7 · September 30, 2023 23:33

vadmeste force-pushed the scanner-v2 branch 2 times, most recently from 3ef62f4 to bdeb510 · November 3, 2023 00:07

vadmeste force-pushed the scanner-v2 branch 2 times, most recently from 2bd854f to 6219622 · November 22, 2023 18:33

vadmeste marked this pull request as ready for review · November 22, 2023 18:36

harshavardhana force-pushed the scanner-v2 branch from 6219622 to e577c3e · December 30, 2023 06:32

harshavardhana requested a review from klauspost · December 30, 2023 06:33

harshavardhana requested review from poornas, krisis and shtripat · December 30, 2023 06:57

harshavardhana added the new-feature, next-release, priority: high and needs-docs labels · Dec 30, 2023

shtripat reviewed · Jan 2, 2024 (three threads on cmd/data-scanner-metric.go, now outdated)

vadmeste force-pushed the scanner-v2 branch 2 times, most recently from 5f20b78 to 28d3a8e · January 3, 2024 04:19

shtripat approved these changes · Jan 3, 2024

harshavardhana added the priority: medium and needs-review labels and removed the priority: high, needs-docs and new-feature labels · Jan 18, 2024

harshavardhana removed the next-release label · Feb 6, 2024

vadmeste force-pushed the scanner-v2 branch 3 times, most recently from 793a11c to 936363e · March 11, 2024 11:05

vadmeste force-pushed the scanner-v2 branch from 936363e to ced5c5a · March 29, 2024 11:29

vadmeste force-pushed the scanner-v2 branch from ced5c5a to 4f93832 · May 15, 2024 11:42

poornas reviewed · May 16, 2024
