Add batch status API #19679

Open
wants to merge 1 commit into
base: master

Conversation

@vadmeste (Member) commented May 6, 2024

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

Currently, the status of a completed or failed batch job is held only in memory. A simple restart loses that status, leaving the user with no visibility into a job that was long-running.

In addition to the metrics, add a new API that reads the batch status from the drives. A batch job will be cleaned up three days after completion.

Also add the batch type to the batch ID. The batch job request is removed immediately when the job finishes, so the job type is no longer known afterwards, which makes it difficult to locate the job report.
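
To illustrate the idea of carrying the job type inside the ID, here is a rough sketch. The `<type>:<token>` layout, the function names, and the sample values are assumptions made for this example; the actual ID format used by this PR is not shown in the description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// newJobID prefixes an opaque unique token with the job type.
// The "<type>:<token>" layout is an assumption for illustration only.
func newJobID(jobType, token string) string {
	return jobType + ":" + token
}

// jobTypeFromID recovers the job type from an ID built by newJobID,
// so a report can be located even after the job request is gone.
func jobTypeFromID(id string) (string, error) {
	jobType, token, ok := strings.Cut(id, ":")
	if !ok || jobType == "" || token == "" {
		return "", errors.New("job ID does not carry a type prefix")
	}
	return jobType, nil
}

func main() {
	id := newJobID("replicate", "abc123") // sample values only
	jobType, err := jobTypeFromID(id)
	fmt.Println(id, jobType, err)
}
```

With the type recoverable from the ID alone, a status lookup does not need the original job request to know where the report lives.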

Motivation and Context

How to test this PR?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Unit tests added/updated
  • Internal documentation updated
  • Create a documentation update request here

@klauspost (Contributor)

#19677 also related.

@vadmeste vadmeste force-pushed the batch-status branch 3 times, most recently from bbbd717 to 91d3a99 Compare May 7, 2024 14:12
@vadmeste vadmeste marked this pull request as ready for review May 7, 2024 14:37
@@ -57,6 +58,11 @@ import (

var globalBatchConfig batch.Config

const (
	// Keep the completed/failed job stats for 3 days before removing them
	oldJobsExpiration = 3 * 24 * time.Hour
Member

3 days? 24hr is enough I think.

Member Author

@harshavardhana a batch job is long-running, and I doubt users will check the batch status every day. Once the job completes or fails and 24 hours pass, there is no trace at all of what happened. Maybe we should not remove reports in the first place, and instead add a new API to remove batch reports.

Member

Yeah, for that they must configure notifications; we shouldn't have to carry this around.

Can you check what AWS S3 does?

4 participants