Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checksum after CopyObject works differently from aws S3 #17013

Open
adam-kiss-sg opened this issue Apr 11, 2023 · 11 comments
Open

Checksum after CopyObject works differently from aws S3 #17013

adam-kiss-sg opened this issue Apr 11, 2023 · 11 comments

Comments

@adam-kiss-sg
Copy link

Expected Behavior

On aws s3, after calling CopyObject, the checksum of the resulting object is the "real" checksum (eg: the sha256 checksum of the full file).

Current Behavior

On minio, after a CopyObject the checksum stays the same. If the original object was uploaded using multipart upload, then the chekcum of the copy will be the combined checksum of the upload parts.

Steps to Reproduce (for bugs)

  1. Create an object with multipart upload, with ChecksumAlgorithm: ChecksumAlgorithm.SHA256.
  2. Copy the object using CopyObject command.
  3. Check the resulting sha256 checksum using HeadObject command with ChecksumMode: ChecksumMode.ENABLED

Context

Please see: https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

Because of how Amazon S3 calculates the checksum for multipart objects, the checksum value for the object might change if you copy it. If you're using an SDK or the REST API and you call CopyObject, Amazon S3 copies any object up to the size limitations of the CopyObject API operation. Amazon S3 does this copy as a single action, regardless of whether the object was uploaded in a single request or as part of a multipart upload. With a copy command, the checksum of the object is a direct checksum of the full object. If the object was originally uploaded using a multipart upload, then the checksum value changes even though the data has not.

Regression

No (at least I don't think so).

Your Environment

Using latest docker image from quay.io/minio/minio

MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-04-07T05-28-58Z (go1.20.3 linux/amd64)
@klauspost
Copy link
Contributor

I will see if there is a non-messy way to add this.

Seems like we will be forced to add this as a second update to the metadata.

@harshavardhana
Copy link
Member

Just don't set the value in metadata @klauspost it may auto-compute?

@klauspost
Copy link
Contributor

@harshavardhana That is not how it works.

For PutObject and Multipart we can set it before we have received the data since it is always provided. If it mismatches the computed value we can just discard the upload completely.

On CopyObject, we will have to rewrite it after the copy has completed. We don't know it before that.

Also, there is a ton of shortcuts for faster copying and remote copying as well. So there are many paths and it is a very intrusive fix.

@klauspost
Copy link
Contributor

(also copy to same object to update metadata we don't even read the data).

@klauspost
Copy link
Contributor

klauspost commented Apr 11, 2023

https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html#API_CopyObject_ResponseSyntax

The base64-encoded, 32-bit CRC32 checksum of the object. This will only be present if it was uploaded with the object. With multipart uploads, this may not be a checksum value of the object.

That is the API guarantee.

@stale
Copy link

stale bot commented May 22, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 15 days if no further activity occurs. Thank you for your contributions.

@klauspost
Copy link
Contributor

We are rewriting CopyObject to a single part object. With the changes for the trailing headers it shouldn't be too big of a change to add it while streaming.

I will see when I get around to it.

@harshavardhana
Copy link
Member

harshavardhana commented Jun 7, 2023

@klauspost is this fixed already?

@klauspost
Copy link
Contributor

@harshavardhana No.

@harshavardhana
Copy link
Member

just checking did we fix this already?

@klauspost
Copy link
Contributor

@harshavardhana Nope :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants