[SNSD] A crash __after__ open("xl.meta", O_TRUNC), and __before__ write() would result in lost of metadata and hence the object. #17416

Open
ratnax opened this issue Jun 14, 2023 · 3 comments

Comments

@ratnax

ratnax commented Jun 14, 2023

NOTE

A crash in function https://github.com/minio/minio/blob/f32efd5429a62bd1fe8a159e9c75e364c27e0f67/cmd/xl-storage.go#LL1925C1-L1925C29 after open with O_TRUNC, before writing new contents to xl.meta, would result in loss of data.

This inline update is currently being done for DeleteVersion and other operations.

Expected Behavior

On a crash between open("xl.meta", O_TRUNC) and write(), we should not lose the object.

Current Behavior

After a crash that follows the open with O_TRUNC, xl.meta is left truncated to 0 bytes.

Possible Solution

The old version of xl.meta should be replaced atomically with a new version prepared in the .minio.sys/tmp location.

Steps to Reproduce (for bugs)

Basically, set a breakpoint in gdb on the write calls to xl.meta, and kill minio when the breakpoint is hit.

These were the versions originally on the object:

$> mc ls local/bktv/ --insecure --versions
[2023-06-14 03:35:02 EDT] 475KiB STANDARD 63804668-26d5-4ad4-80f1-2de36eb10137 v2 PUT CloudStg
[2023-06-14 03:34:50 EDT] 475KiB STANDARD e3e0dbb5-a0be-4aba-a887-c32433adbc0c v1 PUT CloudStg

Attached gdb to minio and set a breakpoint on line 1953.

(gdb) b cmd/xl-storage.go:1953
Breakpoint 6 at 0x339c4ca: file minio-RELEASE.2023-03-13T19-46-17Z/cmd/xl-storage.go, line 1954.
(gdb) c

Deleted a version from a client:
$> mc rm local/bktv/CloudStg --insecure --version-id e3e0dbb5-a0be-4aba-a887-c32433adbc0c

The breakpoint is hit for that particular object's xl.meta:

1954            if err != nil {
(gdb) bt
#0  github.com/minio/minio/cmd.(*xlStorage).writeAll (s=0xc003cf2780, ctx=..., volume="bktv",
    path="CloudStg/xl.meta", b=[]uint8 = {...}, sync=true, err=...)
    at /home/xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/xl-storage.go:1954
#1  0x000000000339c85f in github.com/minio/minio/cmd.(*xlStorage).WriteAll (s=0xc003cf2780, ctx=...,
    volume="bktv", path="CloudStg/xl.meta", b=[]uint8 = {...}, err=...)
    at /home/xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/xl-storage.go:1972
#2  0x000000000338f2df in github.com/minio/minio/cmd.(*xlStorage).deleteVersions (s=0xc003cf2780, ctx=...,
    volume="bktv", path="CloudStg", fis=[]github.com/minio/minio/cmd.FileInfo = {...}, ~r0=...)
    at /home/xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/xl-storage.go:1006
#3  0x000000000338f66f in github.com/minio/minio/cmd.(*xlStorage).DeleteVersions (s=0xc003cf2780, ctx=...,
    volume="bktv", versions=[]github.com/minio/minio/cmd.FileInfoVersions = {...}, ~r0=[]error)
    at /home/xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/xl-storage.go:1027
#4  0x000000000332cec5 in github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).DeleteVersions (p=0xc00015f680,
    ctx=..., volume="bktv", versions=[]github.com/minio/minio/cmd.FileInfoVersions = {...}, errs=[]error = {...})
    at /home/xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/xl-storage-disk-id-check.go:418
#5  0x0000000003017677 in github.com/minio/minio/cmd.erasureObjects.DeleteObjects.func1 (index=0, disk=...)
    at /home/xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/erasure-object.go:1437
#6  0x00000000030173f0 in github.com/minio/minio/cmd.erasureObjects.DeleteObjects.func2 ()
    at /home//xxx/minio-RELEASE.2023-03-13T19-46-17Z/cmd/erasure-object.go:1452
#7  0x000000000047b961 in runtime.goexit () at /home/xxx/golang/go/src/runtime/asm_amd64.s:1598
#8  0x0000000000000000 in ?? ()
(gdb) q

Killed the minio process here (after the open and before the write).

After the kill, xl.meta is a zero-byte file, as shown below.

$>  ls -l dir/bktv/CloudStg/
total 0
drwxrwxr-x. 2 ratna ratna 28 Jun 14 01:35 ee430c88-6d0b-4d17-b170-bfc974c300b1
-rw-rw-r--. 1 ratna ratna  0 Jun 14 01:44 xl.meta

After restarting the minio server, listing versions shows empty output.

$> mc ls local/bktv/CloudStg --insecure --versions
$>

Context

We are losing objects because of crashes that are unrelated to the minio process itself.

Regression

No, it is not a regression.

Your Environment

  • Version used (minio --version):
  • Server setup and configuration:
  • Operating System and version (uname -a):

setup:

"Single-Node Single-Drive" minio configuration started with "minio server dir"

OS version
4.18.0-425.10.1.el8_7.x86_64

Minio version

minio version DEVELOPMENT.2023-06-13T05-27-54Z (commit-id=e05a1aa08d11b3b17fb2e04351e919b2fdadaab2)
Runtime: go1.20.5 linux/amd64
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Copyright: 2015-2023 MinIO, Inc.
@klauspost
Contributor

The window is extremely small. The chance of this happening on several servers at exactly the same time is practically 0, so I presume you are running with no redundancy (single disk/server).

We can take this on as a low-priority task. We can do write+rename, but that will then require some cleanup mechanism - presumably in the scanner.

@klauspost klauspost self-assigned this Jun 14, 2023
@ratnax
Author

ratnax commented Jun 14, 2023

Yes, we are running with no redundancy.

Thanks.

@harshavardhana harshavardhana changed the title A crash __after__ open("xl.meta", O_TRUNC), and __before__ write() would result in lost of metadata and hence the object. [SNSD] A crash __after__ open("xl.meta", O_TRUNC), and __before__ write() would result in lost of metadata and hence the object. Jun 25, 2023
@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 17, 2023
@klauspost klauspost removed the stale label Sep 18, 2023