Internal Server Error and EIO on fuse mount for remote EC shard read failure. #5465
Comments
I loaded the original volume from a snapshot taken before EC encoding and it worked. So the problem should be in the EC encoding or shard distribution step.
I manually rebuilt the shard but the error remains the same. There seems to be a problem in the EC build process: either the on-disk data is inconsistent, or there is a bug in the algorithm.
Could you please make a copy of both the original and the EC volumes? And if the volume is EC-encoded again, does it have the same problem?
Thank you, I will check today.
Yes, it reproduces. Interestingly, it also recovers after re-decoding.
I temporarily created a public port share at http://184.105.6.184:8000/1201-full.tar
The behavior is still the same after regenerating the index with
Could not access http://184.105.6.184:8000/1201-full.tar
I just tried the link again and I can access it?
Let me try Google Drive then...
It is tricky to set up credentials for Google Drive on the remote server, so I made an S3 share instead. Endpoint: https://s3-haosu.nrp-nautilus.io. Could you please try loading the files with something like rclone? The public URLs look like https://s3-haosu.nrp-nautilus.io/seaweed/bug5465/1201-ec/1201.ecx
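(For anyone else following along: a minimal sketch, not part of the original exchange, for pulling one of the shared objects over plain HTTP. The URL below is the .ecx index file mentioned above; the other shard files are assumed to follow the same path pattern.)

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Public object from the S3 share mentioned above; other shard/index files
	// are assumed to follow the same path pattern under bug5465/1201-ec/.
	url := "https://s3-haosu.nrp-nautilus.io/seaweed/bug5465/1201-ec/1201.ecx"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", resp.Status)
	}

	// Save the object locally for inspection.
	out, err := os.Create("1201.ecx")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```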
I have downloaded the file. I need some time to debug.
Bump, any progress on this?
Describe the bug
When an EC shard is corrupted on disk, the servers do not try to recover the data from the other shards; instead the read fails with a 500 Internal Server Error (surfaced as EIO on the fuse mount).
The problem happens 100% of the time with the file id and the shard at https://s3-haosu.nrp-nautilus.io/ruoxi-bucket/1201.tar.
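For reference, a minimal sketch of how the failing read can be reproduced directly against the volume server's HTTP read path, bypassing the fuse mount. The host and file id below are placeholders, not the actual values from this report:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Placeholder volume server host and file id; substitute the real values.
	// 8080 is the default volume server HTTP port.
	url := "http://volume-server-host:8080/1201,0123456789abcdef"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// Expected: 200 OK with the blob content (with EC recovery kicking in if a shard is bad).
	// Observed in this report: 500 Internal Server Error, which the fuse mount surfaces as EIO.
	fmt.Println(resp.Status, len(body))
}
```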
System Setup
/usr/local/bin/weed server -volume=0 -filer -dir=/weedfs
/usr/local/bin/weed volume -max=400 -dir=/weedfs -mserver=10.8.149.13:9333
(the volume command runs on 8 machines different from the master)
weed version: version 8000GB 3.63 54d7748 linux amd64
filer.toml: no
Expected behavior
When an EC shard fails, the volume server should try to recover the data from the other shards instead of returning an error.
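For context on the expected recovery path: SeaweedFS erasure coding splits a volume into Reed-Solomon shards (10 data plus 4 parity in the default layout, as far as I know), so any single bad shard should be reconstructable from the remaining ones. A minimal sketch of that reconstruction using the klauspost/reedsolomon library (the one SeaweedFS depends on, as far as I can tell), with synthetic in-memory shards rather than real .ec files:

```go
package main

import (
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Assumed default SeaweedFS EC layout: 10 data shards + 4 parity shards.
	enc, err := reedsolomon.New(10, 4)
	if err != nil {
		log.Fatal(err)
	}

	// In the real system these would be the 14 shard buffers read from the
	// .ec00 .. .ec13 files; here they are synthetic placeholders.
	shards := make([][]byte, 14)
	for i := range shards {
		shards[i] = make([]byte, 1024)
	}
	// Compute parity from the data shards so the set is consistent for this demo.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate one corrupted/missing shard (the shard that fails on disk).
	shards[3] = nil

	// Reconstruct rebuilds the missing shard from the surviving ones; this is
	// the recovery the volume server is expected to perform instead of
	// returning a 500.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("reconstructed and verified:", ok)
}
```

If the same reconstruction succeeds on the real shards from the shared tarball while the server still returns 500, that would point at the read/recovery path rather than at the encoded data itself.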