I encountered several errors while uploading to the PikPak cloud drive (I investigated for over a week and identified the issues; hoping to help get them fixed). #7787
If you can work out why these problems are happening, preferably with an rclone log showing the problem, we can work on fixing them. Ideally we'd have a reliable way of reproducing the problem too. With that, most bugs become easy to fix. |
Thank you for your response. I actually wanted to include the log files, but there are a few issues: due to the large number of uploaded files, the log files are too extensive to include, there are no obvious error messages, and I'm not sure how to extract the relevant parts. However, it's not a problem. Later, I'll try uploading some unrelated items to see if I can reproduce these errors, and then I'll include the log files. Please feel free to attend to other matters in the meantime; I've already written a repair program myself, so this isn't urgent for me. |
I generated a thousand test text files using the following Python:

```python
from faker import Faker
import os

fake = Faker()
folder_path = "test_folder"
os.makedirs(folder_path, exist_ok=True)

num_files = 1000
for i in range(num_files):
    file_name = f"file_{i}.txt"
    file_path = os.path.join(folder_path, file_name)
    with open(file_path, "w") as file:
        file.write(fake.text())
```

I generated log files using the following command:

```
rclone copy --transfers=16 --drive-chunk-size=64M --log-file=rclone_log.txt --log-level=DEBUG ./test_folder p11:/test_folder
```

log files here

This time, I found 28 files that couldn't be read properly, along with many additional files with names ending in (1). Unfortunately, I didn't find the hidden files mentioned before (even after retrying the uploads several times; perhaps the sample size is still too small, or it may be related to file size). If you're interested, you can also run the Python program above to generate more test files; perhaps you'll encounter the incorrect-status or unreadable-file issues. It's possible that fixing the unreadable-file issue would also resolve the status-error problem!

Here, you can see many files ending with (1), which shouldn't exist. Moreover, some original files were uploaded successfully but still generated another file with (1) at the end. Stranger still, many of the files ending with (1) are unreadable, corrupted files. Beyond the 28 files that couldn't be read, there are even more cases where no original file exists at all, only derivative files ending in (1). Additionally, some files were uploaded directly to the recycle bin.

Below are the 28 corrupted files that couldn't be read:

/test_folder/file_945.txt

If you need more information or assistance, please let me know. Thank you very much for your help. |
To provide more observation data, I re-uploaded three thousand test files. Unfortunately, the status-error issue still did not occur. I suspect that the files hidden due to status errors mentioned earlier are likely related to file size or to whether they are media files (I know some platforms convert media files, and this process may fail, leading to an incomplete status). In the past, I often encountered status errors when uploading large files, but this time, despite uploading thousands of small files, I haven't encountered any, which is very abnormal.

Rclone log file

In total there are:
The following files cannot be read:
The following files are duplicated:
The following files were uploaded directly to the trash/bin:

If you need more information or assistance, please let me know. Thank you very much for your help. |
I have started looking into this. Thanks for the detailed report. |
Did you successfully handle those invalid files? In my drive, there are two types of invalid files:
For unreadable files, they are actually readable after manually untrashing them (making trashed: false). |
First of all, thank you for your help. The unreadable files you mentioned are probably the files I described before that were uploaded directly to the trash, rather than being truly unreadable. Truly unreadable files can only be deleted; they cannot be repaired! However, when unreadable files are generated, duplicate files are often generated at the same time (I speculate this may be due to rclone retrying after a failed write). I simply replace the original file with the duplicate; if there is no duplicate, I can only re-upload through rclone. Let me summarize the issues I've encountered:
1. Files may be uploaded directly to the trash. As you mentioned, the only difference between them and normal files is one file parameter: trashed: true.
2. Unreadable files. These can also be found by filtering on parameters: platform: Upload with no task_id. Normally, files uploaded through the platform are marked platform: Upload and always have a task_id.
3. Hidden files. These can be found through the file parameter phase: PHASE_TYPE_PENDING.
4. Duplicate files. I search directly by hash code, but file contents may also need to be compared, because sometimes there are duplicate files in the same folder with different filenames, which may need special handling.
These are roughly the situations I've encountered. If there's anything else you need help with, please let me know. Thanks again.
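To make this concrete, here is a minimal Python sketch of the checks described above. It assumes each file's metadata has been fetched from the drive API as a dict with trashed, phase, and params (platform, task_id) fields; the exact response shape is my observation, not official API documentation, so treat the field access as an assumption.

```python
# Hedged sketch: classify a PikPak file-metadata dict into the anomaly
# categories described above. Field names (trashed, phase, params,
# task_id) come from observations in this thread, not official docs.
def classify(meta: dict) -> str:
    params = meta.get("params", {})
    if meta.get("trashed"):
        return "trashed"      # issue 1: uploaded straight to the trash
    if meta.get("phase") == "PHASE_TYPE_PENDING":
        return "hidden"       # issue 3: processing stuck, file stays invisible
    if (params.get("platform") == "Upload"
            and "task_id" not in params
            and int(meta.get("size", 0)) > 0):
        # issue 2: a normal platform upload always carries a task_id;
        # 0-byte files are the known exception and are excluded here
        return "unreadable"
    return "ok"
```

Duplicates (issue 4) are not covered by this function, since they are a relation between files rather than a property of a single one; they need a second pass over the whole listing. |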
Additional notes:
In fact, many of these issues were discovered through comparing file parameters, and I'm not sure of the exact cause of the problems. These are just personal observations, so they may not always be accurate. If there's anything else you need help with, please let me know. |
This attempts to resolve upload conflicts by implementing cancel/cleanup on failed uploads. Fixes #7787
Can you please try using v1.67.0-beta.7928.2e582d73f.fix-7787-pikpak-upload-conflict? |
Sure, I'll give it a try later, but it might take some time. Thank you for your help. |
I used the same approach as before to upload one thousand files for testing, but shortly after starting, I encountered this error:

```
panic: runtime error: invalid memory address or nil pointer dereference
```

Then rclone was forcibly terminated. However, when I used rclone version 1.66 or 1.65 with the same command to upload the same files, this error did not occur.

Command:
Log file:

My operating system is Win10. I haven't tested on other platforms yet, so I'm not sure if the same error will occur. |
Thanks for the test. I will fix it soon. |
Can you try with v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict? We've made some adjustments in this version to make uploads more reliable.

**Checking upload status**

By implementing

**Cancel uploads**

If an upload encounters an error, we cancel it to remove residual files (hidden or unreadable).

**Force sleep and min sleep**

Based on an experiment using 1000 small files as @cj-neo described, with the following commands to upload and check them:

```
rclone copy ./test test:test --transfers=16 --log-file=test.log --log-level=DEBUG
rclone check ./test test:test --download --log-file=test.log --log-level=DEBUG
```

the numbers for the first three low-level retries across the 1000 files are

meaning that 5% (~50 of 1000) of files require at least three retries, which is +150 ms. Therefore, introducing a forced delay (or sleep) after uploading is a reasonable approach to ensure server-side updates take effect. Note that this doesn't impact total execution time much. Moreover, the pacer's minimum sleep is increased from 10 ms to 100 ms to resolve the following server error:

```
http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""
```

**Stop using**
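To illustrate the pacing change: below is a toy Python sketch of the min-sleep idea. rclone's real pacer is implemented in Go (lib/pacer); the class, names, and numbers here are illustrative only.

```python
import time

class Pacer:
    """Toy rate limiter: guarantee a minimum gap between API calls."""
    def __init__(self, min_sleep: float = 0.1):  # 100 ms, as in the fix above
        self.min_sleep = min_sleep
        self._last_call = 0.0

    def call(self, fn, *args, **kwargs):
        # Sleep until at least min_sleep has elapsed since the last call,
        # so the server is never hit faster than it can settle.
        wait = self.min_sleep - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        try:
            return fn(*args, **kwargs)
        finally:
            self._last_call = time.monotonic()

pacer = Pacer(min_sleep=0.1)
# pacer.call(upload_one_file, path)  # hypothetical upload function
```
|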
Thank you for putting in so much effort to test and fix the issues. I've just uploaded a thousand files on my Win10 system without encountering any of the errors that occurred before. Over the next couple of days, I'll be testing uploads of more types of files, and I'll report back here whether or not any issues are found. |
Hello. Today, I encountered two more issues during testing:
Below are two files that cause problems on my end. Could you please test them? https://drive.google.com/drive/folders/1pvHTbi0uEuI8OmEWvF6TKZtJPG_fTydX?usp=drive_link
Actually, there's more to it. A few days ago, something strange happened. I uploaded a folder named "abc", but instead of getting just one folder named "abc", I ended up with two folders: one named "abc" and another with random characters appended to the name. The original contents were duplicated into both of these folders. When I initially uploaded the folder, I didn't explicitly indicate that it was a folder, for example:
However, I remember encountering similar issues when uploading to Google Drive and OneDrive before, but they were very rare and difficult to reproduce. I just thought I'd mention it since it happened again here. Please first take a look at why some files cannot be uploaded successfully. Thank you very much for your generous assistance. |
Sorry, the second issue I mentioned, about repeated uploading of files and hidden files, is the same issue as before. It's due to a deficiency in my self-developed detection program, which failed to detect it: I overlooked the official restriction of reading a maximum of five hundred files or folders at a time, so the excess portion was left unprocessed. Please focus on fixing the specific files that cannot be read. Thanks. |
For testing purposes, I found another video file that cannot be uploaded using rclone, similar to the two image files I mentioned earlier. After uploading, the file cannot be read. I noticed that there was "no upload process" at all, meaning the file was not actually uploaded. This video file is nearly 900 MB in size, but the transfer completes instantly without transferring any data, and there are no error messages. This is very strange. https://drive.google.com/drive/folders/1pvHTbi0uEuI8OmEWvF6TKZtJPG_fTydX?usp=sharing

Test on Win10:

```
C:\Users\NEO\Desktop\test>rclone copy -vv 3.wmv pr9:
2024/05/15 21:41:22 DEBUG : rclone: Version "v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict" starting with parameters ["rclone" "copy" "-vv" "3.wmv" "pr9:"]
2024/05/15 21:41:22 DEBUG : Creating backend with remote "3.wmv"
2024/05/15 21:41:22 DEBUG : Using config file from "C:\\Users\\NEO\\Desktop\\test\\rclone.conf"
2024/05/15 21:41:22 DEBUG : fs cache: adding new entry for parent of "3.wmv", "//?/C:/Users/NEO/Desktop/test"
2024/05/15 21:41:22 DEBUG : Creating backend with remote "pr9:"
2024/05/15 21:41:22 DEBUG : 3.wmv: Need to transfer - File not found at Destination
2024/05/15 21:41:25 DEBUG : 3.wmv: Dst hash empty - aborting Src hash check
2024/05/15 21:41:25 INFO : 3.wmv: Copied (new)
2024/05/15 21:41:25 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 3.1s
2024/05/15 21:41:25 DEBUG : 4 go routines active
```

Test on Ubuntu 22.04:

```
neo@arm:~/test$ ./rclone copy -vv 3.wmv pr9:
2024/05/15 22:05:40 DEBUG : rclone: Version "v1.67.0-beta.7930.223ba8626.fix-7787-pikpak-upload-conflict" starting with parameters ["./rclone" "copy" "-vv" "3.wmv" "pr9:"]
2024/05/15 22:05:40 DEBUG : Creating backend with remote "3.wmv"
2024/05/15 22:05:40 DEBUG : Using config file from "/home/neo/test/rclone.conf"
2024/05/15 22:05:40 DEBUG : fs cache: adding new entry for parent of "3.wmv", "/home/neo/test"
2024/05/15 22:05:40 DEBUG : Creating backend with remote "pr9:"
2024/05/15 22:05:40 DEBUG : 3.wmv: Need to transfer - File not found at Destination
2024/05/15 22:05:42 DEBUG : 3.wmv: Dst hash empty - aborting Src hash check
2024/05/15 22:05:42 INFO : 3.wmv: Copied (new)
2024/05/15 22:05:42 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Transferred: 1 / 1, 100%
Elapsed time: 2.4s
2024/05/15 22:05:42 DEBUG : 7 go routines active
```
|
PikPak skips uploading traffic in two cases:
However, our current implementation uses an incorrect hash, which points to the wrong file; there can be a discrepancy between the hash and the referenced file. See #7838. Let's revisit the "no upload process" problem once the hash issue is resolved. By the way, have you noticed any changes due to the increased minimum sleep and the forced sleep? Do they affect you in any way? |
Thank you for your response. I've previously noticed the peculiarity of 0-byte files, so I skip them when checking for "unreadable files." Regarding hash checks, as far as I understand, most cloud storage providers check and remove duplicates after the files are uploaded. This is done to keep upload processing and the user experience consistent: for example, when uploading via a web browser, the hash code cannot be known in advance, and PikPak also provides a web-based upload method. Therefore, for problematic files, we may need to use different upload methods and conduct more checks. Additionally, the increased minimum sleep and forced sleep have minimal impact on me. Our previous tests involved a large number of small files, but in general, file sizes are not usually so small; most of the time is spent on file transmission, so the impact will be even less. Repeatedly checking for file errors may require more time, but I agree that this investment is worthwhile. I appreciate your continued assistance in modifying the program. Looking forward to your updates. |
By the way, I wanted to report an issue I encountered a few days ago while uploading a large file. The file is 134 GB, but I hit the following error before even reaching 100 GB:

```
2024/05/08 20:56:55 ERROR : /Marvels.Avengers.zip: Failed to copy: failed to upload: MultipartUpload: upload multipart failed
upload id: C278C4F1FAD2486782772961F3CA663A
caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit
```

It says that adjusting the PartSize parameter would allow the upload. Is the corresponding parameter max-upload-parts? I couldn't find this parameter for PikPak. Currently, I only see two provider-specific parameters, --oos-max-upload-parts and --s3-max-upload-parts, but neither seems applicable to PikPak. I'm not sure what PikPak's maximum file size limit is, but currently, uploading large files to PikPak with rclone seems problematic.

If possible, I think rclone could handle large file uploads better. For instance, it could first check whether the file size fits within the default PartSize before starting the upload; if it doesn't fit, the upload shouldn't start at all, because uploading large files takes a lot of time, and hitting a PartSize error halfway through, then having to adjust settings and re-upload, wastes all the time already spent. Additionally, rclone could perhaps adjust the PartSize automatically based on the file size, completing the upload without any errors. I think it's challenging for the average user to know how large the PartSize needs to be for a given file to upload successfully. Of course, these are just my humble suggestions and might be a bit demanding; I'm just offering some possible directions for improvement. Thank you again for your help.
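For reference, the arithmetic behind this error can be sketched as follows. The 5 MiB default part size here is an assumption for illustration (the common S3-style default); I don't know PikPak's actual default.

```python
import math

MAX_PARTS = 10000            # cap reported in the error above
DEFAULT_PART_SIZE = 5 << 20  # assumed 5 MiB S3-style default (illustrative)

def min_part_size(file_size: int, max_parts: int = MAX_PARTS) -> int:
    """Smallest part size (bytes) that fits the file within max_parts."""
    return math.ceil(file_size / max_parts)

file_size = 134 * 1024**3    # the 134 GB file from this report, taken as GiB
print(min_part_size(file_size) / 2**20)       # ≈ 13.7 MiB per part needed
print(DEFAULT_PART_SIZE * MAX_PARTS / 2**30)  # ≈ 48.8 GiB cap at 5 MiB parts
```

So this file needs parts of at least ~14 MiB, and a 5 MiB part size caps uploads at roughly 48.8 GiB, which is consistent with the upload failing partway through. |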
I am aware of this issue. It will be fixed soon, including a user-configurable upload part size. Would you please open a separate issue for this? |
OK #7850 |
After these few days of testing, unfortunately, most of the previously mentioned issues still exist:
These problems all persist; the only issue resolved is that files are no longer uploaded directly to the trash. Additionally, because I need to monitor progress while uploading, it is difficult to provide DEBUG messages; on the other hand, I also do not want to publicly disclose what I have uploaded. However, I believe that if you upload a large number of files of various types and sizes, as I do, you will see these issues as well. |
Are the problematic files still small ones, or are they different this time? What are you using for --transfers? What happens if you reduce the value, in case it is too high? |
Not all of them are small files. As mentioned before, I tested uploading a thousand small files without any issues, and initially I thought the problem was resolved. However, I later continued with the original upload tasks: I am transferring a large amount of data previously stored on Google Drive to PikPak, and the file sizes and types vary. The problems are exactly the same as at the beginning, including the issue where some files cannot be uploaded using rclone at all. There are two differences:
1. Deleting the problematic files and re-uploading them has a high chance of success.
As for the current --transfers parameter, I have set it to 8 and the problem still occurs. If set lower, the transfer speed becomes much slower; however, if needed, I can help test with a lower setting. |
By the way, if you need to test uploading a large amount of data, you can use Google Colab. It doesn't use up your server or VPS bandwidth and is very convenient. The only limitation is that it disconnects after a period of inactivity, but this doesn't significantly affect rclone. |
I am currently using a detection program I previously wrote in Python. It logs problematic files and deletes them; after detection is complete, it calls rclone to re-upload those files, then re-runs the detection, repeating this process several times until only the files that cannot be uploaded with rclone at all remain, which I then upload manually (a sketch of this loop follows below). I have noticed that issues tend to be clustered: if no errors occur, things tend to stay that way, but once an error appears, it often recurs in the same directory, and even several consecutive files can fail. Therefore, I believe these issues are likely due to network or server overload. Since a lot of issues have accumulated and I am concerned about wasting too much of your time, and since we already know which files have abnormal attributes, I wonder if you could consider a temporary workaround: recheck the files once more before rclone finishes the transfer, similar to my initial approach, but handled internally rather than by an external program. Later, when more time is available, the root cause can be addressed. Please evaluate this suggestion.
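Roughly, my external loop looks like the simplified sketch below. Here find_bad_files is a hypothetical stand-in for the metadata checks described earlier in this thread, and the remote name "pikpak:" and the paths are placeholders; only the rclone subcommands shown (copy, deletefile) are real.

```python
import subprocess

def find_bad_files() -> list[str]:
    """Hypothetical stub: return remote paths whose metadata shows the
    anomalies described in this thread (trashed, missing task_id,
    PHASE_TYPE_PENDING, duplicates). Fill in with real API checks."""
    return []

MAX_ROUNDS = 5
for round_no in range(1, MAX_ROUNDS + 1):
    bad = find_bad_files()
    if not bad:
        break  # everything verified clean
    for remote_path in bad:
        # delete the damaged remote copy so rclone will re-send it
        subprocess.run(["rclone", "deletefile", f"pikpak:{remote_path}"],
                       check=True)
    # re-upload: unchanged files are skipped, deleted ones are re-sent
    subprocess.run(["rclone", "copy", "./source", "pikpak:dest"], check=True)
else:
    print(f"still failing after {MAX_ROUNDS} rounds; upload these manually")
```
|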
What is the problem you are having with rclone?
Recently, while using PikPak to upload a large amount of data, I discovered several issues (rclone shows 100% upload complete with no error messages):
Because these errors cause uploaded files to be damaged and data to be lost, please consider fixing them as a priority if possible.
1. Duplicate Files: I found numerous duplicate files within individual directories. For instance, an original file named "file.jpg" might have duplicates named "file(1).jpg", "file(2).jpg", and so on. These duplicates have identical sizes and, except where some are corrupted, identical hash codes.
2. Unreadable Files: Some uploaded files cannot be read.
3. Missing Original Files: Occasionally, only derivative files like "file(1).jpg" are present, without the original file. When attempting to manually rename these files back to their original names, an error occurs stating that the file already exists!
4. Files Sent to Trash: Some uploaded files end up in the trash.
Moreover, it's highly probable for issues 1, 2, 3, and 4 to coexist within the same folder simultaneously, without rclone displaying any upload errors! However, reproducing the issue is straightforward: when uploading a large number of files (regardless of file size), there's roughly a 1-in-50 chance of encountering an error.
As I've already uploaded a significant amount of data and cannot wait for rclone to be fixed, I've written a repair program myself in Python (apologies for not being proficient in Go and thus unable to help modify the rclone source code).
Below are detailed observations and some of the source code used in my repair program:
1. Identifying Unreadable Files:
I observed and compared file attributes to distinguish unreadable files from readable ones. After uploading, files with reading issues have a "platform" parameter equal to "Upload" within the "params" section. Normally, there should also be a "task_id" parameter, but unreadable files lack it. There is one exception: files with a size of zero never have a "task_id" parameter, even when they are readable.
2. Identifying Hidden Files:
Since some duplicated files couldn't be renamed back to their original names, I suspected that some files might be hidden. Even the official software and website couldn't detect these hidden files; I had to compare file attributes to find them. For some uploads, after completion the processing status of the file remains at "PHASE_TYPE_PENDING" forever, leaving it hidden and unreadable.
3. Identifying Duplicate Files:
Identifying duplicate files is straightforward: if their hash codes are the same, they are duplicates.
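For concreteness, a minimal version of this check might look like the sketch below, assuming each listing entry is a dict with name, size, and hash fields as observed above (the field names are assumptions, not official API documentation):

```python
from collections import defaultdict

def find_duplicates(entries: list[dict]) -> list[list[dict]]:
    """Group files sharing the same (size, hash); any group with more
    than one member is a duplicate set as described above."""
    groups = defaultdict(list)
    for entry in entries:
        # size is included alongside the hash as a cheap sanity check,
        # since corrupted duplicates were observed with diverging hashes
        groups[(entry["size"], entry["hash"])].append(entry)
    return [g for g in groups.values() if len(g) > 1]

# Example: file.jpg and file(1).jpg collapse into one duplicate group.
files = [
    {"name": "file.jpg", "size": 10, "hash": "abc"},
    {"name": "file(1).jpg", "size": 10, "hash": "abc"},
    {"name": "other.jpg", "size": 11, "hash": "def"},
]
print(find_duplicates(files))
```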
The parts that need to be fixed in rclone aren't complex: during the upload process, check the attributes mentioned above, identify upload failures (unreadable files) and abnormal statuses (hidden files), and count only files with correct attributes as successfully uploaded; otherwise, retry the upload. The remaining issues (files being duplicated or sent to the trash) may themselves be caused by repeated attempts to upload failed files! Additionally, it might be worth adding a "dedupe" option to handle previously failed uploads (the current one is ineffective for this).
Since my repair code is written in Chinese, I won't provide it here, as it might appear as gibberish on systems without Chinese language support.
I'm Taiwanese, and English is not my native language. This is my first time reporting here, so please forgive any mistakes I might make.
I would like to express my gratitude for everyone's assistance. If more information is needed, please let me know.
What is your rclone version (output from `rclone version`)?

1.66

Which OS are you using and how many bits (e.g. Windows 7, 64 bit)?

Ubuntu 22.04, 64 bit

Which cloud storage system are you using? (e.g. Google Drive)

Local or Google Drive, uploading to PikPak

The command you were trying to run (e.g. `rclone copy /tmp remote:tmp`)

rclone copy & sync

A log from the command with the `-vv` flag (e.g. output from `rclone -vv copy /tmp remote:tmp`)

No error message