New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
calcSize breaks on sparse files #47783
Comments
@vvoland @cpuguy83 @thaJeztah (I believe these are the relevant mainters?) What would be your opinion on this? Would you think that blocks are a more relevant calculation than size? |
Let me /cc @dmcgowan as well, who coincidentally looks to have made the change on the containerd side. One thing I recall (but maybe I recall incorrectly) is that Golang itself had some issues with sparse files when creating tar's (resulting in those consuming their "non-sparse" size), which may be something to take into account. As to size goes in general, we're currently working on integrating the containerd image-store; as part of that we're also re-evaluating "size" shown in various parts of the CLI output (and API responses), as there's multiple parts that will consume size ("content" stored, as well as "snapshots"); which means we may be adding more granular information about size information. |
I think two reasonable solutions could be:
I'd be willing to try to implement either if you think this would be something that you would accept into the codebase. Would you maybe have a preference out of these options @thaJeztah? |
I think this part could be considered. For most current situations, sparse files would be an exceptional case, so for those (I think?) the value would not change. And (IIUC), the current Size shows the "maximum size" for sparse files, and not the size "on disk", correct? (which means that currently an "empty" 0-byte sparse file may be shown as (e.g.) 64GB. I (personally) think that's fine to change as it would be a (currently at least) rare situation, so won't affect most users. Happy to hear input from others though.
This I would prefer to not yet do; we still have to decide how we want to show new sizes; this could be one of them, but once we add a new field, it's complicated to change (once shipped, we should expect that to be permanent). So if we can do If you're interested in contributing (and if there's no strong objections from others), feel free to open a pull request with your proposed changes for |
Agree it makes sense to do 1 since it is clearly expecting a usage calculation by passing in the directory. In general "size" is a tricky topic for images because there are so many different ways to calculate it and usage could vary by storage backend. |
I'd say that's correct. Another thing with that is that things like SizeRw don't change since the actual on-file size did not change.
Well if it is changed from size to blocks it will affect most users, as the fundamental concept will change. Rather than theoretical size of files (like a 5 byte file being 5 bytes), it would show the physical amount of space being taken up on disk (so for a 5 byte file that would be 1 block of storage, which depending on different block sizes may be 1kb, 4kb, etc). Are you still okay then with this change? Because it will change what moby fundamentally considers the size of the file (on my PC a 5 byte file was considered the size of 1 block, which is the amount of size the file physically takes up in my hard drive. This file was previously reported as "5 bytes"). |
Like this is how the unit tests look after a change to counting blocks rather than using direct size, which targets the actual on-disk file usage rather than the theoretical file size. This is why I was originally also suggesting to make it a separate field, as both of these metrics are really useful. The original size to do things like realise that a 5 byte file is actually 5 bytes, and the blocks to find out how much space it is actually consuming on disk.
I do think this is a pretty big disruption. Changing from size to blocks means that the results are different for all cases, not just for sparse files. A normal non-sparse file may be for example 5 bytes as in the unit tests, but may take up 1 block (so like 4kb on a certain filesystem) of physical space on disk. What do you think? Would it be reasonable to make this change for everyone, or should we look at another solution like providing the user with both file size and physical size on disk? |
@thaJeztah what is your opinion on this? |
IMO, we shouldn't change it for the graphdrivers since it could possibly be a breaking behavior. |
@vvoland what would your opinion be then on the block based calculation reporting for all container fs like rootSize and sizerw? I'm mainly just talking about the size reporting in docker inspect --size |
Containerd integration also affects this, as the container rw layer is handled by containerd snapshotters. Can you try if enabling the containerd integration solves the issue for you? |
Description
This may or may not be a bug based on the opinions of the Moby team, as it could be interpreted either way.
Moby uses a function called
calcSize
(moby/pkg/directory/directory_unix.go
Line 13 in 9d07820
This works great, until runtimes start using sparse files. gVisor, a Google security focused runtime for example, uses sparse files as an overlay, which breaks
calcSize
, as it uses stat.Size, which is not useful on sparse files, as you would want to usestat.Blocks
. Containerd earlier made a similar fix because they ran into similar issues: containerd/continuity@bc5e3edThere's already been a discussion about this at gVisor: google/gvisor#10256. My personal opinion is that using stat.Blocks is a much more useful metric than stat.Size, as stat.Blocks does the same thing for the normal use case, but then also properly deals with sparse files.
I'm willing to see if I can resolve this, but I am curious what the Moby team thinks about this and whether this should be switched to stat.Blocks?
Reproduce
docker inspect
producing a 0 or near-zero value no matter how much is written to the RW layer.Expected behavior
SizeRw should continue to work even when using sparse files or the gVisor overlay.
docker version
Client: Docker Engine - Community Version: 26.1.0 API version: 1.45 Go version: go1.21.9 Git commit: 9714adc Built: Mon Apr 22 17:07:06 2024 OS/Arch: linux/amd64 Context: default Server: Docker Engine - Community Engine: Version: 26.1.0 API version: 1.45 (minimum version 1.24) Go version: go1.21.9 Git commit: c8af8eb Built: Mon Apr 22 17:07:06 2024 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.6.31 GitCommit: e377cd56a71523140ca6ae87e30244719194a521 runc: Version: 1.1.12 GitCommit: v1.1.12-0-g51d5e94 docker-init: Version: 0.19.0 GitCommit: de40ad0
docker info
Additional Info
No response
The text was updated successfully, but these errors were encountered: