Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calcSize breaks on sparse files #47783

Open
LarsSven opened this issue Apr 30, 2024 · 11 comments
Open

calcSize breaks on sparse files #47783

LarsSven opened this issue Apr 30, 2024 · 11 comments
Labels
kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/0-triage

Comments

@LarsSven
Copy link

LarsSven commented Apr 30, 2024

Description

This may or may not be a bug based on the opinions of the Moby team, as it could be interpreted either way.

Moby uses a function called calcSize (

func calcSize(ctx context.Context, dir string) (int64, error) {
), which walks a directory tree and returns its total size in bytes. This is used by functionality such as docker inspect's Size (https://docs.docker.com/reference/cli/docker/inspect/#size).

This works great, until runtimes start using sparse files. gVisor, a Google security focused runtime for example, uses sparse files as an overlay, which breaks calcSize, as it uses stat.Size, which is not useful on sparse files, as you would want to use stat.Blocks. Containerd earlier made a similar fix because they ran into similar issues: containerd/continuity@bc5e3ed

There's already been a discussion about this at gVisor: google/gvisor#10256. My personal opinion is that using stat.Blocks is a much more useful metric than stat.Size, as stat.Blocks does the same thing for the normal use case, but then also properly deals with sparse files.

I'm willing to see if I can resolve this, but I am curious what the Moby team thinks about this and whether this should be switched to stat.Blocks?

Reproduce

  1. Use runsc (gVisor) as a runtime for docker (https://gvisor.dev/docs/user_guide/install/)
  2. Ensure overlay is enabled for gVisor, which is enabled by default (https://gvisor.dev/blog/2023/05/08/rootfs-overlay/)
  3. Observe SizeRw in docker inspect producing a 0 or near-zero value no matter how much is written to the RW layer.

Expected behavior

SizeRw should continue to work even when using sparse files or the gVisor overlay.

docker version

Client: Docker Engine - Community
 Version:           26.1.0
 API version:       1.45
 Go version:        go1.21.9
 Git commit:        9714adc
 Built:             Mon Apr 22 17:07:06 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.1.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       c8af8eb
  Built:            Mon Apr 22 17:07:06 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    26.1.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.26.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 22
  Running: 0
  Paused: 0
  Stopped: 22
 Images: 97
 Server Version: 26.1.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc runsc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.0-28-generic
 Operating System: Ubuntu 23.10
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 62.07GiB
 Name: lars
 ID: c014d359-a40c-4e14-a570-bdac303e212d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

No response

@LarsSven LarsSven added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage labels Apr 30, 2024
@LarsSven
Copy link
Author

@vvoland @cpuguy83 @thaJeztah (I believe these are the relevant mainters?)

What would be your opinion on this? Would you think that blocks are a more relevant calculation than size?

@thaJeztah
Copy link
Member

Let me /cc @dmcgowan as well, who coincidentally looks to have made the change on the containerd side.

One thing I recall (but maybe I recall incorrectly) is that Golang itself had some issues with sparse files when creating tar's (resulting in those consuming their "non-sparse" size), which may be something to take into account.

As to size goes in general, we're currently working on integrating the containerd image-store; as part of that we're also re-evaluating "size" shown in various parts of the CLI output (and API responses), as there's multiple parts that will consume size ("content" stored, as well as "snapshots"); which means we may be adding more granular information about size information.

@LarsSven
Copy link
Author

LarsSven commented May 1, 2024

I think two reasonable solutions could be:

  1. Replace the current stat.Size by stat.Blocks
  2. Introduce stat.Blocks as a new response value (so the size part of docker inspect will now return 3 values)

I'd be willing to try to implement either if you think this would be something that you would accept into the codebase.

Would you maybe have a preference out of these options @thaJeztah?

@thaJeztah
Copy link
Member

thaJeztah commented May 1, 2024

Replace the current stat.Size by stat.Blocks

I think this part could be considered. For most current situations, sparse files would be an exceptional case, so for those (I think?) the value would not change. And (IIUC), the current Size shows the "maximum size" for sparse files, and not the size "on disk", correct? (which means that currently an "empty" 0-byte sparse file may be shown as (e.g.) 64GB.

I (personally) think that's fine to change as it would be a (currently at least) rare situation, so won't affect most users.

Happy to hear input from others though.

Introduce stat.Blocks as a new response value (so the size part of docker inspect will now return 3 values)

This I would prefer to not yet do; we still have to decide how we want to show new sizes; this could be one of them, but once we add a new field, it's complicated to change (once shipped, we should expect that to be permanent).

So if we can do 1. without much disruption, then, I think we can hold-off doing 2. (for now), and keep that for the follow-up improvements.

If you're interested in contributing (and if there's no strong objections from others), feel free to open a pull request with your proposed changes for 1.. There will still be a review on that PR, so if objections arrise (or if there's things that we didn't consider), those can still be discussed during review 😅

@thaJeztah thaJeztah added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label May 1, 2024
@dmcgowan
Copy link
Member

dmcgowan commented May 1, 2024

Agree it makes sense to do 1 since it is clearly expecting a usage calculation by passing in the directory. In general "size" is a tricky topic for images because there are so many different ways to calculate it and usage could vary by storage backend.

@LarsSven
Copy link
Author

LarsSven commented May 4, 2024

I think this part could be considered. For most current situations, sparse files would be an exceptional case, so for those (I think?) the value would not change. And (IIUC), the current Size shows the "maximum size" for sparse files, and not the size "on disk", correct? (which means that currently an "empty" 0-byte sparse file may be shown as (e.g.) 64GB.

I'd say that's correct. Another thing with that is that things like SizeRw don't change since the actual on-file size did not change.

I (personally) think that's fine to change as it would be a (currently at least) rare situation, so won't affect most users.

Well if it is changed from size to blocks it will affect most users, as the fundamental concept will change. Rather than theoretical size of files (like a 5 byte file being 5 bytes), it would show the physical amount of space being taken up on disk (so for a 5 byte file that would be 1 block of storage, which depending on different block sizes may be 1kb, 4kb, etc).

Are you still okay then with this change? Because it will change what moby fundamentally considers the size of the file (on my PC a 5 byte file was considered the size of 1 block, which is the amount of size the file physically takes up in my hard drive. This file was previously reported as "5 bytes").

@LarsSven
Copy link
Author

LarsSven commented May 4, 2024

cuS4WKz

Like this is how the unit tests look after a change to counting blocks rather than using direct size, which targets the actual on-disk file usage rather than the theoretical file size. This is why I was originally also suggesting to make it a separate field, as both of these metrics are really useful. The original size to do things like realise that a 5 byte file is actually 5 bytes, and the blocks to find out how much space it is actually consuming on disk.

So if we can do 1. without much disruption, then, I think we can hold-off doing 2. (for now), and keep that for the follow-up improvements.

I do think this is a pretty big disruption. Changing from size to blocks means that the results are different for all cases, not just for sparse files. A normal non-sparse file may be for example 5 bytes as in the unit tests, but may take up 1 block (so like 4kb on a certain filesystem) of physical space on disk. What do you think? Would it be reasonable to make this change for everyone, or should we look at another solution like providing the user with both file size and physical size on disk?

@LarsSven
Copy link
Author

LarsSven commented May 8, 2024

@thaJeztah what is your opinion on this?

@vvoland
Copy link
Contributor

vvoland commented May 8, 2024

IMO, we shouldn't change it for the graphdrivers since it could possibly be a breaking behavior.
However, I think it should already do the block based calculation with containerd image store integration enabled because we no longer calculate this size ourselves, but rely on the containerd reported snapshot size.

@LarsSven
Copy link
Author

LarsSven commented May 10, 2024

@vvoland what would your opinion be then on the block based calculation reporting for all container fs like rootSize and sizerw?

I'm mainly just talking about the size reporting in docker inspect --size

@vvoland
Copy link
Contributor

vvoland commented May 14, 2024

Containerd integration also affects this, as the container rw layer is handled by containerd snapshotters.
As I mentioned above, personally I wouldn't change anything for the default non-containerd image store enabled docker.

Can you try if enabling the containerd integration solves the issue for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/0-triage
Projects
None yet
Development

No branches or pull requests

4 participants