
Filesystem Case Sensitivity

Simon Frei edited this page Oct 25, 2020 · 8 revisions

A solution has been implemented in [lib: Handle case-insensitive filesystems](https://github.com/syncthing/syncthing/pull/6717) and released in v1.9.0.

For posterity, the text below collects and references information relating to the "Filesystem Case Sensitivity Problem"™:

What is the problem?

The issue goes back a long way, and most of it is spelled out in

https://github.com/syncthing/syncthing/issues/1787

It's also a frequent cause of support requests:

https://forum.syncthing.net/search?q=case%20rename

One typical effect: if a file foo exists locally and we want to sync Foo from a remote, the following error occurs:

file modified but not rescanned; will try again later

That's because the db has no entry for Foo, yet stat-ing Foo on disk reports that Foo exists, because the case-insensitive filesystem resolves it to the pre-existing foo. We scan again, see that foo is there as expected, and the same thing happens again on the next sync attempt.
A few examples: https://forum.syncthing.net/search?q=case%20sensitive%20modified%20later

Solution attempts/proposals

Some old discussions:

https://forum.syncthing.net/t/case-insensitive-renames-how-to-solve/3073

Be case insensitive

https://github.com/syncthing/syncthing/issues/2739

https://forum.syncthing.net/t/implementing-case-insensitivity/12384

A very short explanation not doing it justice:
In the key-value store, save files under a canonical, lower-case key, and save the real filename in the value. The behaviour is thus case-insensitive but case-preserving. This should be configurable, so that Unix users can keep the good old, simple case-sensitive behaviour.
This affects virtually every stage of Syncthing, and unfortunately no attempt at it has come to full fruition so far.
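To make the idea concrete, here is a minimal in-memory sketch, assuming a plain Go map stands in for the key-value store. The types `entry` and `caseInsensitiveDB` and the `foldCase` switch are hypothetical illustrations, not Syncthing's db API. `strings.ToLower` is a simplification of proper Unicode case folding.

```go
package main

import "strings"

// entry is a hypothetical db value: it keeps the real, case-preserved
// name next to whatever file metadata the db stores.
type entry struct {
	RealName string
	Size     int64
}

// caseInsensitiveDB is a hypothetical stand-in for the key-value store.
// With foldCase enabled, keys are canonical (lower-cased) names.
type caseInsensitiveDB struct {
	foldCase bool // false keeps the plain case-sensitive behaviour
	m        map[string]entry
}

func newDB(foldCase bool) *caseInsensitiveDB {
	return &caseInsensitiveDB{foldCase: foldCase, m: make(map[string]entry)}
}

// key returns the canonical key for name. strings.ToLower is a
// simplification; full Unicode case folding is more involved.
func (db *caseInsensitiveDB) key(name string) string {
	if db.foldCase {
		return strings.ToLower(name)
	}
	return name
}

// Put stores the file under its canonical key, preserving the real name.
func (db *caseInsensitiveDB) Put(name string, size int64) {
	db.m[db.key(name)] = entry{RealName: name, Size: size}
}

// Get looks name up (case-insensitively if enabled) and returns the
// stored entry, including the case-preserved real name.
func (db *caseInsensitiveDB) Get(name string) (entry, bool) {
	e, ok := db.m[db.key(name)]
	return e, ok
}
```

With `foldCase` on, `Get("foo.TXT")` finds an entry stored as `Foo.txt` and returns `RealName == "Foo.txt"`; with it off, the same lookup misses, which is today's behaviour.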

"Detect real case"-proposal

Brought up in https://github.com/syncthing/syncthing/issues/1787#issuecomment-640621832

Discussion: https://forum.syncthing.net/t/latest-case-insensitivity-proposal/15161

The basic premise is to leave the db alone and add a way to detect and use the "real case" of a filename on disk. A universal approach is to list the parent directory (listdir/Readdirnames) for each path component and pick the matching entry.

Scanner

Walking a filesystem already reports the real names. Case problems arise in the second scanner step, where deletions are detected by stat-ing the filenames present in the db.

Instead of a plain stat, use DirNames to get all existing names. Cache the results somehow to avoid many DirNames calls for the same path (a tree of path components, a stack of results, ...).
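One possible shape for the cache is a per-directory memo, sketched below under the assumption that the scanner checks many db entries in the same directory. `dirNamesCache`, `listDir` and `existsExactCase` are hypothetical; `listDir` stands in for a DirNames call, and `osListDir` shows one implementation using `(*os.File).Readdirnames`. Only the final path component is checked here; parent components are assumed to be resolved already.

```go
package main

import (
	"os"
	"path/filepath"
)

// dirNamesCache memoizes directory listings so that checking many db
// entries in one directory costs a single listing.
type dirNamesCache struct {
	listDir func(dir string) ([]string, error) // stand-in for DirNames
	cache   map[string][]string
	calls   int // number of real listings performed, for illustration
}

func newDirNamesCache(listDir func(string) ([]string, error)) *dirNamesCache {
	return &dirNamesCache{listDir: listDir, cache: make(map[string][]string)}
}

// names returns the (possibly cached) listing of dir.
func (c *dirNamesCache) names(dir string) ([]string, error) {
	if n, ok := c.cache[dir]; ok {
		return n, nil
	}
	n, err := c.listDir(dir)
	if err != nil {
		return nil, err
	}
	c.calls++
	c.cache[dir] = n
	return n, nil
}

// existsExactCase reports whether the base name of name appears verbatim
// in its parent's listing. A plain stat would instead succeed for any
// case variant on a case-insensitive filesystem.
func (c *dirNamesCache) existsExactCase(name string) (bool, error) {
	names, err := c.names(filepath.Dir(name))
	if err != nil {
		return false, err
	}
	base := filepath.Base(name)
	for _, n := range names {
		if n == base {
			return true, nil
		}
	}
	return false, nil
}

// osListDir lists dir using the os package.
func osListDir(dir string) ([]string, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.Readdirnames(-1)
}
```

A map keyed by directory is the simplest option; since the scanner proceeds directory by directory, a stack of listings along the current path (as suggested above) would bound memory better.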

Unfortunately defective initial attempt:

The following mechanism relies on path ordering; however, that ordering is not consistent between walking the fs and iterating the db. The fs walk is ordered lexically by path component, while the db is ordered lexically by the full path. E.g. the fs goes a -> a/a -> a-a, while the db goes a -> a-a -> a/a. Nor can this be solved by not walking the fs depth first: the fs would then go a -> aaa -> a/a, while the db goes a -> a/a -> aaa.

The proposal here is to avoid that second step by comparing against the db contents directly while walking the filesystem, i.e. to redo https://github.com/syncthing/syncthing/pull/4584 (the PR was merged but later reverted).
This means that after a scan, all filenames in the db correspond to the real case on disk, regardless of the scenario (e.g. case-only renames on Windows are picked up). This requires no special treatment for case-insensitive systems and benefits other systems as well, since it takes one pass instead of the current two.
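The core of such a combined pass is a merge join. The sketch below works on a single directory level, assuming both name lists are sorted byte-wise (as `sort.Strings` does); restricting it to one level sidesteps the cross-directory ordering problem described above. `diffSorted` is a hypothetical name, not the code from the PR.

```go
package main

// diffSorted walks two byte-wise sorted name lists in lockstep. Names
// only on disk are new; names only in the db were deleted. A case-only
// rename shows up as one addition plus one deletion, so the db ends up
// holding the real case found on disk.
func diffSorted(onDisk, inDB []string) (added, deleted []string) {
	i, j := 0, 0
	for i < len(onDisk) && j < len(inDB) {
		switch {
		case onDisk[i] == inDB[j]:
			i++ // present in both: unchanged name
			j++
		case onDisk[i] < inDB[j]:
			added = append(added, onDisk[i])
			i++
		default:
			deleted = append(deleted, inDB[j])
			j++
		}
	}
	// Whatever remains on either side has no counterpart.
	added = append(added, onDisk[i:]...)
	deleted = append(deleted, inDB[j:]...)
	return added, deleted
}
```

With `Foo bar baz` on disk and `bar foo` in the db, this reports `Foo` and `baz` as added and `foo` as deleted, i.e. the case-only rename of foo is picked up without any stat calls.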

Puller

Here we need to make sure the case of the filename we are pulling matches what's on disk, because we must be certain that what's on disk is what we expect it to be, so as not to lose data.
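A minimal sketch of that check, assuming the puller has the real names of the target's parent directory (e.g. from a DirNames listing). `checkCaseBeforePull` and `errCaseConflict` are hypothetical. An exact match is the file we expect; a match that differs only in case is a different file that a case-insensitive filesystem would silently overwrite.

```go
package main

import (
	"errors"
	"strings"
)

// errCaseConflict is a hypothetical error for a name that exists on
// disk with different case.
var errCaseConflict = errors.New("case conflict with existing file")

// checkCaseBeforePull returns the existing on-disk name for wanted (or
// "" if none) and an error if writing wanted would clobber a file whose
// name differs only in case.
func checkCaseBeforePull(wanted string, parentNames []string) (existing string, err error) {
	// Prefer an exact match; a case-sensitive fs may hold both variants.
	for _, n := range parentNames {
		if n == wanted {
			return n, nil
		}
	}
	for _, n := range parentNames {
		if strings.EqualFold(n, wanted) {
			return n, errCaseConflict
		}
	}
	return "", nil
}
```

Pulling `Foo.txt` into a directory that contains `foo.txt` yields `errCaseConflict`, which is the situation that currently produces "file modified but not rescanned".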

Open questions

  1. Cross-compatibility/config

Options:

(a) Enable the case check everywhere, with an opt-out config (i.e. the user promises that their system is case-sensitive).

(b) Detect case sensitivity automatically (i.e. no config) by creating two case-conflicting files and observing the outcome.
Problem: a folder might span any combination of filesystems, so what holds at the folder root might not hold somewhere else.

(c) Detect the filesystem in use and make conclusions on case-sensitivity from that.
Problem: rabbit hole.

(d) Use the above detection mechanism to choose a default value for the config, but still allow the user to override it.

  2. What to do when detecting case conflicts?

Consensus is for (a).

(a) Make it a sync error, so the user has to fix the problem on a case-sensitive peer, just like with invalid filenames on Windows currently.

(b) Create "case conflicts" just like the current concurrent-modification conflicts. I.e. with an existing foo.txt and a to-be-pulled Foo.txt, you end up with something like `Foo.case-conflict-timestamp-dev.txt`.

  3. How to detect correct case?

(a) listdir based methods
Works everywhere, but might be inefficient. Efficiency could be improved by caching listdir results.

(b) System specific solutions
Windows: FindFirstFile (works). macOS: the F_GETPATH option for fcntl (untested). Nothing known on Linux/Unix.

Hopefully (a), with some smart caching, is fast enough that we can just use the os package and not deal with low-level system calls.

Extensions/Questions not relevant to an initial working solution

  1. Should there be special treatment for case-only conflicts?

By "case-only" I mean that the data is the same, but the names differ only in case. In the similar (not the same!) situation of version conflicts (concurrent changes) with equal data, we already resolve automatically. We could do the same for "case-only" conflicts by simply choosing one of the two names without user interaction.
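A sketch of such an automatic resolution, assuming each side is summarized by a name and a content hash (Syncthing compares block hashes; a single hash is a simplification). `fileVersion` and `resolveCaseOnly` are hypothetical, and the tie-break (keep the byte-wise smaller name) is an arbitrary choice made only so that every device picks the same name without coordination.

```go
package main

import (
	"bytes"
	"strings"
)

// fileVersion is a hypothetical, minimal view of one side of a conflict.
type fileVersion struct {
	Name string
	Hash []byte
}

// resolveCaseOnly returns the name to keep when a and b form a
// case-only conflict, and false when they do not.
func resolveCaseOnly(a, b fileVersion) (string, bool) {
	if a.Name == b.Name || !strings.EqualFold(a.Name, b.Name) {
		return "", false // not a case-only name difference
	}
	if !bytes.Equal(a.Hash, b.Hash) {
		return "", false // data differs: a real conflict for the user
	}
	// Deterministic tie-break so all devices converge on one name.
	if a.Name < b.Name {
		return a.Name, true
	}
	return b.Name, true
}
```

With equal data, `foo.txt` versus `Foo.txt` resolves to `Foo.txt` (uppercase letters sort first byte-wise); with differing data it is left as a conflict.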

Code

Old PR that walks the filesystem and the db at the same time while scanning, to detect removed items immediately without needing to stat db contents later on.

Utility function and tests to get the real case on Windows using FindFirstFile