Subject: Re: [boost] [filesystem] proposal: treat reparse files as regular files
From: Paul Harris (harris.pc_at_[hidden])
Date: 2015-07-28 08:40:53


On 28 July 2015 at 19:07, Andrey Semashev <andrey.semashev_at_[hidden]> wrote:

> On 28.07.2015 04:33, Paul Harris wrote:
>
>> I think we are not on the same page. Let me try and refocus the
>> discussion...
>>
>> With symlinks, there is more than one access point to the same file
>> content. (ie multiple file names to the identical content).
>>
>> That makes symlinks fundamentally different to regular files. And it's why
>> they are treated differently. Eg don't back up content twice.
>>
>> Is that statement correct?
>>
>
> As Niall already commented, that's not correct. What you described is more
> like a hardlink [1].
>
> You can easily spot the difference if you rename or delete the file the
> link points to. The symlink will keep pointing to the old file (thus being
> a dangling symlink) while the hardlink will still be pointing to the file
> content.
>
> A hardlink is actually not any more special than a regular file. Put
> simply, from the filesystem perspective any file is a name pointing to the
> content. When you create a new file, there's only one such name. When you
> create a hardlink, you create another name pointing to the same content and
> increment the reference count to the content. The two names are equivalent,
> and the content exists as long as there are names referencing it.
>
> [1] https://en.wikipedia.org/wiki/Hard_link
>
>
>
I think my point is being missed... I am not debating symlinks or
hardlinks...

I am _happy_ with the way hardlinks and symlinks are treated, in both posix
and windows.

I am _happy_ with the way reparse-based-symlinks and junctions are treated
in windows.

I _disagree_ with the way dedup'd files are currently treated as a
special file (as if they were a device or a character file or a fifo or a
socket). Devices, sockets and fifos all need to be read in a special way,
but dedup'd files should be read as if they were plain files.

I _disagree_ that a dedup file should be treated as if it were a symlink.
This is because a dedup file does not point to another file (or inode) on
the file system, which is a characteristic of a symlink or a hardlink. It
is basically just a compressed file. We don't treat NTFS-compressed files
differently from regular files, why are we treating dedup'd files
differently?

Dedup files and symlink files on windows both (unfortunately) use the same
mechanism - reparse points. But we should only treat symlink and junction
reparse point files as symlinks. Anything else should be treated as a
regular file. That is how I am reading the MS docs, and that is how I am
experiencing working with the filesystems.
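
To make the rule concrete, here is a minimal sketch of the classification
I have in mind for a non-directory reparse point. The tag values are the
ones published in the Windows SDK (winnt.h), redefined so the snippet
compiles on any platform. The enum and the function name are hypothetical
and are not part of Boost.Filesystem. On Windows the tag is available in
the dwReserved0 field of WIN32_FIND_DATA when FILE_ATTRIBUTE_REPARSE_POINT
is set.

```cpp
#include <cstdint>

// Reparse tag values as published in the Windows SDK (winnt.h),
// redefined here so the sketch does not need <windows.h>.
constexpr std::uint32_t kTagMountPoint = 0xA0000003u; // IO_REPARSE_TAG_MOUNT_POINT (junction)
constexpr std::uint32_t kTagSymlink    = 0xA000000Cu; // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup      = 0x80000013u; // IO_REPARSE_TAG_DEDUP

// Hypothetical names for illustration only.
enum class reparse_kind { link, regular };

// Only symlink and junction tags are reported as links. Every other
// tag (dedup included) is reported as a regular file, because its
// content is read through the normal file API.
constexpr reparse_kind classify_reparse_tag(std::uint32_t tag)
{
    return (tag == kTagSymlink || tag == kTagMountPoint)
        ? reparse_kind::link
        : reparse_kind::regular;
}

static_assert(classify_reparse_tag(kTagDedup) == reparse_kind::regular,
              "a dedup'd file should look like a plain file");
```

Defaulting unknown tags to "regular" is the design choice being proposed
here; it is not how the library currently behaves.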

A simple example is a backup program for the files
in a _single directory_.

Let's say you want to store every file's content once.
When you find a directory, ignore it.
When you find an "other" file, ignore it (how can you back up a device /
character file / etc?)
When you find a symlink, you want to store just the link.
When you find a regular file, you want to store the contents.
When you find a reparse-point-symlink, you want to store just the link
(like a posix symlink).
When you find a dedup'd file, you want to store the contents (like a posix
regular file).

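for (directory_iterator ...)
{
   // else-if: is_regular_file(fn) follows symlinks, so without the
   // chain a symlink to a plain file would be backed up twice
   if (is_symlink(fn)) backup_link(fn);
   else if (is_regular_file(fn)) backup_contents(fn);
   else if (is_directory(fn)) ignore(fn);
   else if (is_other(fn)) ignore(fn);
}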

Currently, this pseudo code would fail to back up any automatically dedup'd
file (which is basically any file older than 3 days on some of my sites).
It fails because a dedup'd file is currently an "other".
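
For anyone who wants to try this, here is a compilable sketch of the same
loop. It assumes C++17 std::filesystem rather than Boost.Filesystem (the
function names used here are the same in both), and the action enum,
choose_action and plan_backup are hypothetical names for illustration.
Instead of copying anything, it only records what the backup would do
with each entry.

```cpp
#include <cassert>
#include <filesystem>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: what the backup program does with one directory entry.
enum class action { store_link, store_contents, ignore };

// Decide from the status of the entry itself. The caller passes the
// result of symlink_status(), which does not follow symlinks.
action choose_action(const fs::file_status& st)
{
    if (fs::is_symlink(st))      return action::store_link;
    if (fs::is_regular_file(st)) return action::store_contents;
    return action::ignore; // directories, devices, fifos, sockets, ...
}

// Walk a single directory (no recursion) and list the planned actions.
std::vector<std::pair<fs::path, action>> plan_backup(const fs::path& dir)
{
    assert(fs::is_directory(dir)); // precondition of this sketch
    std::vector<std::pair<fs::path, action>> plan;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir))
        plan.emplace_back(entry.path(), choose_action(entry.symlink_status()));
    return plan;
}
```

In terms of this sketch, a dedup'd file that is reported as "other" lands
in the ignore branch. If it were reported as a regular file, the loop
would store its contents with no special casing.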

If you treat a dedup'd file as a symlink, only the "link" will be backed up.
This link points to a magical place that is impossible to read other than
simply reading "fn".

So how does this simple program back up the dedup'd file contents?

cheers,
Paul