Message-ID: <8760g2utrc.fsf@jnanam.net>
Date: Sun, 11 Jun 2017 14:57:59 -0600
From: Benjamin Slade <slade@...nam.net>
To: Joakim Sindholt <opensource@...sha.com>
Cc: musl@...ts.openwall.com
Subject: Re: ENOSYS/EOPNOTSUPP fallback?
Thank you for the extensive reply.
Just to be clear: I'm just an end-user of flatpak, &c. As far as I can
tell, flatpak is making use of `ostree` which assumes that the libc will
take care of handling `dd` fallback (I got the impression that flatpak
isn't directly calling `fallocate` itself).
Do you think there's an obvious avenue for following up on this?
Admittedly this is an edge-case that won't necessarily affect musl users
on ext4, but it will affect musl users on zfs (and I believe
f2fs). Do you think `ostree` shouldn't rely on the libc for fallback? Or
should ZFS on Linux implement a fallback for fallocate?
--
Benjamin Slade
`(pgp_fp: ,(21BA 2AE1 28F6 DF36 110A 0E9C A320 BBE8 2B52 EE19))
'(sent by mu4e on Emacs running under GNU/Linux . https://gnu.org )
'(Choose Linux, Choose Freedom . https://linux.com )
On 2017-06-05T06:46:33-0600, Joakim Sindholt <opensource@...sha.com> wrote:
> On Sun, Jun 04, 2017 at 09:22:27PM -0600, Benjamin Slade wrote:
> > I ran into what is perhaps a weird edge case. I'm running a system with
> > musl that uses a ZFS root fs. When I was trying to install some
> > flatpaks, I got an `fallocate` failure, with no `dd` fallback. Querying
> > the flatpak team, the fallback to `dd` seems to be something which glibc
> > does (and so the other components assume it will be taken care of).
> >
> > Here is the exchange regarding this issue:
> > https://github.com/flatpak/flatpak/issues/802
> To quote the glibc source file linked in the bug:
> /* Minimize data transfer for network file systems, by issuing
> single-byte write requests spaced by the file system block size.
> (Most local file systems have fallocate support, so this fallback
> code is not used there.) */
> /* NFS clients do not propagate the block size of the underlying
> storage and may report a much larger value which would still
> leave holes after the loop below, so we cap the increment at
> 4096. */
> /* Write a null byte to every block. This is racy; we currently
> lack a better option. Compare-and-swap against a file mapping
> might address local races, but requires interposition of a signal
> handler to catch SIGBUS. */
> Which leaves two massive bugs:
> 1) gaps may be left unallocated, both because of the NFS issue and
> because other file systems may work on entirely different principles
> that this code does not account for, and
> 2) data currently being written to the file may be overwritten as it
> is forcibly allocated (an allocation which might itself do nothing;
> think deduplication).
> This is not a viable general solution and furthermore fallocate is
> mostly just an optimization hint. If it's a hard requirement of your
> software I would suggest implementing it in your file system. These
> operations can only be safely implemented in the kernel.
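[Editorial aside: treating fallocate purely as an optimization hint, as suggested above, can be sketched like this. The wrapper name is made up; only the errno values are from the syscall's documented behavior.]

```c
#define _GNU_SOURCE             /* for fallocate(2) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Try the real syscall; treat "unsupported" errors (ENOSYS on old
 * kernels, EOPNOTSUPP on file systems like ZFS that lack fallocate)
 * as success and leave allocation to later writes. Only genuine
 * failures such as ENOSPC are reported. */
static int hint_fallocate(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;               /* genuinely preallocated */
    if (errno == ENOSYS || errno == EOPNOTSUPP)
        return 0;               /* unsupported here: not fatal */
    return -1;                  /* e.g. ENOSPC, EBADF */
}
```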
> An example:
> MyFS uses write time deduplication on unused blocks (and blocks with all
> zeroes fall under the umbrella of unused). Glibc starts its dance where
> it writes a zero byte to the beginning of each block it perceives and
> for now let's just say it has the right block size. MyFS just trashes
> these writes immediately without touching the disk and updates the size
> metadata which gets lazily written at some point. There's only 400k left
> on the disk and your fallocate of 16G will succeed and run exceptionally
> fast to boot, but it will have allocated nothing and your next write
> fails with ENOSPC.
> Another example:
> myutil has 2 threads running. One thread is constantly writing things to
> a file. The other thread sometimes writes large chunks of data to the
> file, and so it hints the kernel to allocate these large chunks by
> calling fallocate, only then taking the lock(s) used internally to
> synchronize the threads. The first thread finds it needs to update
> something in the section currently being fallocated by glibc's
> algorithm. Suddenly zero bytes appear at 4k intervals for no discernible
> reason, overwriting the data.
> Personally I would look into seeing to it that flatpak only uses
> fallocate as an optimization. The most reliable alternative I can
> think of would be to do whatever locking is necessary (if any) in the
> program and fill the entire target section of the file with data from
> /dev/urandom, though even that may fail spectacularly with transparent
> compression (albeit unlikely).
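[Editorial aside: the /dev/urandom idea above could look roughly like this sketch. The function name is made up; note it destroys any existing data in the range, so it only suits freshly created regions.]

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Force real allocation by filling the target range with random
 * bytes, which deduplication cannot collapse and compression is
 * unlikely to shrink much. */
static int fill_with_urandom(int fd, off_t offset, off_t len)
{
    int rnd = open("/dev/urandom", O_RDONLY);
    if (rnd < 0)
        return errno;

    char buf[4096];
    off_t pos = offset, end = offset + len;
    int err = 0;
    while (pos < end) {
        off_t left = end - pos;
        size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t got = read(rnd, buf, want);
        if (got <= 0) { err = errno ? errno : EIO; break; }
        if (pwrite(fd, buf, (size_t)got, pos) != got) { err = errno; break; }
        pos += got;
    }
    close(rnd);
    return err;
}
```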
> Hope this was at least somewhat helpful.