Message-ID: <20210315222916.GG32655@brightrain.aerifal.cx>
Date: Mon, 15 Mar 2021 18:29:16 -0400
From: Rich Felker <dalias@...c.org>
To: Alexander Monakov <amonakov@...ras.ru>
Cc: musl@...ts.openwall.com, Dominic Chen <d.c.ddcc@...il.com>
Subject: Re: Issue with fread() and unaligned readv()

On Tue, Mar 16, 2021 at 01:09:16AM +0300, Alexander Monakov wrote:
> On Mon, 15 Mar 2021, Rich Felker wrote:
>
> > On Mon, Mar 15, 2021 at 05:39:43PM -0400, Dominic Chen wrote:
> > > Not sure this counts as a problem in musl or the application, but
> > > I've been debugging a return error of EINVAL from `fread(&buf, 8,
> > > 16, f)`, where `f = fopen("/proc/self/pagemap", "r")`. Internally,
> > > musl converts this into a call to `readv(f->fd, iov, 2)`, where `iov
> > > = {{iov_base = buf, iov_len = 127}, {iov_base = f->buf, iov_len =
> > > 1024}}`. However, it turns out that the kernel VFS read
> > > implementation inside `pagemap_read` checks that both the file
> > > position and count are divisible by PM_ENTRY_BYTES (8 on x86_64),
> > > otherwise it rejects the read with EINVAL. In comparison, glibc's
> > > `_IO_file_xsgetn` does appear to try to maintain read alignment,
> > > although I haven't looked at it in detail.
> >
> > You can't use stdio to read or write special files/devices that depend
> > on the reads or writes happening in particular units, because the
> > relationship between stdio operations and the underlying
> > buffer-fill/flush operations on the underlying fd is unspecified. It's
> > really unfortunate that the kernel lies that procfs files are regular
> > files but doesn't give them regular-file semantics, but you really
> > need to use direct operations on the fd in the units the interface
> > requires, rather than stdio, to work with these files.
>
> Where does iov_len = 127 for the first iov tuple come from, though?
> From fread arguments I'd expect 8 * 16 = 128.
>
> If musl always does such off-by-one, it is an efficiency issue (forces
> a copy with mismatching source/dest alignment).

It's necessary to work around a kernel bug, whereby the kernel fails to honor the requirement that a readv of total length n behave identically, except for where the data is stored, to a single read of length n.

For vfs backends that don't implement a proper readv operation, the kernel executes readv as a sequence of reads. When this happens, if the amount of data available is exactly the length of the first iov (the length requested by the application), continuing to the second iov with no more data available will make the operation block until more data arrives, even though the caller's request is already satisfied. By reducing the length of the first iov (the caller's buffer) by 1, we ensure that at least 1 byte of the second iov (the FILE's buffer) is actually needed to satisfy the caller, and thus that the call will return without blocking as soon as everything the caller requested is available.

This exact situation arises all the time with one very common type of file: tty devices. :(

Rich
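The following C sketch illustrates the shortened-first-iov technique described in the reply. It is a simplified illustration, not musl's actual stdio code: the `struct rbuf` type and the `rbuf_fill` function are hypothetical names, and the sketch assumes a buffered stream (nonzero `bufsize`) and a request of at least one byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical, simplified read buffer; NOT musl's FILE structure. */
struct rbuf {
    int fd;
    unsigned char *buf;   /* the stream's own buffer */
    size_t bufsize;       /* capacity of buf; assumed > 0 in this sketch */
    unsigned char *rpos;  /* next unread byte in buf */
    unsigned char *rend;  /* end of valid data in buf */
};

/*
 * Read up to len bytes (len >= 1) into dst and refill the stream buffer
 * with a single readv() call. The caller's iov is shortened by one byte,
 * so satisfying the whole request always takes at least one byte from
 * the second iov. If the kernel emulates readv() as one read() per iov,
 * the second read() then only waits while the request is still unmet.
 * Returns the number of bytes stored in dst, 0 on EOF, or -1 on error.
 */
ssize_t rbuf_fill(struct rbuf *f, unsigned char *dst, size_t len)
{
    struct iovec iov[2] = {
        { .iov_base = dst,    .iov_len = len - 1 },
        { .iov_base = f->buf, .iov_len = f->bufsize },
    };
    ssize_t cnt;

    /* With len == 1 the first iov is empty, so just fill the buffer. */
    if (iov[0].iov_len)
        cnt = readv(f->fd, iov, 2);
    else
        cnt = read(f->fd, f->buf, f->bufsize);
    if (cnt <= 0)
        return cnt;

    /* Short read: everything landed directly in the caller's memory. */
    if ((size_t)cnt <= iov[0].iov_len)
        return cnt;

    /* The rest went into the stream buffer; hand its first byte over. */
    cnt -= (ssize_t)iov[0].iov_len;
    f->rpos = f->buf;
    f->rend = f->buf + cnt;
    dst[len - 1] = *f->rpos++;
    return (ssize_t)len;
}
```

With a 128-byte request and a 1024-byte buffer this issues `readv` with iov lengths 127 and 1024, the same shape Dominic observed; when exactly 128 bytes are available, the last byte is taken from the stream buffer and `rbuf_fill` returns 128.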
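Rich's advice to bypass stdio for files like `/proc/self/pagemap` can be sketched as follows: read the entries with `pread` directly on the descriptor, keeping both the file offset and the byte count multiples of the 8-byte entry size that `pagemap_read` checks. The function name `pagemap_entries` and the `PAGEMAP_ENTRY_BYTES` macro are hypothetical; the sketch assumes Linux, where the entry for virtual page p lives at byte offset p * 8, and assumes that any short read the kernel returns consists of whole entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of one pagemap entry (PM_ENTRY_BYTES in the kernel); assumed 8. */
#define PAGEMAP_ENTRY_BYTES 8

/*
 * Read n consecutive pagemap entries, starting with the page containing
 * vaddr, straight from the file descriptor (no stdio buffering). Offset
 * and count are kept multiples of PAGEMAP_ENTRY_BYTES. Returns the number
 * of entries read, or -1 with errno set.
 */
ssize_t pagemap_entries(int fd, uintptr_t vaddr, uint64_t *out, size_t n)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t want = n * PAGEMAP_ENTRY_BYTES, got = 0;
    off_t off;

    if (pagesize <= 0)
        return -1;
    off = (off_t)(vaddr / (uintptr_t)pagesize) * PAGEMAP_ENTRY_BYTES;

    while (got < want) {
        ssize_t r = pread(fd, (char *)out + got, want - got,
                          off + (off_t)got);
        if (r < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before any data: retry */
            return -1;
        }
        if (r == 0)
            break;               /* no more entries */
        got += (size_t)r;        /* assumed to be whole 8-byte entries */
    }
    return (ssize_t)(got / PAGEMAP_ENTRY_BYTES);
}
```

Per the kernel's pagemap documentation, bit 63 of each returned entry is the page-present flag.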