Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150529135300.GD17573@brightrain.aerifal.cx>
Date: Fri, 29 May 2015 09:53:00 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Making stdio writes robust/recoverable under errors

Currently, musl's stdio write operations take the liberty of leaving
the current position and actual contents written rather unpredictable
if a write error occurs. Originally the motivation was probably a mix
of uncertainty about what to do and lack of code (or a desire to avoid
complexity) for tracking what part of the buffer was unwritten if a
write error occurred before the buffer was written out. But commit
58165923890865a6ac042fafce13f440ee986fd9, as part of adding
cancellation support, added the code to track what part of the buffer
remains unwritten, and therefore, from what I can see, avoiding data
loss when stdio writes fail transiently (like EINTR, for instance) is
simply a matter of removing line 31 in the error path of
__stdio_write:

			f->wpos = f->wbase = f->wend = 0;

so that the buffer is not thrown away.

An attached test program demonstrates the issue. It assumes Linux 64k
pipe buffers so it's not appropriate for inclusion in tests in its
current form, but it gets the job done where it works. It attempts to
send 128k over a pipe via write followed by a series of fwrite retries
that are interrupted by signals that cause EINTR; thanks to the
kernel's pipe buffer semantics, there's basically no timing/race
aspect involved. With current musl, I get only 130560 across the pipe;
with the above line removed, I get all 131072.

I think this is a good change to make from the standpoint of
robustness. Even if most errors are not recoverable, some may be; in
fact ENOSPC could be handled meaningfully if the calling program knows
it has temp files it could delete. What I am concerned about, though,
are residual assumptions that, after f->write, the write buffer will
be clear. With this change, fflush and similar operations would no
longer discard the buffer on error, meaning we would need to check
that callers don't need that property. As another example, fseek fails
when the buffer can't be written out, but since the current buffer
write throws away the buffer on failure, a second fseek call will
succeed; this change would leave it failing. (As an aside, rewind
already fails in this case and throws away the error; this is almost
certainly wrong. It should probably attempt to flush and throw away
the buffer on failure before seeking so that the seek always
succeeds.)

I don't want to make these changes right now at the end of a release
cycle but I might promote them to important targets for the next
cycle.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.