|
Message-ID: <20150529013937.GA32093@brightrain.aerifal.cx> Date: Thu, 28 May 2015 21:39:37 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: stdio fixes & internals documentation I'm writing this up because I need to fix a bug in ungetc -- it's failing when the target FILE is at eof rather than clearing the eof status and succeeding, despite having logic in the source that was intended to make this case work right. So I'm going to go ahead and make some tentative documentation of stdio internals for my own use and for others who want to review these changes. This stuff can eventually be cleaned up to go into the musl manual... :-) So far, just for stuff used in reading: f->read() assumes reading is currently a valid operation (so __toread was called successfully without a subsequent __towrite) and the read buffer is empty (but the values of the pointers don't matter). Right now it's responsible for setting EOF and error flags for the file. This is probably a bad idea because the logic, which is stdio-global and not specific to the FILE type, is repeated in different underlying FILE type implementations' read functions. __toread() assumes the FILE is in a valid state to begin reading, but does not misbehave horribly even if it was in the middle of writing. If the FILE's fopen mode does not permit reading, it sets error status and returns EOF. If the file is at EOF, it also returns EOF. This means the caller (such as ungetc) cannot distinguish EOF from "error going into reading mode" without temporarily clearing and restoring the error flag (which would be ugly and inefficient). On failure (either eof status or error switching to read mode) the buffer pointers are left unchanged; on success, they're all set to the beginning of the buffer. The first thing I want to do is fix the known bug in ungetc, and I think the easiest way to do that is to make __toread set valid read buffer pointers when it fails due to eof status. Then, instead of ungetc checking the return value of __toread, it can instead call __toread and then just check rpos. That is, instead of: if ((!f->rend && __toread(f)) || f->rpos <= f->buf - UNGET) { // error it can instead do: if (!f->rend) __toread(f); if (f->rpos <= f->buf - UNGET) { // error or perhaps: if (!f->rpos) __toread(f); if (!f->rpos || f->rpos <= f->buf - UNGET) { // error I like the second version better because it does not assume that a null pointer compares <= any valid pointer, which could be wrong if pointer <= is implemented as a signed comparison. I changed the conditional from !f->rend to !f->rpos to simultaneously optimize two code paths: 1. If f->rpos is initially non-null, the compiler can skip re-checking that it's non-null in the second if. 2. If f->rpos is initially null, then after __toread it needs to be re-read, but only one field (rpos) needs to be loaded from memory instead of two (rend and rpos). Note that if the file is not open in a mode that permits writing, then f->rpos is initially a null pointer and __toread will never set it to something non-null, nor will f->read (since it's not permitted to call f->read without a successful __toread). Once these fixes are taken care of I'd like to look at the EOF logic in f->read() and moving it out to the callers (only __uflow and fread) where we won't have to worry about bugs (which I think exist) in the FILE-type-specific read functions. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.