Message-ID: <20220419170302.GA10621@brightrain.aerifal.cx>
Date: Tue, 19 Apr 2022 13:03:02 -0400
From: Rich Felker <dalias@...c.org>
To: libc-coord@...ts.openwall.com
Subject: Re: stdio_ext.h extensions for gnulib

On Fri, Apr 15, 2022 at 05:39:13PM -0700, enh wrote:
> On Tue, Apr 12, 2022 at 9:00 PM Rich Felker <dalias@...c.org> wrote:
>
> > Early on in musl's history, we added a set of further extensions in
> > stdio_ext.h:
> >
> > size_t __freadahead(FILE *);
> > const char *__freadptr(FILE *, size_t *);
> > void __freadptrinc(FILE *, size_t);
> > void __fseterr(FILE *);
> >
> > The purpose of these functions was to provide a way for gnulib to do
> > the things they already insisted on doing, but without having access
> > to the FILE representation internals (which is how they implemented
> > these things for every other system at the time).
> >
> > The topic recently came up again via Toybox, where the author is not
> > using gnulib but was looking for some of the same functionality. I'm
> > told Bionic is on-board with adding these,
> >
> apparently i shipped __fseterr() years ago (despite the fact that if you'd
> asked me last week, i'd have said "we only have the stuff glibc *and* musl
> both have"), and this week i was more convinced by __freadahead() than the
> other two.
>
> i was a bit concerned about how well the other two map to all extant stdio
> implementations, but it turns out i have a bit of a problem implementing
> __freadahead() too [and an existing bug or two i didn't previously know
> about]. it looks like musl is quite strict with ungetc() --- there's a
> fixed-size always-allocated unget buffer that's just before the actual
> buffer and only used if you try to ungetc at the start of the file? the BSD
> implementation in bionic instead has an "out of line" unget buffer that
> will grow arbitrarily large. which is problematic for messing with the read
> pointer unless i've misunderstood what those two functions actually do?
> i didn't find any documentation.

I don't think __freadptr should imply a contract that the size it
reports is the same as the return value of __freadahead; in the case
of discontiguous buffers it can't be. In fact a reasonable
implementation should be to always return size 1 if the buffer is
nonempty, with a pointer to a single char, and size 0 if there's
nothing available to read. I agree this is a hideous interface that
should not be used.

> for me it's a bit problematic for __freadahead() too because it turns out
> that POSIX's "The pushed-back bytes shall be returned by subsequent reads
> on that stream in the reverse order of their pushing" isn't exactly true
> for bionic. i think a subsequent getc() will return these characters, but i
> don't think a subsequent fread() would, for example. i've not seen any bug
> reports around this, so i guess the default "3 bytes is enough for anyone"
> buffer is actually enough in practice (or people who ungetc() only read via
> getc()).

That sounds like a bug in Bionic. All stdio read operations are
required to behave as if by repeated getc. If fread bypasses some of
what getc should see, that's a breaking behavior and I'm surprised you
haven't hit anything affected (probably because there's not that much
plain C software on Android..?) I don't get what you're saying about
why it wouldn't be seen with just 3 bytes though.

> so i think my choices are either:
>
> 1. fix the arbitrary unget buffer, and have __freadahead() count characters
> in that, and give up on functions that touch the read pointer.
> 2. say "well, looks like no-one's using unlimited unget anyway" and use the
> musl "8 bytes is enough for anyone ... which means we can just have a
> slightly larger single buffer" trick, which also lets us implement the read
> pointer functions.

1 byte is supposed to be enough for anyone; ungetc is required to
report failure if no more pushback is available.
We have 8 because (1) we use the same mechanism for scanf pushback,
and there needs to be at least one byte of pushback available for
ungetc after scanf pushback, and (2) we don't have separate wide stdio
buffering, but always operate on UTF-8 directly, so one wchar of
wscanf pushback and one ungetwc makes up to 8 bytes.

However based on the above interpretation of __freadptr, I don't think
you have to make any changes to support it. Whether you have to make
changes to fix a separate bug, I'm not sure; it sounds like you might.

Rich
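
For illustration, the minimal __freadptr contract described in this
message can be sketched against a toy stream with an out-of-line
pushback stack. This is only a sketch under stated assumptions:
struct toy_stream and every toy_* function are hypothetical names made
up for the example, not code from musl, Bionic, or gnulib, and the
16-byte pushback capacity is arbitrary.

```c
#include <stddef.h>

/* Hypothetical toy stream: a main read buffer plus a separate
   ("out of line") pushback stack. Not the layout of any real libc. */
struct toy_stream {
    const char *buf;   /* main buffer contents */
    size_t pos, end;   /* unread bytes are buf[pos..end) */
    char unget[16];    /* pushback stack; last pushed byte is read first */
    size_t nunget;     /* number of bytes currently pushed back */
};

/* Total bytes readable without touching the underlying file:
   pushed-back bytes plus whatever is left in the main buffer. */
size_t toy_freadahead(const struct toy_stream *s)
{
    return s->nunget + (s->end - s->pos);
}

/* Minimal contract: expose only the next byte. *sizep is 1 if anything
   is buffered and 0 otherwise, so it need not equal toy_freadahead(). */
const char *toy_freadptr(const struct toy_stream *s, size_t *sizep)
{
    if (s->nunget) {
        *sizep = 1;
        return &s->unget[s->nunget - 1];
    }
    if (s->pos < s->end) {
        *sizep = 1;
        return &s->buf[s->pos];
    }
    *sizep = 0;
    return NULL;
}

/* Consume n bytes previously exposed by toy_freadptr (0 or 1 here). */
void toy_freadptrinc(struct toy_stream *s, size_t n)
{
    while (n--) {
        if (s->nunget)
            s->nunget--;
        else if (s->pos < s->end)
            s->pos++;
    }
}

/* Push one byte back; reports failure (-1) when no more pushback is
   available, as ungetc is required to. */
int toy_ungetc(struct toy_stream *s, char c)
{
    if (s->nunget == sizeof s->unget)
        return -1;
    s->unget[s->nunget++] = c;
    return (unsigned char)c;
}

/* Drain the stream through the freadptr/freadptrinc pair into out[]. */
size_t toy_drain(struct toy_stream *s, char *out, size_t cap)
{
    size_t n = 0, avail;
    const char *p;

    while (n < cap && (p = toy_freadptr(s, &avail)) != NULL) {
        out[n++] = *p;
        toy_freadptrinc(s, avail);
    }
    return n;
}
```

Starting from a main buffer holding "bc" and pushing back 'a' and then
'X', toy_freadahead reports 4 while toy_freadptr reports size 1, and
toy_drain yields the bytes in the order X, a, b, c, which is the order
repeated getc calls would have to produce.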