Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220419170302.GA10621@brightrain.aerifal.cx>
Date: Tue, 19 Apr 2022 13:03:02 -0400
From: Rich Felker <dalias@...c.org>
To: libc-coord@...ts.openwall.com
Subject: Re: stdio_ext.h extensions for gnulib

On Fri, Apr 15, 2022 at 05:39:13PM -0700, enh wrote:
> On Tue, Apr 12, 2022 at 9:00 PM Rich Felker <dalias@...c.org> wrote:
> 
> > Early on in musl's history, we added a set of further extensions in
> > stdio_ext.h:
> >
> > size_t __freadahead(FILE *);
> > const char *__freadptr(FILE *, size_t *);
> > void __freadptrinc(FILE *, size_t);
> > void __fseterr(FILE *);
> >
> > The purpose of these functions was to provide a way for gnulib to do
> > the things they already insisted on doing, but without having access
> > to the FILE representation internals (which is how they implemented
> > these things for every other system at the time).
> >
> > The topic recently came up again via Toybox, where the author is not
> > using gnulib but was looking for some of the same functionality. I'm
> > told Bionic is on-board with adding these,
> 
> 
> apparently i shipped __fseterr() years ago (despite the fact that if you'd
> asked me last week, i'd have said "we only have the stuff glibc *and* musl
> both have"), and this week i was more convinced by __freadahead() than the
> other two.
> 
> i was a bit concerned about how well the other two map to all extant stdio
> implementations, but it turns out i have a bit of a problem implementing
> __freadahead() too [and an existing bug or two i didn't previously know
> about]. it looks like musl is quite strict with ungetc() --- there's a
> fixed-size always-allocated unget buffer that's just before the actual
> buffer and only used if you try to ungetc at the start of the file? the BSD
> implementation in bionic instead has an "out of line" unget buffer that
> will grow arbitrarily large. which is problematic for messing with the read
> pointer unless i've misunderstood what those two functions actually do? i
> didn't find any documentation.

I don't think __freadptr should imply a contract that the size it
reports is the same as the return value of __freadahead; in the case
of discontiguous buffers it can't be. In fact a reasonable
implementation should be to always return size 1 if the buffer is
nonempty, with a pointer to a single char, and size 0 if there's
nothing available to read. I agree this is a hideous interface that
should not be used.

> for me it's a bit problematic for __freadahead() too because it turns out
> that POSIX's "The pushed-back bytes shall be returned by subsequent reads
> on that stream in the reverse order of their pushing" isn't exactly true
> for bionic. i think a subsequent getc() will return these characters, but i
> don't think a subsequent fread() would, for example. i've not seen any bug
> reports around this, so i guess the default "3 bytes is enough for anyone"
> buffer is actually enough in practice (or people who ungetc() only read via
> getc()).

That sounds like a bug in Bionic. All stdio read operations are
required to behave as if by repeated getc. If fread bypasses some of
what getc should see, that's a breaking behavior and I'm surprised you
haven't hit anything affected (probably because there's not that much
plain C software on Android..?) I don't get what you're saying about
why it wouldn't be seen with just 3 bytes though.

> so i think my choices are either:
> 
> 1. fix the arbitrary unget buffer, and have __freadahead() count characters
> in that, and give up on functions that touch the read pointer.
> 2. say "well, looks like no-one's using unlimited unget anyway" and use the
> musl "8 bytes is enough for anyone ... which means we can just have a
> slightly larger single buffer" trick, which also lets us implement the read
> pointer functions.

1 byte is supposed to be enough for anyone; ungetc is required to
report failure if no more pushback is available. We have 8 because (1)
we use the same mechanism for scanf pushback, and there needs to be at
least one byte of pushbash available for ungetc after scanf pushback,
and (2) we don't have separate wide stdio buffering, but always
operate on UTF-8 directly, so one wchar of wscanf pushback and one
ungetwc makes up to 8 bytes.

However based on the above interpretation of __freadptr, I don't think
you have to make any changes to support it. Whether you have to make
changes to fix a separate bug, I'm not sure; it sounds like you might.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.