Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 13 Feb 2018 23:33:04 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: stdio review

On Tue, Feb 13, 2018 at 10:49:04PM -0500, Dale Weiler wrote:
> >> fgetpos.c:
> >>     fgetpos:                                                       [bug]
> >>         using *(off_t*) to write _Int64 data into fpos_t structure when
> >>         memcpy should be used, this breaks strict aliasing. Maybe add
> >>         an off_t to the fpos_t union?
> 
> > My leaning would be to add the off_t, but the type might not be
> > exposed and thus we would need to find a matching type that is
> > exposed. memcpy would be the nicest solution, but only if we had a way
> > of allowing the compiler to use builtin memcpy; otherwise it's a
> > gratuitous call.
> 
> Seeing as the type _is_ exposed, adding the off_t to the union is likewise

It's not exposed in plain C profile, only with POSIX or higher feature
test macros. We could just use long long though since the actual type
doesn't need to match; it only needs to be an integer type capable of
holding the range of off_t.

> >         Compound literal table to reference whence as a lookup table
> >         as a single expression.
> 
> > I thought this was cute.
> 
> It's definitely cute, but it does depend on the seek argument being
> one of the macro definitions in the [0, 2] range which I had to check,
> obviously those have no reason to ever be anything but those values;
> ABI and all but it's just additional mental load to ensure they weren't
> hence why I brought it up.

The check right above demonstrates the assumption of those
definitions. I suppose we could use designated initializers to make it
so the order is clear without changing the generated code...

> >> fwrite.c:
> >>     fwrite:                                                   [question]
> >>         Should there be a check for size*nmemb overflow?
> 
> > This is actually a complicated topic. Formally, I think the C standard
> > reads such that it would be valid for size*nmemb to exceed the size of
> > the data object to be written if you somehow know you'll hit a write
> > error before that happens. However real world implementations don't
> > work like that. In particular, the kernel will error out with EFAULT
> > if the buffer length extends past the valid userspace address range,
> > even if the writes would never happen; the only way to avoid this
> > would be to break longer-than-page writes down into separate
> > page-sized writes. So I think for practical purposes, we have to
> > interpret the standard as requiring that size*nmemb actually reflect
> > the size of the object passed in, and in that case, the multiplication
> > necessarily does not overflow. If there's an interpretation from WG14
> > contrary to this, we'll have to revisit it.
> 
> > See also https://sourceware.org/bugzilla/show_bug.cgi?id=19165
> 
> That is an interesting and somewhat odd edge case. Maybe for the time
> being a comment within here w.r.t it maybe needing to be revisited
> wouldn't hurt. In either case it doesn't appear to be harming anything.

Yes, a short comment here (and in fread.c) might be nice.

> >> gets.c:
> >>     gets:                                                     [optimize]
> >>         The length of the string is calculated twice to strip the
> >>         newline character off it. Why not rewrite it as:
> >>         if (ret) { size_t i = strlen(s)-1; if (s[i] == '\n') s[i] = 0; }
> 
> > Seriously, this is gets. It's always unsafe, deprecated, removed from
> > the current C standard. If it's gratuitously slow too, great. :-)
> 
> Yes it's gets, but fixing it for O(n) instead of O(n*2) does make the musl
> static set slightly smaller, also makes programs using it crash twice as
> fast ;-)

:-)

> >> stdio_impl.h:                                                    [style]
> >>     FUNLOCK macro has a trailing else which prompted me to look at every
> >>     single instance of FUNLOCK to make sure all of them contained a
> >>     semicolon. This is just dangerous, why not use the more common
> >>     idiom of do { } while (0).
> 
> > Indeed that should be fine.
> 
> I think it's better understood by most folks as well, glad we're on the same
> page w.r.t this one. At least then you cannot fail to forget the semicolon.

Yes, exactly.

> >> intscan.h:                                                       [style]
> >>     It isn't apparent for why <stdio.h> needs to be included. Should
> >>     just forward declare struct FILE; here instead.
> 
> > That would not work, because it's *not* struct FILE, it's FILE, which
> > happens to be defined as "struct _IO_FILE", but that's an
> > implementation detail. Including <stdio.h> is the clean way to have
> > that.
> 
> I don't understand why you couldn't replicate that behavior. It's what
> stdio.h already _does_ and seeing as the associated translation unit
> already includes stdio.h it seems gratuitously excessive. It's just an
> opaque pointer type being passed, how is a forward declaration
> incorrect. Does C distinguish between "opaque T" and "opaque T"
> with different underlying struct? If so I have many of code that needs
> to be changed on my end.

See 6.2.7 Compatible type and composite type ΒΆ1:

	"Moreover, two structure, union, or enumerated types declared
	in separate translation units are compatible if their tags and
	members satisfy the following requirements: If one is declared
	with a tag, the other shall be declared with the same tag. If
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	both are completed anywhere within their respective
	translation units, then the following additional requirements
	apply: there shall be a one-to-one correspondence between
	their members such that each pair of corresponding members are
	declared with compatible types; if one member of the pair is
	declared with an alignment specifier, the other is declared
	with an equivalent alignment specifier; and if one member of
	the pair is declared with a name, the other is declared with
	the same name. For two structures, corresponding members shall
	be declared in the same order. For two structures or unions,
	corresponding bit-fields shall have the same widths. For two
	enumerations, corresponding members shall have the same
	values."

Without fancy linker-level checks for UB or LTO, you might say the UB
from declarations with incompatible types is purely theoretical, but
the bigger issue is that, when you did include both intscan.h with
"struct FILE *" and stdio_impl.h in the implementation file intscan.c,
the conflicting definitions would be visible and you'd get an error.

Yes of course we could declare it explicitly using "struct _IO_FILE *"
in intscan.h, but that would violate DRY and would break if the tag
were ever changed (e.g. in a future ABI not using glibc compat cruft).
Including a header that defines FILE really is the right way to do
this.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.