musl - Re: RE: [EXTERNAL] Re: [PATCH] fix avoidable segfault in catclose

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240127110449.GA1254592@port70.net>
Date: Sat, 27 Jan 2024 12:04:49 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: Andy Caldwell <andycaldwell@...rosoft.com>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: Re: RE: [EXTERNAL] Re: [PATCH] fix avoidable segfault
 in catclose

* Andy Caldwell <andycaldwell@...rosoft.com> [2024-01-26 20:16:58 +0000]:
> > > > > > > > And it has been musl policy to crash on invalid args since the
> > beginning.
> > > > > > >
> > > > > > > The current implementation doesn't (necessarily) crash/trap on
> > > > > > > an invalid argument, instead it invokes (C-language
> > > > > > > spec-defined) UB itself (it dereferences
> > > > > > > `(uint32_t*)((char*)cat) + 8)`, which, in the case of the `-1`
> > > > > > > handle is the address 0x7, which in turn, not being a valid
> > > > > > > address, is UB to dereference). If you're lucky (or are
> > > > > > > compiling without optimizations/inlining) the compiler will
> > > > > > > emit a MOV that will trigger an access violation and hence a
> > > > > > > SEGV, if
> > > > > >
> > > > > > In general, it's impossible to test for "is this pointer valid?"
> > > > > >
> > > > > > There are certain special cases we could test for, but unless
> > > > > > there is a particularly convincing reason that they could lead
> > > > > > to runaway wrong execution/vulnerabilities prior to naturally
> > > > > > trapping, we have not considered littering the code with these
> > > > > > kinds of checks to be a
> > > > worthwhile trade-off.
> > > > > >
> > > > > > > you're unlucky the compiler will make wild assumptions about
> > > > > > > the value of the variable passed as the arg (and for example
> > > > > > > in your first code snippet, simply delete the `if` statement,
> > > > > > > meaning `use_cat` gets called even when `catopen` fails
> > > > > > > potentially corrupting user data/state).
> > > > > >
> > > > > > I have no idea what you're talking about there. The compiler
> > > > > > cannot make that kind of transformation (lifting code that could
> > > > > > produce undefined behavior, side effects, etc. out of a conditional).
> > > > >
> > > > > It's a hypothetical, but something like the following is valid for
> > > > > the compiler to
> > > > do:
> > > > >
> > > > > * inline the catclose (e.g. in LTO for a static link)
> > > > > * consider the `if` statement and ask "what if `cat` is `-1`
> > > > > * look forward to the pointer dereference (confirming that `cat`
> > > > > can't change in the interim)
> > > > > * realise that `0x7` is not a valid pointer on the target platform
> > > > > so UB is inevitable if `cat` is `-1`
> > > > > * optimize out the comparison since UB frees the compiler of any
> > > > > responsibilities
> > > >
> > > > You have the logic backwards. In the case where cat==(cat_t)-1,
> > > > catclose is not called on the abstract machine, so no conclusions
> > > > can be drawn from anything catclose would do.
> > >
> > > The original code I was working from was:
> > >
> > > ```
> > > nl_catd cat = catopen(...);
> > > if (cat != (nl_catd)-1) {
> > >     use_cat(cat);
> > > }
> > > catclose(cat);
> > > ```
> > >
> > > (i.e. an incorrect use of the APIs, but not UB in a "C99 spec"
> > > sense). In that code the `catclose` call is provably inevitable,
> > > allowing the compiler to infer properties of `cat` from it.
> >
> > Ah, okay, at least now that makes sense. But indeed it is undefined:
> >
> >   "Each of the following statements shall apply to all functions
> >    unless explicitly stated otherwise in the detailed descriptions
> >    that follow:
> >
> >    1. If an argument to a function has an invalid value (such as a
> >       value outside the domain of the function, or a pointer outside
> >       the address space of the program, or a null pointer), the
> >       behavior is undefined.
> >
> >    ..."
> >
> > https://pubs.ope/
> > ngroup.org%2Fonlinepubs%2F9699919799%2Ffunctions%2FV2_chap02.html%23t
> > ag_15_01&data=05%7C02%7Candycaldwell%40microsoft.com%7C7cfdfca2ce32
> > 4149a5cd08dc1ea8f669%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7
> > C638418958276620853%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> > AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata
> > =0lEpsKknyTB3xULpxaOBZS6UIzI1rCG4D86U9z6P5Xc%3D&reserved=0
> >
> > So I guess what you're saying is that, in the case where an erroneous program like
> > the above has undefined behavior, the compiler could make a transformation such
> > that the effect of the UB is seen at a point different from where it logically occurs.
> > (This is the norm for UB.) In particular, despite cat being -1 from a failed catopen,
> > you might see use_cat being called with a seemingly impossible argument.
> 
> Yes, this - the details aren't particularly interesting but the key is that "invoke UB"
> is not the same as "crash/trap".  I'm also contrasting this to the comments in the
> glibc wiki and Markus's synopsis (from the earlier email) that "it has been musl policy
> to crash on invalid args since the beginning" - in the face of UB, musl (and presumably
> also glibc) _doesn't_ crash/trap, nor does it "fail early and catastrophically" it
> instead "propagates the UB".  In debug builds these are often equivalent, but the
> specific path to UB might not be seen in a debug build, and only be seen in production
> where the non-locality of UB effects are at their worst.

i think you are still looking at this the wrong way:

- the original code has ub.
- so anything can happen.
- whatever libc does, still anything can happen.
- adding a check p==-1 in libc does not change anything.
(the ub already happens in the caller. a compiler can even remove the
call since it can know about catclose semantics.)

given these facts on the theoretical level, we can look pragmatically at
the actual transformations a compiler would likely do and we find that
an invalid NULL+n dereference in practice is almost surely an immediate
crash (on linux with dynamic linking or static linking without lto this
is not only likely but actually guaranteed by existing toolchains) which
is the best possible outcome for debugging, meanwhile an extra check in
libc is worse: the code continues and misbehaves somewhere else.

given that glibc cannot be built with lto, and static linked lto musl
libc.a is not widely deployed since it would only be usable with the
exact same compiler version that built it, i think your objections are
quite theoretical.

> 
> > Exacerbating the degree to which UB can become non-localized is one of the
> > expected effects of LTO, and arguably a good reason not to use LTO for debugging.
> > I don't see a lot of value in trying to prevent this in isolated cases when it's going
> > to happen all over the place anyway for other reasons.
> 
> I think that's reasonable, but it disagrees with the "policy" as described in
> the glibc wiki/your SO post/etc. (maybe I'm being pedantic, but I read "it
> shall fail early and catastrophically" as "it will definitely fail, and in a
> useful way, even in optimized builds" which isn't always the case and often
> can't be).  Maybe I'm simply suggesting that that position needs some
> clarification/caveating?

if the c runtime could give guarantees about ub, we would have memory
safe c implementations. i don't think users expect such guarantees but
i guess one clould add extra wording there to point it out.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.