Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211006122936.GD2559@brightrain.aerifal.cx>
Date: Wed, 6 Oct 2021 08:29:37 -0400
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gomez <ablocorrea@...mail.com>
Cc: musl@...ts.openwall.com
Subject: Re: newlocale: Segmentation fault when locale input is NULL

On Wed, Oct 06, 2021 at 11:31:29AM +0200, Pablo Correa Gomez wrote:
> Dear musl maintainers,
> 
> While doing some work in GNOME control center for postmarketos, we
> bumped into a segmentation fault which is also present in GNOME in
> Alpine[1].
> 
> After doing some degugging, I figured out that the reason is that,
> through GNOME desktop[2], there is a call to newlocale, where they end
> up calling it with a NULL argument.
> 
> newlocale(LC_CTYPE, NULL, (locale_t)0);
> 
> In this case, "name" is passed to __get_locale in
> src/locale/newlocale.c:27 and then dereferenced in
> src/locale/locale_map.c:43, causing a segmentation fault.
> 
> In the case of glibc, this is not an issue, as per the documentation[3]
> they consider it an error:
> 
>        EINVAL locale is NULL.
> 
> Unfortunately, this is a difference in the implementation between glibc
> and musl, maybe due to the fact that the standard[4] in not clear in
> this point:
> 
> 
> The newlocale() function may fail if:
> 
> [EINVAL]
>     The locale argument is not a valid string pointer. 

This is specifically documented as a "may fail", not a "shall fail",
i.e. it's not guaranteed to happen. It comes from POSIX, and is an
instance of a weird pattern the committee tried to fix (but missed
some places it seems) where they wrote "may fail"s for conditions that
already have undefined behavior (here, use of an invalid pointer) in
which case EINVAL would already be allowed as a side effect of the UB
without any further specification. (The same thing created a lot of
confusion in the past about use of pthread_t values past the end of
their lifetime.)

> My personal believe is that adding a NULL pointer check in musl is very
> simple and might help not only GNOME desktop, but maybe also other
> projects in the future. This is the reason why I brought the issue here
> first instead of directly patching GNOME desktop. If you believe that
> musl behaviour should remain the way it is, please let me know and I
> will send MRs for upstream and Alpine's GNOME desktop. I am not
> subscribed to the mailing list, so I would appreciate if I am CC'ed in
> any response.

The musl behavior should remain the way it is. My text on the
rationale actually made it into the glibc wiki some years back:

https://sourceware.org/glibc/wiki/Style_and_Conventions#Invalid_pointers

    "The GNU C library considers it a QoI feature not to mask user
    bugs by detecting invalid pointers and returning EINVAL (unless
    the API is standardized and says it does that). If passing a bad
    pointer has undefined behavior, it is far more useful in the long
    run if it crashes quickly rather than diagnosing an error that is
    probably ignored by the flaky caller.

    If you're going to check for NULL pointer arguments where you have
    not entered into a contract to accept and interpret them, do so
    with an assert, not a conditional error return. This way the bugs
    in the caller will be immediately detected and can be fixed, and
    it makes it easy to disable the overhead in production builds. The
    assert can be valuable as code documentation. However, a segfault
    from dereferencing the NULL pointer is just as effective for
    debugging. If you return an error code to a caller which has
    already proven itself buggy, the most likely result is that the
    caller will ignore the error, and bad things will happen much
    later down the line when the original cause of the error has
    become difficult or impossible to track down. Why is it reasonable
    to assume the caller will ignore the error you return? Because the
    caller already ignored the error return of malloc or fopen or some
    other library-specific allocation function which returned NULL to
    indicate an error."

In particular, here it seems to have found a bug -- what could the
application have possibly meant by passing a null pointer there? Did
it actually intend the behavior of ""? Or of "C"? Or if the intent was
to have this mean "don't use a context-local locale", why pass the
pointer to newlocale and process the error (which could include a
number of other errors you'd certainly want to treat differently)
rather than checking for null and taking a different code path before
calling newlocale?

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.