Date: Thu, 10 Aug 2023 11:51:15 -0400
From: Rich Felker <dalias@...c.org>
To: Alastair Houghton <ahoughton@...le.com>
Cc: musl@...ts.openwall.com
Subject: Re: setlocale() again

On Thu, Aug 10, 2023 at 04:41:38PM +0100, Alastair Houghton wrote:
> Hi again,
> 
> I spent some time today looking at the setlocale() problem and
> thought I’d put some notes down in an email.
> 
> 1. Musl wishes to support UTF-8 “out of the box”.
> 
> 2. At the same time, it needs to be 8-bit-safe, so the default
> locale, C, is NOT UTF-8.
> 
> 3. POSIX, and the C standard, specify that setlocale() should fail
> if the locale name isn’t a valid locale, but don’t really say what
> that means precisely. A program that wants UTF-8 support and that
> does `setlocale(LC_ALL, "")` can therefore find itself in the C
> locale if the one specified in the environment happens to be
> invalid.
> 
> 4. This seemed undesirable, so setlocale() presently accepts any
> locale name as valid; if it doesn’t have a definition file for a
> locale, it will copy the C.UTF-8 locale, giving it the name passed
> in and return that. This avoids the problem in (3), and also means
> that gettext() will work for any language without installing locale
> data for Musl. Unfortunately it also means that there is no way for
> a program (notably a test suite) to determine the presence of data
> for a locale, because setlocale() will always succeed, even if we
> don’t have the data.
> 
> 5. Back in 2017 (https://www.openwall.com/lists/musl/2017/11/08/2)
> Rich was proposing to change things so that `setlocale(cat, "")`
> always succeeds, but if the environment specifies an unknown locale,
> treats it as C.UTF-8, while `setlocale(cat, explicit_name)` will
> fail unless a valid definition file is installed for that locale
> name. This would also avoid the problem in (3), although it will
> mean that gettext() will not work unless a valid locale definition
> is installed for the C library (BTW, this is exactly the situation
> Glibc is in here; if Glibc doesn’t have locale data, it will fail
> setlocale() and then gettext() will find itself in the C locale). On
> the other hand, it does mean that programs can detect whether or not
> a given locale is present.
> 
> Why do I care? Because I’m trying to make libc++ work with Musl and
> right now it has failing tests because it expects (not entirely
> unreasonably) that if e.g. `setlocale(LC_ALL, "fr_FR")` succeeds,
> then the C library will localise things into French. While I can
> test for the unusual behaviour of Musl detailed in (4), the libc++
> maintainer understandably doesn’t like it and we would both far
> rather Musl were fixed to behave similarly to other implementations.
> 
> It seems to me that Rich’s proposal (5) was sensible. Programs that
> use gettext(), and users relying on it for localization, must
> already cope with the fact that the C library must have locale data
> for their chosen locale in order for gettext() to work; that is how
> things work on Glibc. It so happened that (4) meant that such
> programs would work with partial localization on Musl without there
> being any locale data installed for Musl, but that isn’t really
> right (e.g. you might get a mix of localized strings from gettext()
> but with numeric formatting that didn’t match - for French, for
> instance, numbers would have “.”s instead of “,”s as a decimal
> separator).
> 
> Looking at the 2017 thread, it appears it didn’t go anywhere for
> whatever reason, so I’d like to understand the status of the
> proposed change. Was it nixed for some reason? Is it likely to
> happen in the future? If it’s a matter of resource, if I were to
> raise a patch for it, would it be accepted, in principle?

Thank you for following up on this! The main reason it didn't go
anywhere was lack of feedback/engagement from anyone who cares about
locale behavior. I want whatever steps we take to be informed by what
folks actually need, not just my guesses at that. So in that sense,
your bumping of the issue is helpful in itself!

At this point, it's been quite a while since I looked at the
mechanisms. If you'd like to help move this forward, rather than
starting with a patch, writing a high-level natural language
description of how you'd make the changes (in terms of musl's current
internal representation for locale state) would be the most helpful.
If I'm forgetting and there's already such a good description, just
digging it up and citing it might be fine.

Rich
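
For readers following the thread, the detection problem in points (4)
and (5) of the quoted message can be illustrated with a small C sketch.
It asks setlocale() for an explicit locale name and then checks whether
the decimal separator reported by localeconv() actually changed. The
names `probe_locale` and `enum probe_result` are hypothetical helpers
made up for this illustration; they are not part of musl, glibc, or
libc++, and the decimal-point check is only a rough heuristic.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not part of any libc API. */
enum probe_result {
    PROBE_REJECTED,  /* setlocale() returned NULL for the name */
    PROBE_C_LIKE,    /* name accepted, but the decimal point is still "." */
    PROBE_LOCALISED  /* name accepted, and the decimal point is not "." */
};

/* Try NAME for LC_ALL, inspect the decimal separator, then restore
 * whatever locale was active before the call. */
enum probe_result probe_locale(const char *name)
{
    assert(name != NULL);

    /* The string returned by setlocale() may be overwritten by later
     * calls, so copy it before switching locales. */
    char saved[256];
    const char *cur = setlocale(LC_ALL, NULL);
    int have_saved = cur != NULL && strlen(cur) < sizeof saved;
    if (have_saved)
        strcpy(saved, cur);

    enum probe_result r = PROBE_REJECTED;
    if (setlocale(LC_ALL, name) != NULL) {
        const struct lconv *lc = localeconv();
        r = strcmp(lc->decimal_point, ".") == 0 ? PROBE_C_LIKE
                                                : PROBE_LOCALISED;
    }

    /* Fall back to "C" if the previous name could not be saved. */
    setlocale(LC_ALL, have_saved ? saved : "C");
    return r;
}
```

Going by the quoted description, calling `probe_locale("fr_FR")` on a
musl system with no French locale data would be expected to give
PROBE_C_LIKE, whereas an implementation that rejects unknown names
would give PROBE_REJECTED. The heuristic cannot tell a real locale
from a placeholder when the real locale also uses "." as its decimal
separator, which is part of why a setlocale() failure is the cleaner
signal for a test suite.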
