Message-ID: <20260430182159.GT1827@brightrain.aerifal.cx>
Date: Thu, 30 Apr 2026 14:22:00 -0400
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com
Subject: Re: Updated dumplocale/source format [Re: Selecting locale source format]

On Thu, Apr 30, 2026 at 07:54:25PM +0200, Pablo Correa Gomez wrote:
> On Mon, 20-04-2026 at 13:44 +0200, Pablo Correa Gomez wrote:
> >
> > I have used the provided dumplocale.c file to transform the current
> > locales into the new source format. It can all be found in
> > https://gitlab.postmarketos.org/postmarketOS/musl-locales
> > Generally the whole thing was pretty straightforward, and it clearly
> > made it possible to fix the infamous "May" bug:
> > https://gitlab.postmarketos.org/postmarketOS/musl-locales/-/commit/374ea7d0164efcf1bc1f14701b1333a943837bd7
> >
> > Of course, the "May" bug is still present in all translations (except
> > Spanish, my native language, which I have fixed manually), and things
> > like the differentiation between H_ and H0 are not there either,
> > since they were not there in the previous translations.
> >
> > I will start poking translators about this, to see if we find any
> > issue that we didn't find earlier.
> >
> > Best,
> > Pablo
> >
>
> We've gotten quite a bit of good feedback from the translators already.
> So far, 2 questions have come up as the most salient ones:
>
> First, documentation on the keys to translate. I found good
> documentation for the standard POSIX keys in [1]; it would be good to
> know whether that's a good authoritative source. However, I could not
> find such documentation for the error codes in the LC_MESSAGES section.

For the errno codes, the POSIX text is in XSH 2.3:

https://pubs.opengroup.org/onlinepubs/9799919799/functions/V2_chap02.html#tag_16_03

For gai_strerror:

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/netdb.h.html

The h_errno codes are not documented in POSIX anymore because the
interfaces were deprecated and removed something like 2 versions ago.
However, their contexts are basically the same as the corresponding
EAI_* codes.

For regerror codes:

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/regex.h.html

> Most specifically, a translator had a question about EAI_OVERFLOW,
> which in English is just "Overflow". Regardless of whether the English
> string has to be improved, it would be nice to have a good
> authoritative source for those. Everything I could find were manual
> pages for getnameinfo

Yes, EAI_OVERFLOW only occurs for getnameinfo where the caller has
passed a buffer too small to fit the reverse-resolved hostname (or the
numeric IP literal string). Our current English text "Overflow" is
rather unhelpful. We should probably choose something more descriptive
to help both users and translators (although it's really not intended
to be an error code exposed to users; the intent is that applications
retry with a larger buffer).

> Second, I was asked about the first day of the week. It seems that is
> a glibc extension that can be queried by passing
> "_NL_TIME_FIRST_WEEKDAY" to nl_langinfo[2]. Although this is obviously
> not in POSIX, I really wonder why it is not, when it's something
> clearly tied to the system language. Indeed, I dug a bit into it, only
> to realize that it works on my system because the toolkit I use the
> most has an ugly hack[3] working around this limitation. I would be
> pretty interested if we could make sure that the source and binary
> formats have enough flexibility to support adding such a potential use
> case in the future, should POSIX be updated to support it.

The format is certainly flexible enough to be able to support
extensions like this in the future. We'd have to choose a keyword for
it in the source file, but the numeric value of the symbolic constant
_NL_TIME_FIRST_WEEKDAY just becomes the path component in the binary
file.

Unfortunately, the situation on whether these are intended to be public
interfaces even in glibc is kinda unclear. They're prefixed with _,
which suggests they're not intended to be public, and I can't find any
documentation suggesting they intend you to use them except for the
enums in the header.

If we can resolve that, getting consensus with other implementations on
what's meant to be exposed, I'm not opposed to adding some extensions
like this. It would be helpful then to have the assistance of folks
with localization experience to evaluate what makes sense to support.

But since the binary format is sufficiently flexible to express these
if we want to, I don't think we need to treat making any decisions here
as in-scope for the current project. Just confirming that we have the
flexibility to add things like this as-needed satisfies the goal of
having something future-proof.

Rich
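
For reference, the EAI_OVERFLOW retry behavior mentioned above (the
application retrying getnameinfo with a larger buffer) could look
roughly like the sketch below. This is only an illustration, not code
from musl or from this thread: the helper name name_with_retry, the
doubling strategy, and the MAX_NAME_BUF cap are all assumptions made
for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Arbitrary upper bound chosen for this sketch (an assumption). */
#define MAX_NAME_BUF 4096

/*
 * Hypothetical helper: look up the host name (or numeric literal) for
 * sa, growing the buffer whenever getnameinfo reports EAI_OVERFLOW.
 * On success returns 0 and stores a malloc'd string in *out, which the
 * caller must free. On failure returns the EAI_* code.
 */
static int name_with_retry(const struct sockaddr *sa, socklen_t salen,
                           int flags, size_t initial, char **out)
{
    size_t len = initial ? initial : 1;
    char *buf = NULL;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (!tmp) {
            free(buf);
            return EAI_MEMORY;
        }
        buf = tmp;

        /* Only the host part is requested; service is skipped. */
        int rc = getnameinfo(sa, salen, buf, (socklen_t)len,
                             NULL, 0, flags);
        if (rc == 0) {
            *out = buf;
            return 0;
        }
        /* Any error other than "buffer too small" is final. */
        if (rc != EAI_OVERFLOW || len >= MAX_NAME_BUF) {
            free(buf);
            return rc;
        }
        len *= 2; /* retry with a larger buffer */
    }
}
```

Called with a loopback struct sockaddr_in, NI_NUMERICHOST, and a
deliberately small initial size such as 4, the first attempts should
fail with EAI_OVERFLOW and the loop should grow the buffer until the
numeric literal fits, so the user never sees the "Overflow" string.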