Message-ID: <20180305171027.GE1436@brightrain.aerifal.cx>
Date: Mon, 5 Mar 2018 12:10:27 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Bikeshed invitation for nl_langinfo ambiguities

On Sat, Mar 03, 2018 at 12:08:54AM -0500, Rich Felker wrote:
> On Sun, Nov 26, 2017 at 05:19:07PM -0600, A. Wilcox wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > On 10/11/17 20:06, Rich Felker wrote:
> > > I've found 2 ambiguous-string-to-translate bugs in musl's locale
> > > support in nl_langinfo: The pairs ABMON_5 and MON_5 ("May"), and
> > > T_FMT and ERA_T_FMT ("%H:%M:%S"), have the same values in the C
> > > locale, and thus can't be translated to distinct values like they
> > > need to be in other locales.
> > >
> > > Any opinions on the cleanest way to handle this? There are various
> > > hacks I could do at the implementation level, like adding a prefix
> > > character to one or the other then applying +1 to the output
> > > string, But whatever solution we choose becomes a public interface
> > > for translators, so it should be something that's not horribly
> > > ugly.
> >
> > I would personally recommend actually using the enum values as the
> > strings to translate. _("MON_5"), _("ABMON_5"), etc; this is
> > non-ambiguous, easily understandable and describable for translators,
> > and does not require weird hacks at the implementation or ABI level.
>
> I think this may be the nicest approach, despite being an incompatible
> change from the existing system, which apparently doesn't matter and
> isn't being used or people would have noticed that "May" can't be
> translated right.

One really ugly thing here is that the POSIX key for weekdays is
"highly unconventional" - ABDAY_1/DAY_1 is Sunday and ABDAY_7/DAY_7 is
Saturday. Even the Unicode CLDR noticed this nonsense and used
"sun"..."sat" as the keys rather than using numbers so as to be
unambiguous.

> > Of course, then a "C" / "POSIX" strings file must be present. But
> > this is, in my opinion, a very small sacrifice to ensure full purity
> > and ease of translation.
>
> As noted before, obviously this isn't acceptable. We could drop a .mo
> file blob in the musl langinfo.c, but I think it might make more sense
> to just use different code paths for translated vs nontranslated case.

I did some simple estimates with a toy .po/.mo file, and it looks like
either of those approaches is going to more-than-double the size of
langinfo.o, and make it a lot more complex. Given that "Sun".."Sat"
are nicer keys for days anyway, I'm leaning back towards sticking with
what we have and just adding a special case for "May".

The other ambiguity is one of the ERA_* formats, which we're not even
doing right now anyway; they're "not available in the POSIX locale"
according to XBD 7.3.5 LC_TIME, so as I read it they should return ""
(not the corresponding non-era string) in the C/POSIX locale, and only
return something else if they're defined for the locale. Eventually,
we should probably look them up with mo keys like "era_d_fmt", etc.
but unless/until we properly support them, the lookups for them should
just be removed.

> Then we could just synthesize the keys (ABMON_*, MON_*, ABDAY_*,
> DAY_*) to pass into LCTRANS() rather than having a table of them all
> expanded out. I might change my mind when actually working out how the
> code would look, though.

I started working on a nice means of doing this synthesis - having a
table like the existing c_time etc. but contents like:

	"ABDAY_1\0\0\0\0\0\0\0"
	"DAY_1\0\0\0\0\0\0\0"
	"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	"MON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	...

where, when a zero-length entry is hit, the last non-zero-length one
seen gets used as a basis for synthesis. But it still didn't seem
possible to avoid significant increase in code size and complexity.

Rich
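
For readers who want to see the synthesis idea concretely, here is a
minimal sketch. It is an illustration only, not musl's code: the names
`keys` and `synth_key` are hypothetical, and it assumes each run of
keys starts with a full "..._1" entry followed by empty entries, as in
the table above. A zero-length entry takes the last non-empty entry as
its stem and gets a number derived from how many empty entries have
been passed since.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical key table: one full key, then one empty entry for each
 * successor (6 for days, 11 for months). */
static const char keys[] =
	"ABDAY_1\0\0\0\0\0\0\0"
	"DAY_1\0\0\0\0\0\0\0"
	"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	"MON_1\0\0\0\0\0\0\0\0\0\0\0\0";

/* Write the key for table entry idx into buf.
 * Returns 0 on success, -1 if idx is out of range or buf is too small. */
static int synth_key(int idx, char *buf, size_t len)
{
	const char *p = keys;
	const char *end = keys + sizeof keys - 1; /* ignore implicit final NUL */
	const char *base = 0;  /* last non-empty entry seen */
	int since = 0;         /* empty entries passed since base */

	for (int i = 0; p < end; i++, p += strlen(p) + 1) {
		if (*p) {
			base = p;
			since = 0;
		} else {
			since++;
		}
		if (i == idx) {
			if (!base) return -1;
			/* Stem is the base key without its trailing "1". */
			int stem = (int)strlen(base) - 1;
			int n = snprintf(buf, len, "%.*s%d", stem, base, since + 1);
			return (n < 0 || (size_t)n >= len) ? -1 : 0;
		}
	}
	return -1;
}
```

With this layout, entry 18 comes out as "ABMON_5" and entry 30 as
"MON_5", so the two "May" strings get distinct lookup keys. The scan
and the formatting call are exactly the kind of extra code and
complexity the message above is worried about.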