Message-ID: <20171127050934.GX1627@brightrain.aerifal.cx>
Date: Mon, 27 Nov 2017 00:09:34 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Bikeshed invitation for nl_langinfo ambiguities

On Sun, Nov 26, 2017 at 08:57:25PM -0600, A. Wilcox wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> On 26/11/17 19:07, Rich Felker wrote:
> >> I would personally recommend actually using the enum values as
> >> the strings to translate.  _("MON_5"), _("ABMON_5"), etc; this
> >> is non-ambiguous, easily understandable and describable for
> >> translators, and does not require weird hacks at the
> >> implementation or ABI level.
> > 
> > This is certainly one possibility, but it does result in embedding
> > a number of "useless" strings that are never used themselves, only
> > as translation keys, in the binary. One nice property of it
> > (especially if we did the same for strerror keys) is that it
> > eliminates the need for translation files to care about changes in
> > the text in musl.
> 
> That was the idea, yes.  This would work for *all* translatable
> strings, not just the nl_langinfo ones.
> 
> >> Of course, then a "C" / "POSIX" strings file must be present.
> >> But this is, in my opinion, a very small sacrifice to ensure full
> >> purity and ease of translation.
> > 
> > This is of course not acceptable.
> 
> "Of course"?  Why not?  The reason this wouldn't be acceptable is not
> obvious to me.

It basically means there's no such thing as "truly statically linked"
binaries anymore, and all programs need to search for and open the "C
locale translation file" at program startup time, resulting in a number
of syscalls. None of this would be remotely acceptable to a large
portion of musl's userbase, including myself.

If it's not clear why, just observe that functions like nl_langinfo,
strftime, etc. have no way to fail for arbitrary reasons when the
input is valid. The only way to ensure that they can't fail is to have
the data they need already loaded at the time the locale is set, which
for the default C locale is program start time. Normally/currently
this is achieved just by having the strings in the program text
segment.
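
As a rough illustration of the text-segment point (a hypothetical sketch, not musl's actual code): the C locale strings can live in read-only static data, so a lookup for a valid item is just an array index with no failure path. The `c_item` enum, the `c_locale_strings` table and `c_langinfo` are assumed names; the real interface is `nl_langinfo` with `nl_item` constants from `<langinfo.h>`.

```c
/* Hypothetical item identifiers; the real ones are nl_item constants
 * such as ABMON_5, MON_5 and T_FMT from <langinfo.h>. */
enum c_item {
    ITEM_ABMON_5,
    ITEM_MON_5,
    ITEM_T_FMT,
    ITEM_COUNT
};

/* The C locale strings are compiled into the program image as read-only
 * data. They are available from program start: no file has to be found
 * or opened, and no syscalls are needed. */
static const char *const c_locale_strings[ITEM_COUNT] = {
    [ITEM_ABMON_5] = "May",
    [ITEM_MON_5]   = "May",
    [ITEM_T_FMT]   = "%H:%M:%S",
};

/* C locale lookup: for a valid item this is a plain array index and
 * cannot fail. Like nl_langinfo, an invalid item yields "". */
const char *c_langinfo(enum c_item item)
{
    if ((unsigned)item >= ITEM_COUNT)
        return "";
    return c_locale_strings[item];
}
```

Note that the two duplicate strings discussed in this thread ("May" for both the abbreviated and full month name) are harmless here, because lookup is by item rather than by string.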

> > I have in mind a way we could potentially avoid this: passing keys
> > like "ABMON_5" to __lctrans, and if it returns back the key (which
> > is what happens with the stub implementation or with no translation
> > present), use the builtin C locale strings instead.
> 
> That would work.
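
The fallback idea quoted above (symbolic keys like "ABMON_5", with builtin C strings used when the key comes back untranslated) could look roughly like this. This is a hypothetical sketch, not musl code: `translate` stands in for musl's internal `__lctrans` and, like the stub implementation described, returns its argument pointer unchanged when no translation is present; `lc_item`, `keys`, `c_strings` and `langinfo_with_fallback` are assumed names.

```c
/* Hypothetical stand-in for musl's internal __lctrans. Like the stub,
 * it returns the key itself (the same pointer) when no translation
 * is loaded. */
static const char *translate(const char *key)
{
    return key;
}

enum lc_item { ITEM_ABMON_5, ITEM_MON_5, ITEM_COUNT };

/* Symbolic translation keys: unique even where the C strings collide. */
static const char *const keys[ITEM_COUNT] = {
    [ITEM_ABMON_5] = "ABMON_5",
    [ITEM_MON_5]   = "MON_5",
};

/* Builtin C locale strings, always present in the binary. */
static const char *const c_strings[ITEM_COUNT] = {
    [ITEM_ABMON_5] = "May",
    [ITEM_MON_5]   = "May",
};

const char *langinfo_with_fallback(enum lc_item item)
{
    if ((unsigned)item >= ITEM_COUNT)
        return "";
    const char *key = keys[item];
    const char *s = translate(key);
    /* Getting the key pointer back means "no translation": use the
     * builtin string rather than ever showing "ABMON_5" to a user. */
    return s == key ? c_strings[item] : s;
}
```

The cost mentioned earlier in the thread is visible here: the `keys` table is extra data in the binary that is used only as lookup keys, never as output.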
> 
> > I don't follow; there are only two duplicate strings and they are 
> > "May" and "%H:%M:%S". The number does not grow with the number of 
> > translations because it's a property of the untranslated strings
> > not the translated ones.
> 
> If you are returning (string+1), then all strings need to have the " "
> at the beginning, or else you are going to return "ai" instead of
> "mai" for French and so on.

I see. I was not intending to include the prefix in the translated
strings, just in the key.
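
A hypothetical sketch of the prefix-in-the-key-only variant: the key carries one extra leading character that keeps the two "May" keys distinct, the untranslated (C locale) result is key+1, and a translation, when found, is returned whole, so "mai" is not truncated to "ai". The thread mentions only a " " prefix; the '~' tag for the second key, the toy French catalog and all function names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

enum month_item { ITEM_ABMON_5, ITEM_MON_5, ITEM_COUNT };

/* Hypothetical keys. The first character is a tag used only to keep
 * the two "May" keys distinct; ' ' and '~' are arbitrary choices. */
static const char *const keys[ITEM_COUNT] = {
    [ITEM_ABMON_5] = " May",
    [ITEM_MON_5]   = "~May",
};

/* Toy catalog standing in for a loaded (French) translation.
 * The translated strings carry no prefix. */
struct entry { const char *key, *text; };
static const struct entry fr_catalog[] = {
    { " May", "mai" },
    { "~May", "mai" },
};

/* Returns the translation, or the key pointer itself when none exists.
 * A NULL catalog models the C locale (nothing loaded). */
static const char *translate(const char *key, const struct entry *cat,
                             size_t n)
{
    for (size_t i = 0; cat && i < n; i++)
        if (strcmp(cat[i].key, key) == 0)
            return cat[i].text;
    return key;
}

const char *month_name(enum month_item item, const struct entry *cat,
                       size_t n)
{
    if ((unsigned)item >= ITEM_COUNT)
        return "";
    const char *key = keys[item];
    const char *s = translate(key, cat, n);
    /* Untranslated: skip the tag character, giving "May".
     * Translated: return the text unchanged. */
    return s == key ? key + 1 : s;
}
```

With no catalog, both items yield "May"; with `fr_catalog`, both yield "mai".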

Rich
