Message-ID: <20140724220228.GB4038@brightrain.aerifal.cx>
Date: Thu, 24 Jul 2014 18:02:28 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Locale bikeshed time

On Thu, Jul 24, 2014 at 10:15:48PM +0200, u-igbb@...ey.se wrote:
> On Thu, Jul 24, 2014 at 12:01:50PM -0400, Rich Felker wrote:
> > I just meant that language-based locales should match the pattern:
> > 
> > ^[[:lower:]]{2,3}(_[[:upper:]]{2})?([[:punct:]].*)?$
> > 
> > assuming I didn't make any stupid mistakes in writing that regex. And
> > non-language-based locales should not match this pattern.
> 
> I feel it would be somewhat more robust if we'd have a positive
> definition for "the second class" of locale data, just in case we one
> day discover that we want to differently handle, say, three classes (?)
> 
> A negative definition also gives very little guidance for the actual
> naming and in the worst case may lead to misunderstanding when multiple
> parties are involved.
> 
> Why not make such a worst case less probable by a somewhat more strict
> naming rule?
> Possibly also defining "non-language-based" in a positive way?
> 
> This is just a thought. I have no actual proposal as I do not have a
> good mental picture of which kinds of "non-language-based" definitions
> exist or should exist and how they are being used or might/should be used.
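
For reference, here is a minimal sketch of how the pattern quoted above
could be checked using POSIX regcomp/regexec. The function name
is_language_locale is made up for illustration; this is not existing
musl code, and it assumes the default "C" locale for the character
classes.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if name matches the language-based pattern from the
 * message above, 0 if it does not, and -1 if the pattern fails to
 * compile. The name is_language_locale is hypothetical. */
int is_language_locale(const char *name)
{
    static const char pat[] =
        "^[[:lower:]]{2,3}(_[[:upper:]]{2})?([[:punct:]].*)?$";
    regex_t re;
    int r;

    /* REG_EXTENDED is needed for the {2,3}, ( ) and ? operators. */
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    r = regexec(&re, name, 0, NULL, 0);
    regfree(&re);
    return r == 0;
}
```

With this, "en", "en_US" and "de_DE.UTF-8" are classified as
language-based, while "C" and "POSIX" are not. Since '_' is itself in
[[:punct:]], the last group also accepts a name such as "ucs_basic".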

This is a reasonable sentiment, but do you have a proposal? I think
first you would need an idea of what some "non-language" category
values might be. I can think of some for LC_COLLATE, though I'm not
sure how valuable many of them are:

- UCA default tables
- UTF-16 code unit order
- Case-insensitive Unicode codepoint order
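
To illustrate why UTF-16 code unit order is a distinct collation from
plain codepoint order, here is a rough single-character sketch. The
function names are made up for illustration, and inputs are assumed to
be valid non-surrogate codepoints.

```c
#include <stdint.h>

/* Leading UTF-16 code unit of cp (assumed valid, non-surrogate). */
static uint16_t utf16_lead(uint32_t cp)
{
    if (cp < 0x10000) return (uint16_t)cp;
    return (uint16_t)(0xD800 + ((cp - 0x10000) >> 10));
}

/* Trailing UTF-16 code unit of cp, or 0 if cp fits in one unit. */
static uint16_t utf16_trail(uint32_t cp)
{
    if (cp < 0x10000) return 0;
    return (uint16_t)(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

/* Compare two characters by codepoint value. */
int cmp_codepoint(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Compare two characters by their UTF-16 code unit sequences. */
int cmp_utf16(uint32_t a, uint32_t b)
{
    uint16_t la = utf16_lead(a), lb = utf16_lead(b);
    if (la != lb) return (la > lb) - (la < lb);
    uint16_t ta = utf16_trail(a), tb = utf16_trail(b);
    return (ta > tb) - (ta < tb);
}
```

The two orders disagree once surrogates are involved: U+FF5E sorts
before U+10000 by codepoint, but after it by code unit, because U+10000
is encoded with the lead unit 0xD800.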

For the other categories, examples seem much harder to find.
LC_MESSAGES is inherently a language-based category, but perhaps you
could have a locale that eliminates verbose natural-language messages
and replaces them with C/POSIX identifiers (e.g. printing ENOENT
instead of "No such file or directory") conveying the meaning. (Or we
could be somewhat radical and replace all the internal strerror
messages like this and require LC_MESSAGES=en to get them back.) I'm
not sure if there would be interesting LC_TIME locales not associated
with a language (since LC_TIME has to offer day/month names). And for
LC_MONETARY, most if not all of the data really corresponds to a
political unit context, not a language, so in principle it might make
sense to have locales just for LC_MONETARY that aren't associated with
a language, but I can't see that being a convenient or reasonable
design in practice...
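
As a rough sketch of the LC_MESSAGES idea above (identifiers in place
of natural-language messages), something along these lines would do.
The function errno_name is hypothetical, covers only a handful of
codes, and is not an existing libc interface.

```c
#include <errno.h>

/* Map an errno value to its C/POSIX identifier instead of a
 * natural-language message. Only a few codes are listed here; a real
 * table would need to cover every error the libc defines. */
const char *errno_name(int e)
{
    switch (e) {
    case ENOENT: return "ENOENT";
    case EACCES: return "EACCES";
    case EINVAL: return "EINVAL";
    case ENOMEM: return "ENOMEM";
    default:     return "unknown error";
    }
}
```

With this, errno_name(ENOENT) gives "ENOENT" where strerror would
typically give "No such file or directory".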

Rich
