Message-ID: <20140725090649.GN16795@example.net>
Date: Fri, 25 Jul 2014 11:06:49 +0200
From: u-igbb@...ey.se
To: musl@...ts.openwall.com
Subject: Re: Locale bikeshed time

On Thu, Jul 24, 2014 at 06:02:28PM -0400, Rich Felker wrote:
> first you would need an idea of what some "non-language" category
> values might be. I can think of some for LC_COLLATE, though I'm not
> sure how valuable many of them are:
> 
> - UCA default tables
> - UTF-16 code unit order
> - Case-insensitive Unicode codepoint order

I can hardly give any opinion on their importance.

> For the other categories, examples seem much harder to find.
> LC_MESSAGES is inherently a language-based category, but perhaps you
> could have a locale that eliminates verbose natural-language messages
> and replaces them with C/POSIX identifiers (e.g. printing ENOENT
> instead of "No such file or directory") conveying the meaning. (Or we
> could be somewhat radical and replace all the internal strerror
> messages like this and require LC_MESSAGES=en to get them back.) I'm

I like this - for clarity, for conciseness and for being as neutral
as possible (ENOENT of course stems from English, but that is no worse
than the keywords of C itself).
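
To make the difference concrete, here is a small sketch in Python (not musl code) that prints either the symbolic identifier or the natural-language text for an error number. The function name describe and its symbolic flag are made up for illustration; errno.errorcode and os.strerror are standard library facilities.

```python
import errno
import os

def describe(code, symbolic=True):
    """Return a label for an errno value.

    symbolic=True  -> the C/POSIX identifier, e.g. "ENOENT"
    symbolic=False -> the natural-language text from strerror()
    (The flag stands in for the hypothetical locale switch discussed above.)
    """
    if symbolic:
        # errno.errorcode maps numeric values to their symbolic names;
        # fall back to the bare number for values it does not know.
        return errno.errorcode.get(code, str(code))
    return os.strerror(code)

print(describe(errno.ENOENT))                  # symbolic identifier
print(describe(errno.ENOENT, symbolic=False))  # verbose message
```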

> LC_MONETARY, most if not all of the data really corresponds to a
> political unit context, not a language, so in principle it might make
> sense to have locales just for LC_MONETARY that aren't associated with
> a language, but I can't see that being a convenient or reasonable
> design in practice...

Indeed, LC_MONETARY has basically nothing to do with language.

If I could choose, I would not let LANG imply LC_MONETARY
(in other words, I would skip LC_MONETARY in language-based locale
definitions).

Returning to the naming. As language-based locales are named
after languages, it would be nice to name other kinds of locale
data after their "natural association" too. Then politically-bound
data could be put into the corresponding "territorial" family:

 language                ll[l][_TT]
 territory               TT[_ll[l]]

And if we find something that does not feel reasonable to connect
to either a language or a territory, we can do

 special cases           @<specialcase>

[or                       ZZ@<specialcase>     ("no territory")
 or                       zxx@<specialcase>    ("no language")
 but the shorter and simpler form is preferable]
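
As a rough illustration of how such names could be told apart mechanically, here is a Python sketch. It is only one reading of the notation above: the exact patterns, the optional "@modifier" suffix on language and territory names, the "builtin" class for C/POSIX and the function name classify are all assumptions, not part of the proposal.

```python
import re

# One possible reading of the three families (assumed, not specified):
#   language   ll[l][_TT]   lowercase 2-3 letters, optional _TT
#   territory  TT[_ll[l]]   uppercase 2 letters, optional _ll[l]
#   special    @<specialcase>
# An optional "@modifier" is also accepted on language/territory names.
LANGUAGE = re.compile(r"^(?P<lang>[a-z]{2,3})(?:_(?P<terr>[A-Z]{2}))?(?:@(?P<mod>\w+))?$")
TERRITORY = re.compile(r"^(?P<terr>[A-Z]{2})(?:_(?P<lang>[a-z]{2,3}))?(?:@(?P<mod>\w+))?$")
SPECIAL = re.compile(r"^@(?P<mod>\w+)$")

def classify(name):
    """Return 'builtin', 'special', 'language', 'territory' or 'unknown'."""
    if name in ("C", "POSIX"):
        return "builtin"
    if SPECIAL.match(name):
        return "special"
    m = LANGUAGE.match(name)
    if m:
        # zxx@<specialcase> is the longer spelling of a special case
        return "special" if m["lang"] == "zxx" and m["mod"] else "language"
    m = TERRITORY.match(name)
    if m:
        # ZZ@<specialcase> likewise
        return "special" if m["terr"] == "ZZ" and m["mod"] else "territory"
    return "unknown"

for name in ("de", "sv_SE", "EU", "SE_sv", "@iso8601", "ZZ@iso8601", "C"):
    print(name, "->", classify(name))
```

Because the two families differ only in letter case, the class of a name is visible at a glance, which is the point of the convention.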

The expected mode of usage would be something like

LANG=de LC_MONETARY=EU
 or
LANG=sv LC_MONETARY=SE
 or
LANG=eo@...8601 LC_MONETARY=US@...4217

which would in every case access two locale data files of different
classes, clearly visible in the naming.

The ISO date format would actually be a good candidate for a standalone
"@iso8601", but it could just as well live inside the C locale.
Then the last example above might look like

LANG=eo LC_TIME=@...8601 LC_MONETARY=US@...4217
 at the expense of a third file to be accessed
 or rather
LANG=eo LC_TIME=C LC_MONETARY=US@...4217
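
To show where the file counts come from, here is a Python sketch that resolves each category from the environment and lists the distinct locale names needing a data file. It uses the usual POSIX precedence (LC_ALL, then the category's own variable, then LANG, then "C"), so LANG still implies LC_MONETARY here, unlike what I would prefer. The helper names resolve and data_files, the assumption that C/POSIX is built in and needs no file, and the plain names "US" and "@iso8601" in the sample calls are all illustrative.

```python
CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME")

def resolve(env):
    """Map each category to a locale name.

    Precedence: LC_ALL, then LC_<category>, then LANG, then "C".
    Empty strings count as unset.
    """
    return {
        cat: env.get("LC_ALL") or env.get(cat) or env.get("LANG") or "C"
        for cat in CATEGORIES
    }

def data_files(env):
    """Distinct locale names that would need a data file.

    Assumes the C/POSIX locale is built in and costs no file access.
    """
    return sorted(set(resolve(env).values()) - {"C", "POSIX"})

print(data_files({"LANG": "de", "LC_MONETARY": "EU"}))
print(data_files({"LANG": "eo", "LC_TIME": "@iso8601", "LC_MONETARY": "US"}))
print(data_files({"LANG": "eo", "LC_TIME": "C", "LC_MONETARY": "US"}))
```

The first and last calls yield two names each, the middle one three, matching the counts given above.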

What do you think about such a naming convention and usage mode?

Rune
