Message-ID: <20140726093805.GS16795@example.net>
Date: Sat, 26 Jul 2014 11:38:05 +0200
From: u-igbb@...ey.se
To: musl@...ts.openwall.com
Subject: Re: Locale bikeshed time

On Sat, Jul 26, 2014 at 04:03:27AM -0400, Rich Felker wrote:
> > So I would say it is indeed stupid to localize data meant for
> > interchange. Nevertheless it may still be meaningful to format numbers
> > for the user's taste when the data presentation is only meant for some
> > kind of a "local" context.
> 
> The problem is that the vast majority of actual printing and parsing
> of floating point numbers is for interchange purposes, not mere visual
> pretty-printing, and the existence of alternate radix characters
> introduces subtle bugs into programs that are not tested in such
> locales. Very few programs or libraries I've seen go to the trouble to
> obtain a usable LC_NUMERIC locale in a portable, thread-safe, and
> library-safe way before calling snprintf or strtod. And lots of broken
> gui libraries set LC_NUMERIC behind the application's back even if the
> application only wanted to set other categories.

OK, so in reality locales are not being used in a reasonable way, which
means we need not bother implementing them for proper use.
Instead we are obliged to reduce the harm by being non-conforming
in a partially compensating fashion. Sigh.
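
For reference, here is a minimal sketch of what "thread-safe and
library-safe" could look like on the application side, assuming the
POSIX.1-2008 newlocale()/uselocale() interfaces are available. The
helper name format_double_c is hypothetical, and this is only one way
to do it, not something musl or any particular library prescribes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format a double with "." as the radix character,
 * whatever LC_NUMERIC the process (or some GUI library) has selected. */
int format_double_c(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);

    /* Private "C" locale object; the global locale is left untouched. */
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    /* uselocale() switches the locale of the calling thread only. */
    locale_t old = uselocale(c_loc);
    int n = snprintf(buf, size, "%.17g", value);

    /* Restore whatever the thread was using before, then clean up. */
    uselocale(old);
    freelocale(c_loc);
    return n;
}
```

Calling format_double_c(buf, sizeof buf, 1.5) should leave "1.5" in buf
even when the global LC_NUMERIC uses "," as the radix character. The
fact that this much ceremony is needed around a single snprintf call is
pretty much the complaint above.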

Well, locale is a mess by design...

> > Is there any evidence that "." is more widely used than "," ?
> 
> Well, 2/3 of the world's population is in India and China and they all
> use ".", so I think that pretty much covers the question of which is
> "more widely used".

Ah indeed. That's sufficient evidence.

> >  locale is not about
> > representing data for computers, but for humans - and I would love to
> > have a best possible internationally useful locale as the default.
> 
> This goes back to the question about modern versus old tradition.
> Alternate radix points are a cultural convention that's (seemingly,
> hopefully) on the way out due to computers and information
> interchange. Maybe in some sense this is cultural imperialism (or just
> globalization or whatnot) but it's certainly a lot less negative than
> the "everyone should use English" attitude. Nobody's saying "don't use
> your language", just "don't gratuitously break things for a one-pixel
> difference". :-)

:-D

In practice this calls for "eo_ZZ@...imal_dot" - which actually would
make sense.

This reminds me that we have an unsettled issue of naming the variants.
I wonder which schemes exist, which are standardized (?), and which are
actually in use?

The GNU gettext manual states:
"
The ‘@variant’ can denote any kind of characteristics that is not
already implied by the language ll and the country CC. It can denote a
particular monetary unit. For example, on glibc systems, ‘de_DE@euro’
denotes the locale that uses the Euro currency, in contrast to the
older locale ‘de_DE’ which implies the use of the currency before
2002. It can also denote a dialect of the language, or the script used
to write text (for example, ‘sr_RS@latin’ uses the Latin script,
whereas ‘sr_RS’ uses the Cyrillic script to write Serbian), or the
orthography rules, or similar.
"

I read this as "there is no structure on variant naming and all kinds
of variations share the same name space". Then it is apparently the
(hopefully present) comment in the locale definition file that has to
be consulted to know what a certain variant is about.

Fine with me but I would like to see this stated somewhere (instead
of my _guess_ after reading the above documentation - it does _not_
say a word about how one can learn the actual semantics of the variant
aka the intention of the locale submitter).
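
To illustrate: the name itself can be split mechanically, but the
variant comes out as an opaque string with no semantics attached. A
minimal sketch, assuming the glibc-style layout
ll[_CC][.codeset][@variant]; struct locale_parts and split_locale_name
are hypothetical names, not an existing API:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the pieces of a locale name. */
struct locale_parts {
    char lang[16];
    char territory[16];
    char codeset[32];
    char variant[32];
};

/* Copy at most dstsize-1 bytes of src[0..len) and NUL-terminate. */
static void copy_span(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "ll_CC.codeset@variant"; missing parts become empty strings. */
void split_locale_name(const char *name, struct locale_parts *out)
{
    assert(name != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    size_t end = strlen(name);

    /* Everything after '@' is the variant, whatever it may mean. */
    const char *at = strchr(name, '@');
    if (at) {
        copy_span(out->variant, sizeof out->variant, at + 1, strlen(at + 1));
        end = (size_t)(at - name);
    }

    /* The codeset sits between '.' and the variant (or the end). */
    const char *dot = memchr(name, '.', end);
    if (dot) {
        size_t pos = (size_t)(dot - name);
        copy_span(out->codeset, sizeof out->codeset, dot + 1, end - pos - 1);
        end = pos;
    }

    /* The territory sits between '_' and whatever follows. */
    const char *us = memchr(name, '_', end);
    if (us) {
        size_t pos = (size_t)(us - name);
        copy_span(out->territory, sizeof out->territory, us + 1, end - pos - 1);
        end = pos;
    }

    copy_span(out->lang, sizeof out->lang, name, end);
}
```

For "sr_RS.utf8@latin" this gives "sr", "RS", "utf8" and "latin", and
nothing in the result says whether "latin" is a script, a currency or
a dialect.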

A straightforward attempt to learn what a certain installed locale is
about, on a Debian Linux system:

 $ locale -a | grep en
 en_US.utf8
 $ apropos en_US
 en_US: nothing appropriate.
 $

On a Red Hat Linux system with "@Everything":

 $ locale -a | grep en
  ... lots of en_SOMETHING including en_US ...
 $ apropos en_US
 strlen_user          (9)  - Get the size of a string in user space
 strnlen_user         (9)  - Get the size of a string in user space
 $

IOW, one has all the prerequisites for keeping the messy thing in a
messy state :)

Rune
