Message-ID: <20140727075120.GB16795@example.net>
Date: Sun, 27 Jul 2014 09:51:20 +0200
From: u-igbb@...ey.se
To: musl@...ts.openwall.com
Subject: Re: Locale bikeshed time

On Sat, Jul 26, 2014 at 04:43:29PM -0400, Rich Felker wrote:
> I wasn't quite sure where to inject this reply into the thread, but
> one thing I just remembered is that glibc (and the XSI option for
> POSIX) has [.charset] as part of the standard form for locale names,
> and all of glibc's usable locales end in ".UTF-8". So a user on a
> mixed system is likely to have their locale vars set to include
> ".UTF-8" at the end, and therefore wouldn't get any localization when
> running musl-linked programs with the locale names we've proposed.

Ah yes, this is regrettable. The transition from legacy charsets/encodings
has already happened, and even with glibc .UTF-8 is the de facto default,
so it "shouldn't" have to be indicated.

> The way I see it, we could either have the locale package provide
> symlinks to all of the locales with ".UTF-8" on the end, or musl
> itself could ignore anything starting with the first '.' in a locale
> name. One downside of symlinks is that a locale could uselessly get
> mapped twice if somebody happens to reference it by both names in
> their locale vars. It also puts more of a configuration/complexity
> burden on the installation. But it does keep policy out of libc and
> saves a few bytes of code in libc.

As an integrator I would certainly appreciate being able to skip
making zillions of legacy links.
There is also the matter of spelling: utf-8 Utf-8 UTF-8 utf8 UTF8 Utf8 utf_8
(did I forget some? :), which different distros/users may choose differently.

Debian Linux:

$ locale -a
C
C.UTF-8         <=====
en_US.utf8      <=====
POSIX
$

Given that the library implies UTF-8, please explicitly ignore ".anything":
this part of the name is meaningless for musl by design.
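
To illustrate what ignoring the suffix could look like, here is a minimal
sketch in C. It is not musl's actual code; the function name
strip_locale_charset and its interface are hypothetical, chosen only for
this example. It cuts the name at the first '.', as proposed in the quoted
message, which means anything after the charset (such as an "@modifier")
is dropped as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not a musl API):
 * copy the locale name into out, dropping everything from the first '.'
 * onward. Returns 0 on success, -1 if out is too small to hold the
 * result including its terminating NUL. */
int strip_locale_charset(const char *name, char *out, size_t outsize)
{
    /* Number of characters before the first '.', or the whole length
     * if the name contains no '.'. */
    size_t len = strcspn(name, ".");

    if (len >= outsize)
        return -1;

    memcpy(out, name, len);
    out[len] = '\0';
    return 0;
}
```

With this, "en_US.utf8", "en_US.UTF-8" and "en_US" all reduce to the same
name "en_US", so no particular spelling of the charset has to be known in
advance.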

A packager cannot fully imitate such behaviour even with a lot of links,
since no fixed set of links covers every suffix a user might have set.

The rare cases when the user really means a different charset
but gets utf-8 are better handled by the user if/when encountered.

Rune
