Message-ID: <20140722201008.GC16795@example.net>
Date: Tue, 22 Jul 2014 22:10:08 +0200
From: u-igbb@...ey.se
To: musl@...ts.openwall.com
Subject: Re: Locale bikeshed time

On Tue, Jul 22, 2014 at 02:49:32PM -0400, Rich Felker wrote:
> Overall, my plan at this point is to disallow any absolute/relative
> pathnames in the LC_* vars and restrict them purely to locale names,
> and have the path in a separate variable outside the scope of the
> standard.

+1

> So, the first bikeshed decision to be made is what environment
> variable to use for the locale path, and what fallback should be if
> it's not set. Glibc uses $LOCPATH. On the one hand it would be nice to
> use the same var (since apps are already aware of the need to treat it
> specially), but on the other it's undesirable to have them tied
> together (e.g. if you're using musl as a non-root installation and
> can't write to /usr/lib) and to avoid clashing with glibc's files we

This issue is not crucial for my usage pattern: here it is easy to assign
values of this kind per binary rather than per process tree (in contrast to
the locale names, which I want to be settable by the user and inheritable
regardless of which library happens to interpret them).

Speaking more generally, using the same variable as glibc would introduce
a substantial risk of confusion, making the semantics of the variable
context-dependent (i.e. depending on which library a certain binary is
linked to).

This confusion is largely hidden in monolithic distros, where all binaries
are expected to have been built by tightly cooperating parties using the
same libraries, but the general case includes binaries built on
different premises.

A musl-specific variable name would be a better/cleaner choice.
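
To make the split between locale names and a separate path variable
concrete, here is a minimal sketch in C. Everything named in it is an
assumption for illustration only: the variable name MUSL_LOCPATH, both
function names, treating the variable as a single directory, and failing
(rather than using some fallback directory) when it is unset. None of
this is what musl does or has decided.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen only for this illustration. */
#define LOCPATH_VAR "MUSL_LOCPATH"

/* Accept only plain locale names: non-empty, no '/', not "." or "..". */
int locale_name_ok(const char *name)
{
    if (!name || !*name) return 0;
    if (strchr(name, '/')) return 0;
    if (!strcmp(name, ".") || !strcmp(name, "..")) return 0;
    return 1;
}

/* Build "<dir>/<name>" into buf, with <dir> taken from LOCPATH_VAR.
 * Returns 0 on success, or -1 if the name is rejected, the variable is
 * unset or empty (no fallback in this sketch), or buf is too small. */
int locale_file_path(const char *name, char *buf, size_t len)
{
    const char *dir = getenv(LOCPATH_VAR);
    if (!locale_name_ok(name) || !dir || !*dir) return -1;
    int n = snprintf(buf, len, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this shape, a value such as "../etc/passwd" in an LC_* variable is
rejected outright, and only the separate variable decides where files
are looked up.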

> would need to choose a subdirectory under $LOCPATH rather than using
> it directly. All of these aspects make it a lot less attractive.

+1

> The second issue is how locale categories are split up. Glibc has each
> category in a separate file, except for the "locale-archive" file
> which stores everything in one file for easy mapping. My leaning so

By the way, please do not go the way of a single big file.
For systems which rely on file boundaries to reflect data clustering
(i.e. which data is most likely to be used together) it is very useful
to let the files correspond to the data structure. Otherwise some cheap
and efficient distributed data access optimizations become impossible.

The Coda file system uses a whole file as its unit of transmission and
caching, which is quite efficient because a file very often corresponds
to an "information unit" that is needed as a whole. Glibc's locale archive
forces a big wasteful transfer and a large cache footprint for very little
actual use.

> far is to put the whole locale -- time format and translations,
> message translations, ... in a single file. This avoids the need for
> multiple mappings (and syscall overhead, and vma overhead, ...) if
> you're using the same value for all categories. But on the other hand,
> if you wanted to have lots of subtle variants of a locale, you might
> end up with largely-duplicate files on disk. Fortunately I think
> they'll all be very small anyway so this may not matter.

I actually do mix categories from different locales.
No problem as long as the files are small.
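
For reference, mixing categories relies on the usual per-category
lookup order from POSIX: LC_ALL first, then the category's own variable,
then LANG, with empty values treated as unset. A minimal sketch in C,
where the function name is hypothetical and the final fallback to "C" is
this sketch's own choice (POSIX leaves that default to the
implementation):

```c
#include <stdlib.h>

/* Resolve the locale name for one category, given its variable name
 * (for example "LC_TIME"). Precedence: LC_ALL, then the category
 * variable, then LANG. Empty values count as unset. The "C" fallback
 * is an assumption of this sketch. */
const char *category_locale(const char *category_var)
{
    const char *vars[] = { "LC_ALL", category_var, "LANG" };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v && *v) return v;
    }
    return "C";
}
```

With LANG=en_US.UTF-8 and LC_TIME=sv_SE.UTF-8 set and LC_ALL unset, this
yields sv_SE.UTF-8 for LC_TIME and en_US.UTF-8 for LC_MESSAGES, so two
different locales end up in use at once.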

Rune
