Message-ID: <20140723163907.GC11570@brightrain.aerifal.cx>
Date: Wed, 23 Jul 2014 12:39:07 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Locale bikeshed time

On Wed, Jul 23, 2014 at 11:50:31AM +0200, u-igbb@...ey.se wrote:
> > > I actually do mix categories from different locales.
> > > No problem as long as the files are small.
> >
> > Note that if you're just mixing "ll_TT" and "C", there wouldn't be any
> > cost anyway since the C locale (and its aliases) are builtin and never
> > loaded from a file. Where I was thinking you might see duplication is
>
> Sure. This covers certainly most of my preferences but I thought of
> LANG=l1_T1 and LC_SOMETHING=l2_T2 [and LC_SOMETHINGELSE=l3_T3].
> This would result in pulling in two or three locale data files but the
> overhead is presumably negligible.

It's two or three sets of syscalls -- open (one per path component
tried until it succeeds), fstat, mmap, close -- rather than one set.
And an extra vma (resulting from the mmap) for each file used. But the
choice isn't whether to have this overhead or not, unless you want to
consider the glibc locale-archive ugliness. The choice is just whether
to optimize the case where the categories are all the same (only
having one set of syscalls in that case) or mostly the same, or not to
optimize it and always have multiple sets of syscalls. I believe the
latter is strictly worse.

> > for things like: LC_ALL=ll_TT@...ifier where modifier is really just
> > an alternate for one category (e.g. ISO date format for time, alt
> > collation order, etc.), but the file ends up storing duplicates of all
> > the data from other categories. However, I think the alternate
> > preferred usage here would be to provide a file for just the category
> > being overridden that does not contain the base data and require users
> > to set the individual categories, like what you're doing, e.g.
> >
> > LANG=ll_TT LC_TIME=ll_TT@...date
>
> This means that most of the time there will be a single locale file to be
> opened, sometimes more, in extreme cases up to the number of categories,
> the files also being of different "completeness". This would certainly
> contribute to confusion for both the administrators and the users.

Hmm. I see how it would be confusing and maybe it's best to discourage
this use (incomplete .mo files). But it's purely a usage issue,
outside of musl's control, unless we wanted to impose a check that any
locale file have data for all the categories (and I think such a rule
would be bad since it precludes having locale files that are unrelated
to languages, e.g. a generic "UCA" collation locale with the default
UCA data).

> For the sake of uniformity I would possibly prefer to see only the
> "thinner" files defining exactly one category, instead of different
> files having different numbers of included categories.

Yes that sounds like a good policy. Really, policy matters like this
(i.e. ones that don't affect libc implementation) should be worked out
when it comes time to actually make some locales and find a maintainer
for a musl-locale repo/package.

On that topic, while this is a matter outside my control for
individual users, my preference would be that the official musl-locale
data attempt to avoid multiple variants/modifiers and legacy options
if possible. For example I would like to see the numeric date format
be ISO format in all locales, with traditional formats only where the
natural-language string representations for months/days are included
(and I say this as someone coming from one of the locales, i.e. US,
where the traditional numeric date format is non-ISO). In keeping with
the principle that musl is "modern" I'd like to prefer modern cultural
conventions to historical ones.

> But most of all I'd support your approach of including all information in
> each file. This is "least confusing" and quite efficient.
> The overhead is mostly static storage (not noticeable in our setup and
> probably not much anyway :) and the run time overhead affects just the
> minority of users who mix locales/categories. (Oh btw as a nice bonus
> this makes the file boundaries correspond to the data usage patterns).
>
> To summarize my view,
>
> - a file per locale, with all categories included best
> - a file per category acceptable
> - files with differing data subsets please don't

Yes I think this makes sense. My leaning would be to use complete
files for language-based locales, and file-per-category for individual
category locales that are not associated with any particular language
(and where, thereby, there's no assumption that they should provide
any behavior to other categories).

Rich
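The per-file cost discussed in this thread (open once per search-path directory until one succeeds, then fstat, mmap, close, and one vma per mapping) and the optimization Rich prefers (one set of syscalls when several categories name the same locale) can be sketched in C. This is a minimal illustration under stated assumptions, not musl's actual implementation: the colon-separated search path, `NCAT`, `struct locale_map`, `map_locale_file` and `load_categories` are all hypothetical names introduced here. Builtin "C"/"POSIX" names never touch the filesystem, and a name already seen in an earlier category reuses that mapping instead of being loaded again.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical category count for this sketch: CTYPE, NUMERIC, TIME,
 * COLLATE, MONETARY, MESSAGES. */
#define NCAT 6

/* Hypothetical record for one mapped locale file. data == NULL means the
 * category uses builtin data ("C"/"POSIX") or no file was found. */
struct locale_map {
    const void *data;
    size_t size;
};

/* Try each ':'-separated directory in `path` until open() succeeds, then
 * fstat, mmap and close: one set of syscalls for this file. */
static struct locale_map map_locale_file(const char *path, const char *name)
{
    struct locale_map m = { NULL, 0 };
    char buf[4096];
    const char *p = path;

    while (*p) {
        size_t len = strcspn(p, ":");
        int n = snprintf(buf, sizeof buf, "%.*s/%s", (int)len, p, name);
        p += len;
        if (*p == ':')
            p++;
        if (n < 0 || (size_t)n >= sizeof buf)
            continue; /* path too long: skip this directory */

        int fd = open(buf, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
            continue; /* not here: one failed open() per directory tried */

        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ,
                              MAP_PRIVATE, fd, 0);
            if (addr != MAP_FAILED) {
                m.data = addr; /* each successful mmap adds one vma */
                m.size = (size_t)st.st_size;
            }
        }
        close(fd); /* the mapping stays valid after the fd is closed */
        return m;  /* stop at the first file that opened */
    }
    return m;
}

/* Fill maps[] for the per-category locale names in names[]. Builtin names
 * never touch the filesystem, and a name already seen in an earlier
 * category reuses that category's mapping instead of loading it again.
 * Returns the number of file lookups performed (sets of syscalls). */
static int load_categories(const char *path, const char *const names[NCAT],
                           struct locale_map maps[NCAT])
{
    int lookups = 0;
    for (int i = 0; i < NCAT; i++) {
        maps[i].data = NULL;
        maps[i].size = 0;
        if (!strcmp(names[i], "C") || !strcmp(names[i], "POSIX"))
            continue;

        int j = 0;
        while (j < i && strcmp(names[j], names[i]) != 0)
            j++;
        if (j < i) {
            maps[i] = maps[j]; /* same locale as an earlier category */
            continue;
        }
        maps[i] = map_locale_file(path, names[i]);
        lookups++;
    }
    return lookups;
}
```

With every category set from `LANG` alone, all six names match and `load_categories` performs a single lookup; overriding one category with a different file makes it two lookups and two mappings, which is the extra overhead weighed above. Mixing in "C" costs nothing, matching the point that the C locale is builtin.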