Message-ID: <20190911134727.GU9017@brightrain.aerifal.cx>
Date: Wed, 11 Sep 2019 09:47:27 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: printf doesn't respect locale

On Wed, Sep 11, 2019 at 02:53:36PM +0200, Jens Gustedt wrote:
> Hello Rich,
>
> On Wed, 11 Sep 2019 07:44:37 -0400 Rich Felker <dalias@...c.org> wrote:
>
> > On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote:
>
> > > I think that WG14 would be happy to hear any suggestions how we
> > > could get out of this trap, a proposal for C2x would even be
> > > better.
> >
> > The obvious solution is a modifier character to printf/scanf format
> > strings that applies to numeric conversions and means "always
> > format/interpret this as if in the C locale". However this is hard to
> > test for at build time unless there's a macro declaring its
> > availability, so ideally WG14 would also adopt the sort of
> > fine-grained feature availability macros some of us have been
> > proposing for extensions.
>
> If such a proposal would be made, it would have to be based on a
> reference implementation in the field. Would musl be willing to be
> such a reference implementation?

Possibly, contingent on some willingness of other parties to be on
board with it (even if not implementing it at first). I don't want
musl to be in the position of implementing something new that's not
standardized and likely to *conflict* with future standards, which
custom format flags could do.

> In addition, I would think that it should not switch off all locale
> feature but should leave the encoding properties such as UTF-8
> functional.

Absolutely, but encoding is not relevant to numeric fields. Everything
else is strictly specified, at least for formatting (printf). For
conversion (scanf) implementation-defined locale-specific forms are
also allowed, but this is probably not wanted when you're processing
data from a serialized form that's intended to be universal.
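
To make the serialized-data concern concrete, here is a minimal sketch in
standard C. It is only an illustration: parse_serialized() and
radix_is_dot() are hypothetical helper names, not anything proposed in
this thread. The point is that strtod() follows the current LC_NUMERIC, so
in a locale whose radix character is ',' it stops at the '.' in "3.14"
and the strict whole-string check fails.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only): parse a number that was
 * serialized with '.' as the radix character. Returns 1 only if strtod()
 * consumed the whole string, 0 otherwise. */
static int parse_serialized(const char *s, double *out)
{
    char *end;
    double v;

    assert(s != NULL && out != NULL);
    v = strtod(s, &end);          /* honours the current LC_NUMERIC */
    if (end == s || *end != '\0')
        return 0;                 /* nothing parsed, or stopped early */
    *out = v;
    return 1;
}

/* Hypothetical helper: report whether the active locale uses "." as its
 * radix character, using only ISO C localeconv(). */
static int radix_is_dot(void)
{
    const struct lconv *lc = localeconv();
    return lc->decimal_point[0] == '.' && lc->decimal_point[1] == '\0';
}
```

A program that never calls setlocale() for LC_NUMERIC stays in the "C"
locale, where radix_is_dot() is true and parse_serialized("3.14", &v)
succeeds. After setlocale(LC_NUMERIC, "") under a ','-radix locale the
same input would be rejected by this check.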

> > An alternative/additional solution, which I actually might like
> > better, is having a function which sets a thread-local flag to treat
> > certain locale properties (at least the problematic LC_NUMERIC ones)
> > as if the current locale were "C". This is weaker than the uselocale
> > API from POSIX, but doesn't have the problems with the possibility of
> > failure (likely with no way to make forward progress) like it does,
> > and more importantly, would avoid *breaking* m17n/i18n functionality
> > by turning off other unrelated, non-problematic locale features.
> > Application or library code could then just set/restore this flag
> > around *printf/*scanf/strto*/etc calls, or could set it and leave it
> > if they never want to see ',' again.
>
> Interesting.
>
> Would this be difficult to implement in musl? (I guess not)

I would think not, but I'd have to look at the details a little more.

One other advantage of this approach is that it has a more graceful
fallback. If an application needs portable LC_NUMERIC behavior, it can
check at build time for the presence of the new interface. If present,
LC_NUMERIC can be set to "" (user's preference) and the new interface
can be used to get the needed behavior. If absent, the application can
refrain from setting LC_NUMERIC, only setting the other categories and
leaving it as "C" (default).

Note that having it be thread-locally stateful is, in my opinion, much
better than having new variants of the affected functions or new
formats, since a caller using LC_NUMERIC can set/restore the state to
safely call library code that's completely unaware of the new
interfaces.

Of course there may be complications I haven't thought of. One that
comes to mind right away is what localeconv() should return under such
conditions.

> Would you be willing to write this up?

What form would it need to be in?
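
The build-time fallback described above could look roughly like the
following sketch. Everything about the new interface is an assumption:
HAVE_NUMERIC_C_OVERRIDE is a hypothetical feature-test macro standing in
for "the new interface is present", and no such macro or function exists
today. init_locale() and format_double() are likewise just illustrative
names. Only the fallback branch uses calls that exist now.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* HAVE_NUMERIC_C_OVERRIDE is hypothetical: it stands in for a build-time
 * check that the proposed thread-local flag interface is available. */
static void init_locale(void)
{
#ifdef HAVE_NUMERIC_C_OVERRIDE
    /* Interface present: honour the user's preference for every
     * category, LC_NUMERIC included. Code needing C-locale numerics
     * would set/restore the (hypothetical) flag around its calls. */
    setlocale(LC_ALL, "");
#else
    /* Interface absent: set the other categories only, so LC_NUMERIC
     * stays at its startup default of "C". */
    setlocale(LC_CTYPE, "");
    setlocale(LC_COLLATE, "");
    setlocale(LC_MONETARY, "");
    setlocale(LC_TIME, "");
#ifdef LC_MESSAGES
    setlocale(LC_MESSAGES, "");   /* POSIX category, not in ISO C */
#endif
#endif
}

/* Format a double with two decimals; returns snprintf()'s result. */
static int format_double(char *buf, size_t n, double x)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%.2f", x);
}
```

In the fallback branch the radix character remains '.', whatever the
environment says, because LC_NUMERIC is never changed from "C". The other
categories (encoding, collation, messages, time) still follow the user's
settings, so m17n/i18n functionality is not turned off wholesale.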

> Once we'd have that in musl (even before having it in C2x) it could be
> easier for ourselves to convince us to have full locale support.

By "full" you mean variable radix point? I'm not sure it makes a big
difference in that it won't help code that's not prepared for radix
point to vary. What it does help is making it so code that is being
careful to avoid the breakage can still use LC_NUMERIC when it wants
to, without depending on POSIX.

Rich