Date: Sun, 17 Dec 2023 23:25:38 +0100
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH 0/2] Support printing localized RADIXCHAR

On Sat, 16-12-2023 at 18:10 -0500, Rich Felker wrote:
> On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote:
> > From: Pablo Correa Gómez <ablocorrea@...mail.com>
> > 
> > Since we've been discussing translations, I've been looking around a
> > bit and have found some low-hanging fruit in the form of improving
> > printf-family output for localized systems.
> >
> > I've tried to do the same for the strtof family of functions, but I was
> > not completely sure how to approach that. Forcing the radix char there
> > has the problem that numeric values as written for programming stop
> > being supported, and treating "." and the localized character equally
> > does not seem to be supported by POSIX. Does anybody have any thoughts
> > about this? Without that, this patch series might be a bit incomplete,
> > since certain localized printf outputs would not be possible to ingest
> > in strtof. Although I'm also equally unsure whether that's a requirement.
> > 
> > Pablo Correa Gómez (2):
> >   langinfo: add support for LC_NUMERIC translations
> >   printf: translate RADIXCHAR for floating-point numbers
> > 
> >  src/locale/langinfo.c | 2 +-
> >  src/stdio/vfprintf.c  | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > --
> > 2.43.0
> 
> This is a topic that's been controversial. I have always been against
> having variable radix character, but I've also been seeking input from
> users who want localized output whether the lack of this functionality
> is a serious problem that needs revisiting.
> 
> Last time it was discussed, I believe my position was that, if we do
> this, it needs to be a 1-bit setting, where a locale necessarily has
> either '.' or ',' as the radix. No other values actually appear in
> real-world conventions, and on other implementations such as glibc,
> the allowance for arbitrary characters allows doing some ~nasty~ stuff
> with output and input processing. For example, you could define the
> radix character to be '1' or something that makes conversions fail to
> round-trip.
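
To make the 1-bit idea concrete, here is a minimal sketch in C. The
struct numeric_locale type and the radix_char and set_radix functions
are hypothetical illustrations, not musl's actual internals: the locale
stores a single bit, so '.' and ',' are the only radix characters it can
ever yield, and a locale definition asking for anything else (such as
'1' or a multibyte separator) is rejected when it is loaded.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-locale numeric data: one bit picks the radix, so the
 * only characters that can ever be produced are '.' and ','. */
struct numeric_locale {
    unsigned comma_radix : 1;
};

/* Return the radix character selected by the locale's 1-bit setting. */
char radix_char(const struct numeric_locale *loc)
{
    return loc->comma_radix ? ',' : '.';
}

/* Hypothetical loader hook: accept a locale's RADIXCHAR definition only
 * if it is exactly "." or ","; anything else ("1", a multibyte
 * separator, ...) is rejected and the previous setting is kept.
 * Returns 0 on success, -1 on rejection. */
int set_radix(struct numeric_locale *loc, const char *def)
{
    assert(def != NULL);
    if (strcmp(def, ".") == 0) {
        loc->comma_radix = 0;
        return 0;
    }
    if (strcmp(def, ",") == 0) {
        loc->comma_radix = 1;
        return 0;
    }
    return -1;
}
```

For example, after set_radix(&loc, ",") returns 0, radix_char(&loc)
yields ','; a later set_radix(&loc, "1") returns -1 and leaves the comma
in place. Rejecting bad definitions at load time means the printf and
strto* paths never see a radix that could break round-tripping.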

Makes total sense. I started from the wrong assumption that Spanish might
use an apostrophe as the decimal separator. But it seems that has changed
since I went to primary school, and the comma is certainly what I'm used
to seeing online in Spanish. All your technical comments make sense; I
certainly put this together a bit too fast, but I'm happy that it
sparked a discussion on how to do it right.

> 
> As written to support arbitrary radix characters, the patch also fails
> to handle the case where the radix character is multi-byte, copying
> only a single byte of it and thereby producing broken output. This is
> actually a nasty case where printf semantics for field width are not
> what the caller is likely to expect, and it breaks our wide printf
> implementation, which assumes when it uses byte-based printf for
> numbers that the byte count and character count are the same.
> Supporting only '.' and ',' avoids all of these issues, too.
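
A sketch of why the single-byte restriction keeps the output side
simple: format in the C locale, then swap the one '.' byte for the
locale's radix. format_with_radix is a hypothetical helper, not how
musl's vfprintf is actually structured, and it assumes the process is in
the C/POSIX locale so that snprintf emits '.'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format x with "%*.*f", then replace the '.' that
 * the C locale produces with the locale's radix. Assumes the process is
 * in the C/POSIX locale and that radix is '.' or ','. Since both
 * candidates are a single byte, the output length (and so the field
 * width) is unchanged. Returns the snprintf result. */
int format_with_radix(char *buf, size_t size, int width, int prec,
                      double x, char radix)
{
    assert(radix == '.' || radix == ',');
    int n = snprintf(buf, size, "%*.*f", width, prec, x);
    if (n >= 0 && size > 0) {
        char *dot = strchr(buf, '.');  /* at most one radix in %f output */
        if (dot)
            *dot = radix;              /* one byte replaced by one byte */
    }
    return n;
}
```

With width 8 and precision 2, 3.14159 comes out as "    3,14": still 8
bytes and 8 characters, so the field width the caller asked for is
honoured. A two-byte UTF-8 separator such as U+066B (0xD9 0xAB) would
make the byte count and the character count diverge.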
> 
> Another detail you've overlooked is that scanf/strto{d,ld,f}/atof need
> to process the radix point character. This in turn requires making the
> _l wrappers for strto{d,ld,f} so that they actually apply the locale
> argument rather than ignoring it.
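
On the input side, a minimal sketch of radix-aware parsing under the
same 1-bit assumption. strtod_radix is a hypothetical helper: in a real
implementation the radix would come from the locale (for strtod_l, from
its locale_t argument), whereas here it is passed in directly as '.' or
','. It accepts only the locale's radix, so under ',' the string "3.25"
parses as 3 and stops at the '.'. It assumes the process is in the
C/POSIX locale, so the underlying strtod expects '.'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: like strtod, but the radix point is the given
 * character ('.' or ',') and the other one is NOT accepted. Works on a
 * copy whose edits are all 1-byte-for-1-byte, so offsets (and hence
 * *endp) map directly back onto the caller's string. Assumes the C/POSIX
 * locale is active, so strtod itself expects '.'. */
double strtod_radix(const char *s, char **endp, char radix)
{
    assert(radix == '.' || radix == ',');
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (!copy) {
        if (endp)
            *endp = (char *)s;  /* nothing parsed */
        return 0.0;
    }
    memcpy(copy, s, len + 1);
    if (radix == ',') {
        for (size_t i = 0; i < len; i++) {
            if (copy[i] == '.')
                copy[i] = '\0';  /* '.' is not the radix here: stop */
            else if (copy[i] == ',')
                copy[i] = '.';   /* locale radix becomes strtod's '.' */
        }
    }
    char *end;
    double v = strtod(copy, &end);
    if (endp)
        *endp = (char *)s + (end - copy);
    free(copy);
    return v;
}
```

Parsing "3,25 rest" with ',' returns 3.25 and leaves *endp at the space,
just as strtod would for "3.25 rest" in the C locale.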
> 
> Before proceeding on all of this we should probably try to reach a
> decision on whether it's really needed/wanted functionality.

I really think so. This was indeed part of Alastair's original
comment on setlocale (https://www.openwall.com/lists/musl/2023/08/10/3).
So it's an issue in French as well as in Spanish, where we have the same
problem that Markus mentions for German.

For me personally, I really think getting these sorts of things
functional and well integrated in musl (the way you want to do it) is
pretty important for the postmarketOS project to be able to reach a
wider audience :)

So is this convincing enough that a well-put patch with the changes you
request here and in the other message would make it in? If so, I'm
happy to give this a try once the setlocale changes from Alastair get
merged (I have already contacted a Polish postmarketOS user with whom
we're going to test a protocol to help users add support for their
language to musl-locales).

It would certainly be my first time writing something this low-level,
so I might need some guidance on how to approach changes like the ones
you've described in your other message.

Best,
Pablo.

> 
> Rich
