Message-ID: <20210524215021.GC2546@brightrain.aerifal.cx>
Date: Mon, 24 May 2021 17:50:22 -0400
From: Rich Felker <dalias@...c.org>
To: Konstantin Isakov <dragonroot@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [BUG] swprintf() doesn't handle Unicode characters correctly
On Mon, May 24, 2021 at 12:39:35AM -0400, Konstantin Isakov wrote:
> Hi,
>
> The following program:
>
> ===================================
> #include <stdio.h>
> #include <wchar.h>
>
> int main()
> {
>     wchar_t buf[ 32 ];
>
>     swprintf( buf, sizeof( buf ) / sizeof( *buf ), L"ab\u00E1c" );
>
>     for ( wchar_t * p = buf; *p; ++p )
>         printf( "%u\n", ( unsigned ) *p );
>
>     return 0;
> }
> ===================================
>
> With musl 1.2.2 produces the following output:
> 97
> 98
>
> The expected output is:
> 97
> 98
> 225
> 99
>
> With musl, only the first two characters ('a' and 'b') are processed, and
> the string ends on a Unicode character (U+00E1, which is an 'a' with acute
> accent), instead of outputting it and the last character, 'c'.
>
> Please CC me when replying. Thanks!
You need to call setlocale(LC_CTYPE, ""). Until you do, you're in the
C locale, which POSIX requires to be single-byte, so the character
\u00e1 is unrepresentable and the conversion fails with an encoding
error (EILSEQ).
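
For illustration, here is a minimal sketch of that fix applied to the same
format string. The helper name format_sample is hypothetical and not part of
the original test program, and the sketch assumes the environment selects a
locale whose encoding can represent U+00E1 (for example a UTF-8 locale).

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not from the original report): switches LC_CTYPE
 * to the locale named by the environment, then formats the same string
 * as the test program. Returns swprintf's result: the number of wide
 * characters written, or a negative value on error. */
int format_sample(wchar_t *buf, size_t n)
{
    /* setlocale may return NULL if the environment names a locale that
     * is not available; the process then stays in the C locale. */
    setlocale(LC_CTYPE, "");

    return swprintf(buf, n, L"ab\u00E1c");
}
```

A caller would pass a buffer such as wchar_t buf[32] with its element count,
and treat a negative return value as a failure. If the environment still
selects the C locale, an encoding error like the one in the report remains
possible.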
Rich