Message-ID: <20210524215021.GC2546@brightrain.aerifal.cx>
Date: Mon, 24 May 2021 17:50:22 -0400
From: Rich Felker <dalias@...c.org>
To: Konstantin Isakov <dragonroot@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [BUG] swprintf() doesn't handle Unicode characters correctly

On Mon, May 24, 2021 at 12:39:35AM -0400, Konstantin Isakov wrote:
> Hi,
>
> The following program:
>
> ===================================
> #include <stdio.h>
> #include <wchar.h>
>
> int main()
> {
>   wchar_t buf[ 32 ];
>
>   swprintf( buf, sizeof( buf ) / sizeof( *buf ), L"ab\u00E1c" );
>
>   for ( wchar_t * p = buf; *p; ++p )
>     printf( "%u\n", ( unsigned ) *p );
>
>   return 0;
> }
> ===================================
>
> With musl 1.2.2 produces the following output:
> 97
> 98
>
> The expected output is:
> 97
> 98
> 225
> 99
>
> With musl, only the first two characters ('a' and 'b') are processed, and
> the string ends on a Unicode character (U+00E1, which is an 'a' with acute
> accent), instead of outputting it and the last character, 'c'.
>
> Please CC me when replying. Thanks!

You need to call setlocale(LC_CTYPE, ""). Otherwise the character
\u00e1 is unrepresentable, because POSIX requires the C locale be
single-byte and you're in the C locale until you call setlocale, and
thus produces an encoding error (EILSEQ).

Rich