Message-Id: <C6S810FSG5JT.239HD3YWBGKJ3@mussels>
Date: Sun, 01 Nov 2020 17:48:43 -0300
From: Érico Nogueira <ericonr@...root.org>
To: "Szabolcs Nagy" <nsz@...t70.net>
Cc: <musl@...ts.openwall.com>, "Alexander Vitiuk" <suda@....net>
Subject: Re: swprintf possible bug

On Sun Nov 1, 2020 at 6:40 PM -03, Szabolcs Nagy wrote:
> * Érico Nogueira <ericonr@...root.org> [2020-11-01 17:17:49 -0300]:
> > On Sun Nov 1, 2020 at 6:06 PM -03, Alexander Vitiuk wrote:
> > > It seems swprintf() / wprintf() do not work as expected in musl when
> > > used with Cyrillic:
> > >
> > > C testcase:
> > > #include <wchar.h>
> > > int main() {
> > >   wprintf(L"[hello]\n");
> > >   wprintf(L"[Привет]\n");
> > >   return 0;
> > > }
> > > on x86_64-linux-gnu prints:
> > > [hello]
> > > [Privet]
> > > and on x86_64-linux-musl prints:
> > > [hello]
> > > [
> > >
> > > There are other cases described:
> > > https://github.com/emscripten-core/emscripten/issues/11947
> > 
> > For what it's worth, if this is a bug, it would seem to be in how musl
> > decides when to print characters (not the formatting functions
> > themselves), since the below program doesn't print anything:
> > 
> > #include <wchar.h>
> > #include <stdio.h>
> > 
> > int main() {
> >   fputws(L"[Привет Василий]\n", stdout);
> >   // I don't know if I'm accessing a wchar_t appropriately here
> >   fputwc(L"[Привет Василий]\n"[3], stdout);
> >   return 0;
> > }
> > 
> > I tried tracing the execution from fputws, and not printing anything
> > seems to be caused by the return value of wcsrtombs().
>
> these functions return an error code..
>
> in this case they must return -1 and set errno to EILSEQ,
> since the selected multibyte encoding (LC_CTYPE=C) cannot
> represent the printed wide characters.
>
> i think the musl behaviour is correct, you can try adding
> setlocale(LC_CTYPE,"") at the start of main to make it work.

Thanks, that did fix it. For reference:

#include <wchar.h>
#include <stdio.h>
#include <locale.h>

int main() {
  setlocale(LC_CTYPE, "");
  fputws(L"[Привет Василий]\n", stdout);
  fputwc(L"[Привет Василий]\n"[3], stdout);
  return 0;
}

I wonder what glibc does differently that allows this, and how the
emscripten folks can work around the musl behavior.

Which environment variables could I set to control this, or is that not
possible?

Thanks,
Érico
