Message-ID: <Zj-xseFnuBhaaa76@album.bayer.uni.cx>
Date: Sat, 11 May 2024 19:58:09 +0200
From: Petr Pisar <petr.pisar@...as.cz>
To: musl@...ts.openwall.com
Subject: nl_langinfo(CODESET) does not match locale

When debugging test failures in libisds on Gentoo with musl
<https://bugs.gentoo.org/show_bug.cgi?id=928107>, I found that
nl_langinfo(CODESET) does not match the current locale.

A reproducer:

#include <locale.h>
#include <stdio.h>
#include <langinfo.h>

int main(void) {
    char *old_locale = setlocale(LC_ALL, "cs_CZ.ISO8859-2");
    if (old_locale == NULL) {
        perror("setlocale() set failed");
        return 1;
    }
    old_locale = setlocale(LC_ALL, NULL);
    if (old_locale == NULL) {
        perror("setlocale() query failed");
        return 1;
    }
    printf("Current LC_ALL=%s\n", old_locale);
    printf("CODESET=%s\n", nl_langinfo(CODESET));
    return 0;
}

# gcc test.c && ./a.out
Current LC_ALL=cs_CZ.ISO8859-2
CODESET=UTF-8

While on glibc:

$ gcc test.c && ./a.out
Current LC_ALL=cs_CZ.ISO8859-2
CODESET=ISO-8859-2

For the cs_CZ.UTF8 locale, nl_langinfo() correctly reports UTF-8, and for the
C locale it reports ASCII. However, for any other character set it always
returns UTF-8.

I found a note <https://wiki.gentoo.org/wiki/Musl_usage_guide#Locales> that
musl does not implement non-UTF-8 locales. If that is true, then setlocale() for
"cs_CZ.ISO8859-2" should fail instead of accepting the locale.

I observe this behavior with musl-1.2.5.

-- Petr

