Message-ID: <3446e663-1252-bb02-4248-2132cfc4d086@gmail.com>
Date: Fri, 30 Dec 2016 16:13:44 -0600
From: Laine Gholson <laine.gholson@...il.com>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] bind_textdomain_codeset: don't return failure
 unless encoding isn't UTF-8

Option 1 is the only sane choice. I don't see how anything could break unless a program specifically checks for the GNU behavior and fails when it doesn't get it, in which case it is the program's fault anyway.
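
For reference, here is a rough sketch of how I read option 1 from the quoted message below. It is only an illustration, not a proposed patch: the function name sketch_bind_textdomain_codeset is made up so that it does not clash with the real libintl symbol, and returning a string literal through a cast is my assumption about how the fixed "UTF-8" result would be handed back.

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical name; stands in for bind_textdomain_codeset under option 1. */
char *sketch_bind_textdomain_codeset(const char *domainname, const char *codeset)
{
	/* Every textdomain is treated as always bound to UTF-8,
	 * so the domain name does not affect the result. */
	(void)domainname;

	/* A non-null codeset other than "UTF-8"/"UTF8" is rejected. */
	if (codeset && strcasecmp(codeset, "UTF-8") && strcasecmp(codeset, "UTF8")) {
		errno = EINVAL;
		return NULL;
	}

	/* Null-pointer (query) or accepted argument: report the fixed binding. */
	return (char *)"UTF-8";
}
```

With this, a caller that passes "UTF-8" or a null pointer always gets a non-null result, so a program that treats a null return as failure (as VLC apparently does) no longer trips over it.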

On 12/29/16 21:14, Rich Felker wrote:
> On Fri, Dec 16, 2016 at 10:59:54PM -0500, Rich Felker wrote:
>> On Sat, Dec 03, 2016 at 09:04:42PM -0600, Laine Gholson wrote:
>>> returning null broke a vlc media player built with gettext support
>>
>>> From 2f79aa294db5d9230ad71298e3de4b5561b441be Mon Sep 17 00:00:00 2001
>>> From: Laine Gholson <laine.gholson@...il.com>
>>> Date: Wed, 9 Nov 2016 20:19:00 -0600
>>> Subject: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
>>>
>>> VLC isn't happy when bind_textdomain_codeset returns NULL
>>> ---
>>>  src/locale/bind_textdomain_codeset.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
>>> index 5ebfd5e..e5f3f52 100644
>>> --- a/src/locale/bind_textdomain_codeset.c
>>> +++ b/src/locale/bind_textdomain_codeset.c
>>> @@ -5,7 +5,9 @@
>>>
>>>  char *bind_textdomain_codeset(const char *domainname, const char *codeset)
>>>  {
>>> -	if (codeset && strcasecmp(codeset, "UTF-8"))
>>> +	if (codeset && ((strcasecmp(codeset, "UTF-8") == 0) || (strcasecmp(codeset, "UTF8") == 0))) {
>>> +		return "UTF-8";
>>> +	} else if (codeset)
>>>  		errno = EINVAL;
>>>  	return NULL;
>>>  }
>>> --
>>> 2.10.2
>>
>> I think this needs some more thought. The documentation of the API is
>> that a null pointer argument/result means "the locale's character
>> encoding", and that the default is null; presumably even when the
>> locale's codeset is "foo", null (default) and "foo" are still
>> different states.
>>
>> I don't actually like that, and don't think we should copy it --
>> especially since, now that we also have a C locale with "ASCII" as the
>> codeset, we _can't_ provide a codeset matching the locale in all cases
>> -- but I also don't think it's right for the return value (null or
>> "UTF-8") to depend on the argument rather than on the "previous state"
>> like it's documented to.
>>
>> There seem to be two possible reasonable behaviors:
>>
>> 1. Diverge from the GNU behavior and treat textdomains as always-bound
>>    to "UTF-8", regardless of whether bind_textdomain_codeset has been
>>    called. The function would then return a null pointer with EINVAL
>>    set for strings other than "UTF-8"/"UTF8", and would return "UTF-8"
>>    for a valid or null-pointer argument.
>>
>> 2. Keep a 1-bit state for each textdomain reflecting whether it's
>>    nominally in "default" mode or "UTF-8" mode. Either way the
>>    original UTF-8 string would be returned; the only point of the
>>    state would be providing a return value for bind_textdomain_codeset
>>    that reflects how it was previously called.
>>
>> Being that 2 is gratuitous complexity to do something stupid and
>> meaningless, I'd lean towards 1, but I don't want to break anything
>> that works. Does this seem safe to do?
>
> Ping. Anyone else have thoughts on this?
>
> Rich
>
