Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <f4dece5128fa9b17ce6c75596ea6ab5b@ispras.ru>
Date: Tue, 30 May 2023 15:13:44 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Issues with conversion state handling in mbsnrtowcs

Hi,

I found the following issues with conversion state handling in 
mbsnrtowcs (also see the attached patches):

1) mbsnrtowcs may modify the internal state of mbrtowc, which is then 
observable by subsequent mbrtowc calls.

2) mbsnrtowcs resets the conversion state even if it stopped due to a 
partial (valid) sequence before converting a single character, making it 
impossible to call it again without saving/restoring mbstate_t manually.

One possible alternative to the attached patch is to change mbsnrtowcs 
to consume the partial sequence instead of rolling back. POSIX says the 
following[1]:

> If the input buffer ends with an incomplete character, it is 
> unspecified whether conversion stops at the end of the previous 
> character (if any), or at the end of the input buffer. In the latter 
> case, a subsequent call to mbsnrtowcs() with an input buffer that 
> starts with the remainder of the incomplete character shall correctly 
> complete the conversion of that character.

And in FUTURE DIRECTIONS:

> A future version may require that when the input buffer ends with an 
> incomplete character, conversion stops at the end of the input buffer.

musl currently does the former, but the latter seems more convenient for 
the caller. For example, it allows easy chunk-by-chunk conversion 
without the need for the caller to handle a partial sequence at the end 
of a chunk manually.

3) mbsnrtowcs updates the conversion state even when called with NULL 
destination buffer.

Here I find it hard to understand what the actual POSIX requirements 
are. The only clue that I found is for mbsrtowcs:

> If conversion stopped due to reaching a terminating null character, and 
> if dst is not a null pointer, the resulting state described shall be 
> the initial conversion state.

This could be understood as implying that the conversion state shouldn't 
be updated if dst is NULL, and this is what musl does for mbsrtowcs. I 
couldn't find additional requirements for mbsnrtowcs.

If that sentence was written with the intent to support the pattern 
"call with dst == NULL; allocate dst; call again", then probably it 
makes sense for both functions to behave in the same way (i.e. not to 
update the state).

But note that this pattern conflicts with "call with dst == NULL on the 
first input chunk, then call with dst == NULL on the second chunk" 
(because the conversion state on the second call could be wrong), which 
could be interpreted as conflicting with POSIX requirement for 
mbsnrtowcs implementations that consume partial sequences at the input 
end (also quoted above):

> In the latter case, a subsequent call to mbsnrtowcs with an input 
> buffer that starts with the remainder of the incomplete character shall 
> correctly complete the conversion of that character.

My patch assumes that this sentence doesn't apply if dst is NULL. (This 
only matters if musl decides to change mbsnrtowcs to consume partial 
sequences.)

Thanks,
Alexey

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsnrtowcs.html
View attachment "0001-mbsnrtowcs-fix-observable-reuse-of-mbrtowc-s-interna.patch" of type "text/x-diff" (1345 bytes)

View attachment "0002-mbsnrtowcs-fix-wrong-state-rollback-if-no-characters.patch" of type "text/x-diff" (1552 bytes)

View attachment "0003-mbsnrtowcs-don-t-modify-conversion-state-if-dest-buf.patch" of type "text/x-diff" (1502 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.