|
Message-ID: <f4dece5128fa9b17ce6c75596ea6ab5b@ispras.ru> Date: Tue, 30 May 2023 15:13:44 +0300 From: Alexey Izbyshev <izbyshev@...ras.ru> To: musl@...ts.openwall.com Subject: Issues with conversion state handling in mbsnrtowcs Hi, I found the following issues with conversion state handling in mbsnrtowcs (also see the attached patches): 1) mbsnrtowcs may modify the internal state of mbrtowc, which is then observable by subsequent mbrtowc calls. 2) mbsnrtowcs resets the conversion state even if it stopped due to a partial (valid) sequence before converting a single character, making it impossible to call it again without saving/restoring mbstate_t manually. One possible alternative to the attached patch is to change mbsnrtowcs to consume the partial sequence instead of rolling back. POSIX says the following[1]: > If the input buffer ends with an incomplete character, it is > unspecified whether conversion stops at the end of the previous > character (if any), or at the end of the input buffer. In the latter > case, a subsequent call to mbsnrtowcs() with an input buffer that > starts with the remainder of the incomplete character shall correctly > complete the conversion of that character. And in FUTURE DIRECTIONS: > A future version may require that when the input buffer ends with an > incomplete character, conversion stops at the end of the input buffer. musl currently does the former, but the latter seems more convenient for the caller. For example, it allows easy chunk-by-chunk conversion without the need for the caller to handle a partial sequence at the end of a chunk manually. 3) mbsnrtowcs updates the conversion state even when called with NULL destination buffer. Here I find it hard to understand what the actual POSIX requirements are. The only clue that I found is for mbsrtowcs: > If conversion stopped due to reaching a terminating null character, and > if dst is not a null pointer, the resulting state described shall be > the initial conversion state. This could be understood as implying that the conversion state shouldn't be updated if dst is NULL, and this is what musl does for mbsrtowcs. I couldn't find additional requirements for mbsnrtowcs. If that sentence was written with the intent to support the pattern "call with dst == NULL; allocate dst; call again", then probably it makes sense for both functions to behave in the same way (i.e. not to update the state). But note that this pattern conflicts with "call with dst == NULL on the first input chunk, then call with dst == NULL on the second chunk" (because the conversion state on the second call could be wrong), which could be interpreted as conflicting with POSIX requirement for mbsnrtowcs implementations that consume partial sequences at the input end (also quoted above): > In the latter case, a subsequent call to mbsnrtowcs with an input > buffer that starts with the remainder of the incomplete character shall > correctly complete the conversion of that character. My patch assumes that this sentence doesn't apply if dst is NULL. (This only matters if musl decides to change mbsnrtowcs to consume partial sequences.) Thanks, Alexey [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsnrtowcs.html View attachment "0001-mbsnrtowcs-fix-observable-reuse-of-mbrtowc-s-interna.patch" of type "text/x-diff" (1345 bytes) View attachment "0002-mbsnrtowcs-fix-wrong-state-rollback-if-no-characters.patch" of type "text/x-diff" (1552 bytes) View attachment "0003-mbsnrtowcs-don-t-modify-conversion-state-if-dest-buf.patch" of type "text/x-diff" (1502 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.