Message-ID: <20250405025003.GP1827@brightrain.aerifal.cx>
Date: Fri, 4 Apr 2025 22:50:03 -0400
From: Rich Felker <dalias@...c.org>
To: Kang-Che Sung <explorer09@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: wcrtomb in UTF-8 locale should check the multibyte state

On Sat, Apr 05, 2025 at 07:32:37AM +0800, Kang-Che Sung wrote:
> Hi.
> 
> On Sat, Apr 5, 2025 at 5:39 AM Thorsten Glaser <tg@...lvis.org> wrote:
> >
> > On Sat, 5 Apr 2025, Kang-Che Sung wrote:
> >
> > >Note: It is _allowed_ in the C standard to reuse an mbstate_t object
> > >across different multibyte conversion functions. It is _not an
> >
> > 7.31.6 begs to differ:
> >
> > | If an mbstate_t object has been altered by any of the functions
> > | described in this subclause, and is then used with a different
> > | multibyte character sequence, or in the other conversion direction, or
> > | with a different LC_CTYPE category setting than on earlier function
> > | calls, the behavior is undefined.414)
> >
> 
> I'm aware of that part of the standard paragraph.
> I may have read it wrongly regarding the meaning of the "conversion
> direction", but I still believe that ignoring the mbstate_t object is
> a bad idea.
> 
> I need to make a correction on one thing though:
> In macOS, the wcrtomb call in the example code in my last email
> actually sets errno=EINVAL, not EILSEQ.
> I guess some BSD implementations also follow this (I'm not sure).
> POSIX says "EINVAL: ps points to an object that contains an invalid
> conversion state."

"...the behavior is undefined" means (among other things) there is no
obligation to follow any particular error convention.

Rich
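
For illustration only (this is not code from the thread): a minimal sketch
of how a caller can stay clear of the undefined behavior quoted above, by
keeping one zero-initialized mbstate_t per conversion direction rather than
reusing a single object for both mbrtowc and wcrtomb. The helper name
roundtrip and its signature are hypothetical.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode one multibyte character from in[0..n) and
   re-encode it into out, which must hold at least MB_LEN_MAX bytes.
   Each direction gets its own mbstate_t in the initial (all-zero) state,
   so no state object is ever used "in the other conversion direction".
   Returns 0 on success, -1 on an invalid or incomplete sequence. */
static int roundtrip(const char *in, size_t n, char *out, size_t *outlen)
{
    mbstate_t dec, enc;
    wchar_t wc;

    memset(&dec, 0, sizeof dec);   /* state for multibyte -> wide */
    memset(&enc, 0, sizeof enc);   /* state for wide -> multibyte */

    size_t r = mbrtowc(&wc, in, n, &dec);
    if (r == (size_t)-1 || r == (size_t)-2)
        return -1;                 /* invalid or incomplete input */

    size_t w = wcrtomb(out, wc, &enc);
    if (w == (size_t)-1)
        return -1;                 /* wc not representable */

    *outlen = w;
    return 0;
}
```

With this pattern the caller never depends on whether a given libc reports
EINVAL, EILSEQ, or nothing at all when handed a state left over from the
other direction.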
