Message-ID: <20140629021423.GV179@brightrain.aerifal.cx>
Date: Sat, 28 Jun 2014 22:14:23 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Locale framework RFC

On Fri, Jun 27, 2014 at 03:04:12PM -0400, Rich Felker wrote:
> Components affected:
> [...]
>
> 3. Stdio wide mode: It's required to bind to the character encoding in
> effect at the time the FILE goes into wide mode, rather than at the
> time of the IO operation. So rather than using mbrtowc or wcrtomb, it
> needs to store the state at the time of entering wide mode and use a
> conversion that's conditional on this saved flag rather than on the
> locale.
>
> 4. Code which uses mbtowc and/or wctomb assuming they always process
> UTF-8: Aside from the above-mentioned use in stdio, this is probably
> just iconv. To fix this, I propose adding new functions which don't
> check the locale but always process UTF-8. These could also be used
> for stdio wide mode, and they could use a different API than the
> standard functions in order to be more efficient (e.g. returning the
> decoded character, or negative for errors, rather than storing the
> result via a pointer argument).

These two items are turning out to be something of a pain: in
particular, the need for non-locale-sensitive UTF-8 encoding and
decoding functions. They can be solved by duplicating mbrtowc.c as an
otherwise identical file that omits the locale check that's being
added (and likewise for wcrtomb.c), but that's rather ugly.

Another solution would be to somehow process the first byte in the
caller so that the mbstate_t would be non-initial by the time mbrtowc
is called. That would force mbrtowc to handle the sequence as UTF-8.
But it also spreads out the logic into places I'd rather it not be.

Eventually, when I do the iconv overhaul, I'd probably like to inline
UTF-8 processing anyway and make it a good deal faster, operating on a
larger intermediate buffer when possible rather than working
character-by-character. However, I don't want the current locale work
to be dependent on future iconv work for correct behavior, so a decent
short-term solution is needed too. And of course the stdio wide
functions need a solution.

Rich
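
For illustration only, here is a rough sketch of the kind of
locale-independent decoder that item 4 describes: it never consults the
locale and returns the decoded character directly, or a negative value
on error, instead of storing the result through a wchar_t pointer. The
name utf8_decode, the extra length out-parameter, and the specific
error codes (-1, -2) are assumptions made up for this example; they are
not taken from musl or from the proposal.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a musl API): decode one UTF-8 sequence from
 * s, reading at most n bytes, without looking at the current locale.
 *
 * Returns the code point (>= 0) and stores the number of bytes
 * consumed in *len. Returns -1 for an invalid sequence and -2 if the
 * input ends before the sequence does. The -2 case does not check
 * whether the partial prefix could still become a valid sequence.
 */
long utf8_decode(const unsigned char *s, size_t n, size_t *len)
{
    unsigned long cp, min;
    size_t need, i;
    unsigned c;

    if (n == 0) return -2;
    c = s[0];

    /* ASCII: single byte, no further checks needed. */
    if (c < 0x80) { *len = 1; return (long)c; }

    /* The lead byte fixes the sequence length and the smallest code
     * point that may legally be encoded at that length. */
    if (c >= 0xc2 && c <= 0xdf)      { need = 2; cp = c & 0x1f; min = 0x80; }
    else if ((c & 0xf0) == 0xe0)     { need = 3; cp = c & 0x0f; min = 0x800; }
    else if (c >= 0xf0 && c <= 0xf4) { need = 4; cp = c & 0x07; min = 0x10000; }
    else return -1; /* stray continuation byte or invalid lead byte */

    for (i = 1; i < need; i++) {
        if (i >= n) return -2;                 /* truncated input */
        if ((s[i] & 0xc0) != 0x80) return -1;  /* not a continuation byte */
        cp = cp << 6 | (s[i] & 0x3f);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return -1;

    *len = need;
    return (long)cp;
}
```

A caller such as iconv or the stdio wide functions could then loop with
something like `c = utf8_decode(p, end - p, &l)` and branch on `c < 0`,
with no mbstate_t involved. Unlike mbrtowc, this sketch keeps no state
between calls, so a caller that can see a sequence split across buffers
would have to hold the partial bytes itself.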