Message-ID: <87zhfg185y.fsf@mid.deneb.enyo.de>
Date: Wed, 25 Dec 2019 21:07:05 +0100
From: Florian Weimer <fw@...eb.enyo.de>
To: JeanHeyd Meneide <phdofthehouse@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [ Guidance ] Potential New Routines; Requesting Help

* JeanHeyd Meneide:

> I hope this e-mail finds you doing well this Holiday Season! I am
> interested in developing a few fast routines for text encoding for
> musl after the positive reception of a paper for the C Standard
> related to fast conversion routines:
>
> https://thephd.github.io/vendor/future_cxx/papers/source/C%20-%20Efficient%20Character%20Conversions.html

I'm somewhat concerned that the C multibyte functions are too broken
to be useful.  There is at least one widely implemented character set
(Big5 as specified for HTML5) which does not fit the model implied by
the standard.  Big5 does not have shift states, but a C implementation
using UTF-32 for wchar_t has to pretend that it has them, because
correct conversion from Unicode to Big5 needs lookahead and cannot be
performed one character at a time.  This would at least affect the
proposed c8rtomb function.

I posted a brief review of the problematic charsets in glibc here:
<https://sourceware.org/ml/libc-alpha/2019-05/msg00079.html>
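To make the lookahead problem concrete, here is a minimal C sketch. It is not glibc's or musl's code. It assumes a tiny, hypothetical subset of the HKSCS extension to Big5, in which the byte pair 0x88 0x62 stands for U+00CA followed by U+0304, while 0x88 0x66 is U+00CA on its own. An encoder driven one code point at a time, in the style of c32rtomb, cannot write anything when it sees U+00CA, because the correct bytes depend on the next code point. It has to buffer the letter in its state object, and that buffer is exactly the pretend shift state. The names toy_big5_state and toy_big5_c32rtomb are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder state: holds at most one base letter whose
   encoding depends on the code point that follows it. */
struct toy_big5_state {
    uint32_t pending; /* 0 means nothing is buffered */
};

/* Assumed subset of single-character mappings; not a complete table. */
static int toy_lookup_single(uint32_t c, unsigned char *out)
{
    if (c < 0x80) { out[0] = (unsigned char)c; return 1; }          /* ASCII */
    if (c == 0x00CA) { out[0] = 0x88; out[1] = 0x66; return 2; }    /* E-circumflex */
    if (c == 0x00EA) { out[0] = 0x88; out[1] = 0xA7; return 2; }    /* e-circumflex */
    return -1; /* not in this toy table */
}

/* Assumed subset of HKSCS pairs: base letter + combining mark -> 2 bytes. */
static int toy_lookup_pair(uint32_t base, uint32_t mark, unsigned char *out)
{
    unsigned char trail;
    if (base == 0x00CA && mark == 0x0304) trail = 0x62;
    else if (base == 0x00CA && mark == 0x030C) trail = 0x64;
    else if (base == 0x00EA && mark == 0x0304) trail = 0xA3;
    else if (base == 0x00EA && mark == 0x030C) trail = 0xA5;
    else return -1;
    out[0] = 0x88;
    out[1] = trail;
    return 2;
}

/* One-code-point-at-a-time interface modelled loosely on c32rtomb.
   Returns the number of bytes written to out (at most 4), or -1 if c
   is not in the toy table; on error the state is reset.
   Passing c == 0 flushes any buffered letter and writes a NUL byte. */
int toy_big5_c32rtomb(unsigned char *out, uint32_t c,
                      struct toy_big5_state *st)
{
    int n = 0, k;
    assert(out != NULL && st != NULL);

    if (st->pending != 0) {
        /* The buffered letter may combine with this code point. */
        k = toy_lookup_pair(st->pending, c, out);
        if (k > 0) {
            st->pending = 0;
            return k;
        }
        /* No combination: emit the buffered letter on its own first. */
        n = toy_lookup_single(st->pending, out);
        st->pending = 0;
    }
    if (c == 0x00CA || c == 0x00EA) {
        /* Cannot decide yet: a combining mark might follow. */
        st->pending = c;
        return n;
    }
    k = toy_lookup_single(c, out + n);
    if (k < 0)
        return -1;
    return n + k;
}
```

Fed U+00CA and then U+0304, the first call writes 0 bytes and the second writes 0x88 0x62. If the string ends right after U+00CA, the buffered letter only comes out when the caller passes a null code point, much as c32rtomb writes any needed shift sequence before the terminating null byte.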