Message-ID: <87zhfg185y.fsf@mid.deneb.enyo.de>
Date: Wed, 25 Dec 2019 21:07:05 +0100
From: Florian Weimer <fw@...eb.enyo.de>
To: JeanHeyd Meneide <phdofthehouse@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [ Guidance ] Potential New Routines; Requesting Help

* JeanHeyd Meneide:

>      I hope this e-mail finds you doing well this Holiday Season! I am
> interested in developing a few fast routines for text encoding for
> musl after the positive reception of a paper for the C Standard
> related to fast conversion routines:
>
>      https://thephd.github.io/vendor/future_cxx/papers/source/C%20-%20Efficient%20Character%20Conversions.html

I'm somewhat concerned that the C multibyte functions are too broken
to be useful.  There is at least one widely implemented character
set (Big5 as specified for HTML5) which does not fit the model implied
by the standard.  Big5 does not have shift states, but a C
implementation using UTF-32 for wchar_t has to pretend it has them, because
correct conversion from Unicode to Big5 needs lookahead and cannot be
performed one character at a time.
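To make the lookahead problem concrete, here is a minimal sketch (not glibc or musl code). It assumes a converter that wants to produce the four HKSCS byte pairs that the WHATWG Encoding Standard's Big5 decoder maps to two code points each: 0x88 0x62 and 0x88 0x64 (U+00CA followed by U+0304 or U+030C), and 0x88 0xA3 and 0x88 0xA5 (U+00EA followed by U+0304 or U+030C). When such an encoder sees U+00CA or U+00EA, it cannot write anything yet. It has to stash the code point in its conversion state and decide on the next call. The names `big5_state`, `big5_put`, `encode_single` and `pairs` are hypothetical. `encode_single` is a stub that handles only ASCII, standing in for a real Big5 table lookup. A null code point flushes any held character, mirroring how `wcrtomb`/`c32rtomb` treat a null character as "emit whatever shift sequence restores the initial state". That is the only standard hook a UTF-32 implementation has for releasing the buffered character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: the "shift state" a Big5 encoder has to
   fake.  It holds at most one code point that may still combine with the
   next one; 0 means nothing is buffered. */
typedef struct {
    uint32_t pending;
} big5_state;

/* HKSCS byte pairs that decode to a base letter plus a combining mark. */
static const struct {
    uint32_t base, mark;
    unsigned char b1, b2;
} pairs[] = {
    {0x00CA, 0x0304, 0x88, 0x62},
    {0x00CA, 0x030C, 0x88, 0x64},
    {0x00EA, 0x0304, 0x88, 0xA3},
    {0x00EA, 0x030C, 0x88, 0xA5},
};

/* Stub for a real single-character table lookup (assumption: only ASCII,
   including NUL, is handled here).  Returns bytes written, 0 if unmappable. */
static size_t encode_single(uint32_t cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    return 0;
}

/* Feed one code point; out must have room for 4 bytes.  Returns the number
   of bytes written (possibly 0 when the code point is held back), or
   (size_t)-1 if something cannot be encoded.  cp == 0 flushes the state. */
static size_t big5_put(big5_state *st, uint32_t cp, unsigned char *out) {
    assert(st != NULL && out != NULL);
    size_t n = 0;

    if (st->pending != 0) {
        uint32_t base = st->pending;
        st->pending = 0;
        /* Lookahead resolved: does base + cp form one Big5 character? */
        for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
            if (pairs[i].base == base && pairs[i].mark == cp) {
                out[0] = pairs[i].b1;
                out[1] = pairs[i].b2;
                return 2; /* both code points consumed */
            }
        }
        /* No combination: emit the held character on its own first. */
        n = encode_single(base, out);
        if (n == 0)
            return (size_t)-1;
    }

    if (cp == 0x00CA || cp == 0x00EA) {
        st->pending = cp; /* cannot decide yet: wait for the next call */
        return n;
    }

    size_t m = encode_single(cp, out + n);
    if (m == 0)
        return (size_t)-1;
    return n + m;
}
```

Feeding `'A'`, U+00CA, U+0304 in turn writes 0x41, then nothing, then 0x88 0x62. The second call has to report success while producing no output, which is exactly the behaviour the C interfaces reserve for stateful encodings.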

This would at least affect the proposed c8rtomb function.

I posted a brief review of the problematic charsets in glibc here:

  <https://sourceware.org/ml/libc-alpha/2019-05/msg00079.html>
