Message-ID: <20191025141514.GU16318@brightrain.aerifal.cx>
Date: Fri, 25 Oct 2019 10:15:14 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] Update ctype data to Unicode 12.1.0

On Wed, Oct 23, 2019 at 07:21:35PM +0300, Eleftherios Kritikos wrote:
> Hi all,
> 
> I wanted to mention that I have used the code for `wcwidth`[1] and for
> generating Unicode data tables[2] from musl in the Haskell library
> vty[3] (a ncurses style library).
> 
> Relevant files in the MR:
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-ab3908e00d1c13397ed03e5c2213ad8bR5
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-a06fd5aeeca6d7dac0278c2537eb1950R1
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-86acb7ffecd1a09c5f55892bd0ce13b1R1
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-dc77683ad25ad6f509fb58a397c93f4aR1
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-9879d6db96fd29134fc802214163b95aR32
> 
> Thanks Rich Felker and everyone else for all the good work that has
> gone into musl!
> 
> Please let me know if you think attribution was not properly given.
> 
> 1.http://git.musl-libc.org/cgit/musl/tree/src/ctype/wcwidth.c?id=9b2921bea1d5017832e1b45d1fd64220047a9802
> 2.https://github.com/richfelker/musl-chartable-tools/tree/master/ctype
> 3. https://github.com/jtdaugherty/vty

Great! I love seeing code/concepts from musl getting adopted elsewhere,
especially in places where the classic solutions were all much larger.

Just a quick update on why I haven't merged this yet: I went to do the
case mappings too, and found that at least one range, I believe the
one that would be CASEMAP(0x1c90,0x1cba,0x10d0), is not representable
in the current code, which has to be updated by hand. (It could be done
on a char-by-char basis, but continuing to expand that part makes the
file grow larger and slower very quickly.)
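
To illustrate how a range like this can fail to fit a compact table
entry, here is a minimal sketch. It assumes an entry made of a 16-bit
start, a signed 8-bit offset to the lowercase range, and an 8-bit
length. That layout and the names caserange and range_fits are
assumptions for illustration only, not a quote of the musl source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compact range entry (assumed layout, not musl's code). */
struct caserange {
	uint16_t upper; /* first code point of the uppercase range */
	int8_t   delta; /* lowercase start minus uppercase start */
	uint8_t  len;   /* number of code points in the range */
};

/* Report whether the mapping [u1,u2] -> [l, l+(u2-u1)] fits in one
 * struct caserange without truncating any field. */
bool range_fits(uint32_t u1, uint32_t u2, uint32_t l)
{
	long delta = (long)l - (long)u1;
	long len = (long)u2 - (long)u1 + 1;

	return u1 <= UINT16_MAX
	    && delta >= INT8_MIN && delta <= INT8_MAX
	    && len >= 1 && len <= UINT8_MAX;
}
```

Under that assumed layout, range_fits(0x1c90, 0x1cba, 0x10d0) is false
because the offset is 0x10d0 - 0x1c90 = -3008, far outside a signed
8-bit field, while a range such as 0x410..0x42f -> 0x430 (offset 32)
fits.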

So, I'm pulling back up the proposed replacement code from April 2018
that never got finished and merged. The old thread is here:
https://www.openwall.com/lists/musl/2018/04/05/1

It's moderately larger -- ~4.8k instead of ~1.5k for Unicode 10 -- but
O(1) rather than O(n) (n = # of case mappings), about 10x faster, and
programmatically generated from UnicodeData.txt. I'll add the (awful,
ugly, just like everything else in musl-chartable-tools) code for
generating the table to musl-chartable-tools when I merge it so it's
not a black box.
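
The O(n) versus O(1) difference can be sketched as follows. This is a
toy under stated assumptions, not the replacement code itself: the
three ranges, the 256-entry block size, the 0x2000 limit, and every
name here (ranges, build_table, tolower_scan, tolower_table) are
hypothetical, and the table is filled at run time rather than
generated from UnicodeData.txt.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy data: uppercase range start, lowercase range start, length. */
struct range { uint32_t upper, lower, len; };
static const struct range ranges[] = {
	{ 0x0041, 0x0061, 26 }, /* Latin A..Z */
	{ 0x0410, 0x0430, 32 }, /* Cyrillic U+0410..U+042F */
	{ 0x1c90, 0x10d0, 43 }, /* Georgian U+1C90..U+1CBA */
};
#define NRANGES (sizeof ranges / sizeof *ranges)

/* O(n): scan every range until one contains c. */
uint32_t tolower_scan(uint32_t c)
{
	for (size_t i = 0; i < NRANGES; i++)
		if (c - ranges[i].upper < ranges[i].len) /* unsigned wrap rejects c < upper */
			return ranges[i].lower + (c - ranges[i].upper);
	return c;
}

#define BLOCK_BITS 8
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define LIMIT      0x2000u               /* toy table covers c < LIMIT */
#define NBLOCKS    (LIMIT >> BLOCK_BITS)

/* Two-level table: block_index picks a block of per-character deltas.
 * Index 0 is a shared all-zero block for blocks with no mappings. */
static uint8_t block_index[NBLOCKS];
static int16_t deltas[NRANGES + 1][BLOCK_SIZE];
static unsigned nused = 1;

/* Fill the table from the ranges; assumes each range stays in one block. */
void build_table(void)
{
	for (size_t i = 0; i < NRANGES; i++) {
		for (uint32_t k = 0; k < ranges[i].len; k++) {
			uint32_t c = ranges[i].upper + k;
			unsigned b = c >> BLOCK_BITS;
			if (!block_index[b])
				block_index[b] = (uint8_t)nused++;
			deltas[block_index[b]][c & (BLOCK_SIZE - 1)] =
				(int16_t)((long)ranges[i].lower - (long)ranges[i].upper);
		}
	}
}

/* O(1): two array loads and an add, independent of the number of ranges. */
uint32_t tolower_table(uint32_t c)
{
	if (c >= LIMIT)
		return c;
	int16_t d = deltas[block_index[c >> BLOCK_BITS]][c & (BLOCK_SIZE - 1)];
	return (uint32_t)((int32_t)c + d);
}
```

After one call to build_table(), tolower_table() and tolower_scan()
agree on every input; the table version trades extra static data for a
lookup cost that does not grow with the number of mappings.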

I have it working now, so as long as I don't hit any unexpected
problems testing I'll get this (and your patch, and updating case
mappings to Unicode 12) merged soon.

Thanks again for sending the patch and pinging this.

Rich
