Message-ID: <20170913181334.GT1627@brightrain.aerifal.cx>
Date: Wed, 13 Sep 2017 14:13:34 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: [PATCH] towupper/towlower: Update to Unicode 9.0

On Wed, Sep 13, 2017 at 12:05:19PM +0200, Reini Urban wrote:
> Wait a bit with that. I think I found some more Unicode 9.0 issues with the tables,
> and I’ve found a huge performance opportunity by sorting the 3 tables (mostly pairs), 
> and break the loops earlier.
> This should come close to glibc table performance then, without the huge memory costs they have.
> 
> I’ll write a perl regression testing script not to miss any more mappings, and maybe
> improve the current musl logic. This will need 1-2 days.
> I’ll also use it for cperl then.
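
For illustration, the "sort the tables and break the loops earlier" idea
quoted above might look roughly like the sketch below. The struct layout,
the table contents, and the name sketch_tolower are all hypothetical and
heavily abbreviated; this is not musl's actual towlower code or data.

```c
#include <stddef.h>

/* Hypothetical, heavily abbreviated table: NOT musl's actual data. */
struct case_range {
    unsigned first; /* first uppercase code point in the range */
    unsigned len;   /* number of consecutive code points covered */
    int delta;      /* value added to obtain the lowercase code point */
};

/* Must stay sorted by .first for the early exit below to be valid. */
static const struct case_range sketch_ranges[] = {
    { 0x0041, 26, 32 }, /* Latin A..Z */
    { 0x0391, 17, 32 }, /* Greek Alpha..Rho */
    { 0x0410, 32, 32 }, /* Cyrillic A..Ya */
};

static unsigned sketch_tolower(unsigned c)
{
    size_t n = sizeof sketch_ranges / sizeof sketch_ranges[0];
    for (size_t i = 0; i < n; i++) {
        /* Sorted table: once an entry starts above c, no later one can match. */
        if (c < sketch_ranges[i].first)
            break;
        if (c - sketch_ranges[i].first < sketch_ranges[i].len)
            return c + sketch_ranges[i].delta;
    }
    return c; /* no mapping in this abbreviated table */
}
```

With a sorted table the same ordering would also permit a binary search
instead of a linear scan; the sketch only shows the early exit.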

Thanks for the update. I still need to publish the table generation
code for all the other tables -- I got it mostly dug up and cleaned up
but got interrupted last time so it's still not posted. With that it
will be possible to update other things too, not just case mappings.

A few of the existing tables use an older version of the tabulation
code that formats the big arrays differently, so I'll probably first
make a commit that only reformats them, so that it's possible to
mechanically check that the reformatting commit does not change the
generated .o files, and then use the uniform formatting as the basis
for the subsequent update to Unicode 9.0. That should not affect the
case mapping file, though, since it's not machine-generated.
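
A minimal sketch of such a mechanical check, assuming the object files
built before and after the reformatting commit have been saved under two
separate paths. The helper name files_identical is hypothetical, and in
practice cmp(1) does the same job; the sketch assumes regular files.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
 * and -1 if either file cannot be opened or read. */
static int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (!a || !b) {
        result = -1;
    } else {
        unsigned char buf_a[4096], buf_b[4096];
        for (;;) {
            size_t na = fread(buf_a, 1, sizeof buf_a, a);
            size_t nb = fread(buf_b, 1, sizeof buf_b, b);
            if (ferror(a) || ferror(b)) { result = -1; break; }
            /* A length mismatch or any differing byte means the files differ. */
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) { result = 0; break; }
            if (na == 0) break; /* both files hit EOF together */
        }
    }
    if (a) fclose(a);
    if (b) fclose(b);
    return result;
}
```

Running it over each table's .o from the two builds and getting 1 every
time would confirm that the reformatting was purely cosmetic.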

Rich
