Message-ID: <CAO6moYuYqnXcNAhLgOtRcexe7VJYSi514smWEQosOMBv=7xjnA@mail.gmail.com>
Date: Mon, 2 Mar 2026 19:57:00 +1100
From: Xan Phung <xan.phung@...il.com>
To: Rich Felker <dalias@...c.org>, Openwall musl <musl@...ts.openwall.com>
Subject: Re: [PATCH v2] wctype: reduce size of iswalpha & iswpunct by 47%

Hi Rich,

Thanks for the update!

I have also now extended my header file generation tool to create the data
for wcwidth as well.

(My existing 'popcount' data format was flexible enough to easily handle
the 2-bit data values needed by wcwidth, so the separate 1-bit data sets
of 4kb in wide.h & nonspacing.h, plus the hard-coded '-1' control chars
outside code page 0, are replaced by a unified 1.7kb data set of encoded
2-bit data values).
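
For readers unfamiliar with the general idea, a popcount-indexed table with
2-bit values can be sketched roughly as below. This is only an illustration
and not the actual gen_wcdata format: the struct layout, the names
popcount32, block32 and lookup2, and the demo data are all hypothetical.
A bitmap marks which entries differ from a default, and the popcount of the
bits below an index gives the position of its value in a packed array. The
popcount uses portable arithmetic rather than a native instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable popcount: shifts, masks and one multiply, no native instruction. */
static unsigned popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (uint32_t)(x * 0x01010101u) >> 24;
}

/* Hypothetical layout: one block covers 32 consecutive entries. */
struct block32 {
    uint32_t present;     /* bit i set: entry i has a non-default value */
    const uint8_t *vals;  /* 2-bit values for the set bits, four per byte */
};

/* Return the 2-bit value stored for entry i, or dflt if none is stored. */
static unsigned lookup2(const struct block32 *b, unsigned i, unsigned dflt)
{
    assert(i < 32);
    if (!((b->present >> i) & 1))
        return dflt;
    /* Rank = number of stored entries that come before entry i. */
    unsigned rank = popcount32(b->present & ((UINT32_C(1) << i) - 1));
    return (b->vals[rank / 4] >> (2 * (rank % 4))) & 3;
}

/* Demo data (made up): entries 1, 4, 5, 9, 31 hold values 2, 0, 3, 2, 3. */
static const uint8_t demo_vals[] = { 0xB2, 0x03 };
static const struct block32 demo_block = { 0x80000232u, demo_vals };
```

With this demo data, lookup2(&demo_block, 5, 1) yields the stored value 3,
while lookup2(&demo_block, 0, 1) falls back to the default 1. Only the five
stored entries cost any value storage, which is where the space saving of
such a format comes from.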

My tool ('gen_wcdata') thus replaces the function of 3 tools (I keep these
original tools only for regression testing):
- gen_ctype
- gen_nonspacing
- gen_wide

I have uploaded my 'gen_wcdata' tool to GitHub, as a fork of your repo at:

https://github.com/nglibc/musl-chartable-tools/tree/master/ctype

It also does verification by testing all 3 functions against 20,000
codepoints.  You can run the verification by using 'make test' within the
ctype subdirectory.
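
The general shape of such a check, comparing a new implementation against a
reference over a range of codepoints, might look like the sketch below. It
is not taken from gen_wcdata or test_wcdata: count_mismatches, width_old and
width_new are hypothetical names, and the two toy width functions stand in
for the real old and new implementations (width_new is deliberately wrong
at one codepoint so the harness has something to report).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*cp_fn)(uint32_t cp);

/* Count codepoints in [lo, hi] where a and b disagree.
 * If first is non-null, store the first disagreeing codepoint there. */
static size_t count_mismatches(cp_fn a, cp_fn b, uint32_t lo, uint32_t hi,
                               uint32_t *first)
{
    assert(lo <= hi);
    size_t n = 0;
    for (uint32_t cp = lo; ; cp++) {
        if (a(cp) != b(cp)) {
            if (n == 0 && first)
                *first = cp;
            n++;
        }
        if (cp == hi)   /* test here so hi == UINT32_MAX cannot wrap */
            break;
    }
    return n;
}

/* Toy reference: width 2 for U+1100..U+115F, width 1 elsewhere. */
static int width_old(uint32_t cp)
{
    return (cp >= 0x1100 && cp <= 0x115F) ? 2 : 1;
}

/* Toy replacement with an off-by-one bug at U+115F. */
static int width_new(uint32_t cp)
{
    return (cp >= 0x1100 && cp < 0x115F) ? 2 : 1;
}
```

Calling count_mismatches(width_old, width_new, 0, 0x10FFFF, &first) reports
one mismatch and sets first to 0x115F. For predicates such as iswalpha,
which only promise a nonzero result, the wrapped functions would need to
normalise their return values to 0 or 1 before being compared this way.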

I'll also prepare a wcwidth.c patch (for the git.musl-libc.org repo) soon.

P.S. More details about gen_wcdata and test_wcdata are in the ctype/README
file:

The command line syntax below is accepted. The 'a', 'p', 'w' letters
generate iswalpha_table.h, iswpunct_table.h and wcwidth_table.h, and
the upper case letters generate the _dict.h header files:


    gen_wcdata [-v] a|p|w|A|P|W


Also, if compiled together with the source of these functions, the tool
tests the wctype/wchar functions against this repo's Unicode data.
If gen_wcdata.c is compiled on its own, the tool tests against the
compiler's C library equivalent functions.


To test against the compiler C library functions, execute the following
(where 'a' tests iswalpha, 'p' tests iswpunct, 'v' tests wcwidth):


    gen_wcdata -t a|p|v


Although gen_wcdata is a full replacement for gen_ctype, gen_wide and
gen_nonspacing, these tools are retained for regression testing.


A test recipe has been added to Makefile to generate both new and old
header files, and compile test_wcdata with new & old source files.
To run these tests, execute:


    make test


Alternatively, the test_wcdata tool can be run directly using:


    test_wcdata -t a|p|v


or (to test older musl functions) use:


    test_wcdata_v0 -t a|p|v


Best regards
Xan Phung


On Thu, 26 Feb 2026 at 02:06, Rich Felker <dalias@...c.org> wrote:

> On Wed, Feb 18, 2026 at 12:28:33PM +1100, Xan Phung wrote:
> > Hi,
> >
> > I haven't heard back on where V2 of this patch stands - has it been
> > rejected or is it still under evaluation?
>
> It's certainly not rejected. This looks good. I just want to give it
> adequate attention to understand it before merging and be confident
> that it's keeping or improving performance on most/all archs not just
> a couple mainstream ones. When I first saw it I was worried the
> popcount thing would be a problem and only fast on archs with native
> popcounts, but I saw that you're not even using any native popcount
> just portable arithmetic so that seems like a non-issue.
>
> If anyone else has an opportunity to test performance on other archs
> and post results I'd love to see them.
>
> I don't expect to get to looking at this in depth until rolling a
> release, but I am actively trying to do that again now.
>
> Thanks for working on this and for pinging about it again!
>
> Rich
>
