Message-ID: <20180123015446.vera7ocpvgaqvkss@sinister.lan.codevat.com>
Date: Mon, 22 Jan 2018 17:54:49 -0800
From: Eric Pruitt <eric.pruitt@...il.com>
To: musl@...ts.openwall.com
Subject: Updating Unicode support

NOTE: When I first started writing this email, I didn't realize musl's Unicode property table had recently been updated, but I noticed <https://git.musl-libc.org/cgit/musl/commit/?id=c72c1c5> when I was looking up commit IDs to cite. I'm leaving most of the verbiage below unchanged since I think it adds useful context.

The Unicode property data used by musl has not been updated in quite some time, and due to changes introduced in recent publications of the Unicode standard, musl's width data is incorrect for many symbols -- notably emoji. This can lead to rendering glitches in terminals when some applications are not built with musl; for example, my terminal emulator is dynamically linked against a version of GNU libc that supports Unicode 9 (released June 21, 2016) whereas musl's table was last updated in 2011 or 2012 (commit 1b0ce9a).

To resolve this problem, I wrote a drop-in replacement for musl's wcwidth(3) implementation that uses utf8proc (https://github.com/JuliaLang/utf8proc) as the source of truth. You can find the code for this at <https://github.com/ericpruitt/static-unix-userland/blob/42cbdbb/utf8proc-wcwidth/utf8proc-wcwidth.c>. I am wondering if the musl developers would consider accepting a patch that implements optional / configurable support for utf8proc.

The utf8proc-wcwidth.c file I linked to includes some additional code unrelated to musl making it possible to use the file as an LD_PRELOAD library. The LD_PRELOAD stuff would **not** be included in the proposed patch.

I'm also investigating implementing the Unicode Collation Algorithm (https://unicode.org/reports/tr10/) for wcscoll(3); would that be of interest?

Thanks,
Eric
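
For context, here is a minimal sketch of the kind of lookup a wcwidth(3) replacement performs. It is not the code in utf8proc-wcwidth.c and does not use utf8proc; the function name sketch_wcwidth and the three-entry table are hypothetical, chosen only to show why stale width data misreports emoji. A real table would be generated from the Unicode data files or delegated to a library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny width table, sorted by code point.
 * Real data would cover every range in the Unicode width data. */
struct width_range {
    uint32_t first;
    uint32_t last;
    int width;
};

static const struct width_range sketch_ranges[] = {
    { 0x0300,  0x036F,  0 }, /* Combining Diacritical Marks */
    { 0x4E00,  0x9FFF,  2 }, /* CJK Unified Ideographs */
    { 0x1F600, 0x1F64F, 2 }, /* Emoticons: wide as of Unicode 9 */
};

/* Returns the column width of a code point: 0 for NUL and combining
 * marks, -1 for C0/C1 control characters, 2 for wide characters in
 * the table, and 1 for everything else (an assumption of this sketch). */
int sketch_wcwidth(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof sketch_ranges / sizeof sketch_ranges[0];

    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;

    /* Binary search over the sorted, non-overlapping ranges. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < sketch_ranges[mid].first)
            hi = mid;
        else if (cp > sketch_ranges[mid].last)
            lo = mid + 1;
        else
            return sketch_ranges[mid].width;
    }
    return 1;
}
```

With the emoticon range removed from the table, as in width data that predates Unicode 9, sketch_wcwidth(0x1F600) falls through to 1 while a terminal using newer data draws the glyph two columns wide, which is the mismatch described above.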