Message-ID: <CABjvSdgaHh619ezm2e0V17xK=-aMtL05xQeLKuqC7oyk0c3WgA@mail.gmail.com>
Date: Mon, 27 Dec 2021 23:38:06 +0100
From: Luis Javier Merino <ninjalj@...il.com>
To: musl@...ts.openwall.com
Subject: Hangul Jamo vowels and trailing consonants should probably be 0 width

Hello,

I've been looking at widths reported for Hangul Jamo in wcwidth
implementations.

In glibc and MirBSD xterm, U+1160..U+11FF and U+D7B0..U+D7FF have 0 width.

In xterm/ncurses, glib (g_unichar_iszerowidth), and Rust's unicode-width
U+1160..U+11FF have 0 width.

Konsole had U+1160..U+11FF with 0 width until October 2018, but moving
from a wcwidth() based on the Markus Kuhn one to one generated from
Unicode datafiles caused it to return width 1
(https://bugs.kde.org/show_bug.cgi?id=396435#c21).

libunistring, vim/NeoVim, ridiculousfish/widecharwidth seem to know
nothing about Hangul Jamo, and return width 1.

Some context follows:

Korean Hangul is a writing system which uses syllable blocks consisting
of alphabetic components. A syllable consists of one or more Leading
Consonants, one or more Vowels, and zero or more trailing consonants.

Unicode has precomposed syllable blocks at U+AC00..U+D7A3 (11172).

There are also component Jamos:

Hangul Jamo (U+1100..U+11FF).
    U+1100..U+115F Choseong (initial, Leading Consonants) have
        East_Asian_Width=Wide and Hangul_Syllable_Type=Leading_Jamo
    U+1160..U+11A7 Jungseong (medial, Vowels) have
        East_Asian_Width=Neutral and Hangul_Syllable_Type=Vowel_Jamo
    U+11A8..U+11FF Jongseong (final, Trailing consonants) have
        East_Asian_Width=Neutral and Hangul_Syllable_Type=Trailing_Jamo
U+A960..U+A97F Hangul Jamo Extended-A (choseong) have
    East_Asian_Width=Wide
U+D7B0..U+D7FF Hangul Jamo Extended-B (jungseong and jongseong) have
    East_Asian_Width=Neutral
U+3130..U+318F Hangul Compatibility Jamo have no conjoining behavior
U+FFA0..U+FFDF half-width forms have no conjoining behavior.

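The width assignment argued for in the subject line can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: the table and the names JAMO_WIDTHS, jamo_width and
cluster_width are hypothetical and not taken from musl, glibc or any
other implementation, and whole blocks are treated uniformly, including
their unassigned code points.

```python
# Illustrative table only (an assumption, not copied from any libc):
# (first, last, width) for the conjoining Hangul ranges listed above.
JAMO_WIDTHS = [
    (0x1100, 0x115F, 2),  # Choseong, East_Asian_Width=Wide
    (0x1160, 0x11FF, 0),  # Jungseong and Jongseong
    (0xA960, 0xA97F, 2),  # Hangul Jamo Extended-A (choseong)
    (0xAC00, 0xD7A3, 2),  # precomposed syllable blocks
    (0xD7B0, 0xD7FF, 0),  # Hangul Jamo Extended-B
]


def jamo_width(cp):
    """Return the sketched width for code point cp, or None if cp is
    outside the ranges in JAMO_WIDTHS."""
    for first, last, width in JAMO_WIDTHS:
        if first <= cp <= last:
            return width
    return None


def cluster_width(text):
    """Sum the sketched widths of every character in text.

    Raises ValueError for characters the table does not cover."""
    total = 0
    for ch in text:
        width = jamo_width(ord(ch))
        if width is None:
            raise ValueError("U+%04X is outside the table" % ord(ch))
        total += width
    return total


# Conjoining L, V, T sequence and the equivalent precomposed syllable.
decomposed = "\u1100\u1161\u11a8"
precomposed = "\uac01"
print(cluster_width(decomposed), cluster_width(precomposed))  # prints "2 2"
```

With width 1 for the vowel and the trailing consonant instead, the
decomposed sequence would add up to 4 columns while the precomposed
syllable stays at 2.
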
U+1100..U+11FF, U+A960..U+A97F, U+D7B0..U+D7FF have conjoining behavior:
a sequence of L+V+T* gets rendered as a syllable block.

wcwidth() implementations tend to give U+1100..U+115F width 2, and
U+1160..U+11FF width 0, so the resulting syllable block has the correct
total width. U+D7B0..U+D7FF should also have width 0.

glibc gave width 0 to conjoining jungseong and jongseong at:

commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser <tg@...bsd.de>
Date:   Fri Jul 14 14:02:50 2017 +0200

    Refresh generated charmap data and ChangeLog

    [BZ #21750]
    * charmaps/UTF-8: Refresh.

diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 04ef5ad071..9e05b4a652 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,17 @@
+2017-07-14  Thorsten Glaser  <tg@...bsd.de>
+
+       [BZ #21750]
+       * charmaps/UTF-8: Refresh.
+       * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
+       * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
+       * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
+       * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
+       * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
+       [BZ #19852]
+       * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
+       UnicodeData lines so the latter have precedence; remove hack
+       to group output by EastAsianWidth ranges.
+

[ ... snip ...]

commit 6e540caa21616d5ec5511fafb22819204525138e
Author: Mike FABIAN <mfabian@...hat.com>
Date:   Tue Jun 16 08:29:40 2020 +0200

    Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]

    Reviewed-by: Carlos O'Donell <carlos@...hat.com>

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8> 0
 <UABED> 0
 <UAC00>...<UD7A3> 2
+<UD7B0>...<UD7C6> 0
+<UD7CB>...<UD7FB> 0
 <UF900>...<UFA6D> 2
 <UFA70>...<UFAD9> 2
 <UFB1E> 0

Regards,
-- 
Luis Javier Merino Morán