Message-ID: <20130805173144.GL221@brightrain.aerifal.cx>
Date: Mon, 5 Aug 2013 13:31:45 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Re: iconv Korean and Traditional Chinese research so far

On Mon, Aug 05, 2013 at 11:43:45AM -0400, Rich Felker wrote:
> > In EUC-KR (MS-CP949), there is Hanja characters (i.e. Kanji
> > characters in Japanese) and Japanese Katakana/Hiragana besides of
> > Hangul characters.
>
> Yes, I'm aware of these. However, it looks to me like the only
> characters outside the standard 94x94 grid zone are Hangul syllables,
> and they appear in codepoint order. If so, even if there's not a good
> pattern to where they're located, merely knowing that the ones that
> are missing from the 94x94 grid are placed in order in the expanded
> space is sufficient to perform algorithmic (albeit inefficient)
> conversion. Does this sound correct?

I've verified that this is correct and committed an implementation of
Korean based on this principle, which I basically copied from my
current implementation of GB18030's support for arbitrary Unicode
codepoints. It has not been heavily tested but I did test it casually
with all the important boundary values and it seems correct. Tests
should probably be added to the test suite.

Rich
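
For illustration only, here is a minimal sketch of the principle described above: if the codepoints missing from the grid are laid out in codepoint order in the extension space, then a codepoint's position there is its offset in the range minus the number of in-grid codepoints that precede it. This is not the committed musl code. The table contents and the names TOY_BASE, TOY_COUNT, in_grid, ext_index, ext_codepoint and check_roundtrip are hypothetical toy values, not real CP949 data, and mapping an extension index to actual lead/trail bytes is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy range standing in for the Hangul syllable block. */
#define TOY_BASE  0x100u
#define TOY_COUNT 16u

/* Hypothetical: codepoints in the toy range that already have a slot
 * in the standard grid. Must be sorted in ascending order. */
static const unsigned in_grid[] = { 0x101, 0x104, 0x105, 0x10a };
#define NGRID (sizeof in_grid / sizeof in_grid[0])

/* Position of c in the extension space, or -1 if c is out of range
 * or already covered by the grid. Linear scan: simple but slow. */
static int ext_index(unsigned c)
{
	if (c < TOY_BASE || c >= TOY_BASE + TOY_COUNT) return -1;
	unsigned skipped = 0;
	for (size_t i = 0; i < NGRID && in_grid[i] <= c; i++) {
		if (in_grid[i] == c) return -1;
		skipped++;
	}
	return (int)(c - TOY_BASE - skipped);
}

/* Inverse mapping: codepoint stored at extension position idx, or 0
 * if idx is past the end. Each in-grid codepoint at or below the
 * running candidate pushes the candidate up by one. */
static unsigned ext_codepoint(unsigned idx)
{
	if (idx >= TOY_COUNT) return 0;
	unsigned c = TOY_BASE + idx;
	for (size_t i = 0; i < NGRID && in_grid[i] <= c; i++) c++;
	return c < TOY_BASE + TOY_COUNT ? c : 0;
}

/* Boundary-style check: every non-grid codepoint must round-trip. */
static void check_roundtrip(void)
{
	for (unsigned c = TOY_BASE; c < TOY_BASE + TOY_COUNT; c++) {
		int i = ext_index(c);
		if (i >= 0) assert(ext_codepoint((unsigned)i) == c);
	}
}
```

The linear scan matches the "albeit inefficient" caveat; a binary search over the sorted table would be the obvious speedup if it mattered.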