Message-ID: <op.w1fqobr8dyj81a@monster.itedn32a.localdomain>
Date: Wed, 07 Aug 2013 15:20:25 +0800
From: Roy <roytam@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: Re: Re: iconv Korean and Traditional Chinese research so far

On Wed, 07 Aug 2013 08:54:35 +0800, Roy <roytam@...il.com> wrote:

[snip]
>
> Big5-HKSCS 2004 map for reference:
> http://moztw.org/docs/big5/table/hkscs2004.txt
> Use sed and awk to create b2u.txt for comparing:
> $ sed -e '/^==/d' -e '1,2d' hkscs2004.txt | awk 'BEGIN{print "# big5 unicode"}{print "0x" $1 " 0x" $4}' > hkscs2004-b2u.txt
> In result:
> http://roy.dnsd.me/hkscs2004-b2u.txt
>
> And finally the diff:
> http://roy.dnsd.me/uao250-hkscs2004.diff
>
> The diff is huge so separated table is needed.

I forgot that the HKSCS table is missing the original CP950 entries, so I
merged the two tables first:

$ cat cp950-b2u.txt hkscs2004-b2u.txt | sed -e '1d' | sort > hkscs2004-big5-b2u.txt

I also wrote a small utility in PHP that compares two tables by key (the
first column):
http://roy.dnsd.me/tbldiff.phps

$ php tbldiff.php uao250-b2u.txt hkscs2004-big5-b2u.txt > uao250-vs-hkscs2004.txt
http://roy.dnsd.me/uao250-vs-hkscs2004.txt

$ sed -e '/==/d' uao250-vs-hkscs2004.txt > uao250-hkscs2004-diff.txt
http://roy.dnsd.me/uao250-hkscs2004-diff.txt

So 5965 mappings are different, including 1379 mappings that do not exist
in HKSCS2004.

But since HKSCS2001 and HKSCS2004 are both in mixed use, in local files as
well as on Internet pages, the situation for HKSCS is even worse.

BTW, there is another NLS hack that hacks MS-CP932 to support
JIS X 0213:2004:
http://www.eonet.ne.jp/~kotobukispace/ddt/jisx0213/jisx0213.html
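
For readers who want to reproduce the key-based comparison without the PHP utility, here is a rough sketch in Python. It is not a port of tbldiff.php, whose source is not shown in this message. The function names, the markers ("==", "!=", "<", ">"), and the output layout are all assumptions made for illustration. The only hint taken from the message is that equal rows seem to carry "==", since the later sed step deletes lines matching it.

```python
import sys


def parse_table(lines):
    """Parse 'key value' lines into a dict keyed on the first column.

    Blank lines and lines starting with '#' are skipped. Keys and values are
    lowercased so that 0xA140 and 0xa140 compare equal. If a key appears more
    than once (as can happen after concatenating two tables), the last
    occurrence wins. All of these choices are assumptions of this sketch.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        table[fields[0].lower()] = fields[1].lower()
    return table


def diff_tables(left, right):
    """Compare two tables by key and return (key, marker, left, right) rows.

    Hypothetical markers: '==' same value, '!=' different value,
    '<' key only in the left table, '>' key only in the right table.
    """
    rows = []
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key), right.get(key)
        if a is None:
            marker = ">"
        elif b is None:
            marker = "<"
        elif a == b:
            marker = "=="
        else:
            marker = "!="
        rows.append((key, marker, a, b))
    return rows


# Only run as a command-line tool when exactly two file paths are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        result = diff_tables(parse_table(f1), parse_table(f2))
    for key, marker, a, b in result:
        print(key, marker, a or "-", b or "-")
```

With this layout, dropping the "==" rows (as the sed step does) leaves the differing mappings, and counting the "<" rows gives the number of keys that exist only in the first table.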