Message-ID: <op.w2h4xydmdyj81a@monster.itedn32a.localdomain>
Date: Wed, 28 Aug 2013 08:57:24 +0800
From: Roy <roytam@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: Re: Big5 "mostly" complete

On Tue, 27 Aug 2013 09:53:49 +0800, Rich Felker <dalias@...ifal.cx> wrote:

> On Sun, Aug 18, 2013 at 07:19:57PM +0800, Roy wrote:
>> On Sun, 18 Aug 2013 15:32:29 +0800, Rich Felker <dalias@...ifal.cx>
>> wrote:
>>
>> >On Sun, Aug 18, 2013 at 12:20:47PM +0800, Roy wrote:
>> >>Both Big5-UAO and Big5-HKSCS are needed for people in Taiwan and
>> >>Hong Kong.
>> >>For Big5-UAO, there are some commonly used dingbats (for example the
>> >>"♡" mark) and numeric representations (for example "①") that are in
>> >>Big5-UAO but not in CP950.
>> >>And Big5-UAO is still being used not only on the ptt.cc telnet BBS,
>> >>but also in text data files (file lists/cue sheets), because some
>> >>applications do not support UTF-8 (for example, Perl file-system I/O
>> >>on Windows, CD rippers).
>> >>For Big5-HKSCS, it is used for storing commonly used Cantonese
>> >>ideographs (for example, "𨋢" means "lift" in Cantonese) in Hong
>> >>Kong.
>> >
>> >HKSCS is supported as of yesterday's commit. I'm aware that it's
>> >needed for representing Cantonese language in Big5, and that it's
>> >widely used on the web.
>> >
>> >What I'm not clear on is the necessity of UAO. Keep in mind that iconv
>> >is an API for information interchange: things like interpreting web
>> >content, email, old text files, etc. The fact that UAO exists is not
>> >alone a reason to support it; it has to actually have usefulness in
>> >situations where the iconv interface should be used. If you want to
>> >see it included, this is what you need to convince us of:
>> >
>> >- That it's in widespread use in large volumes of existing data (on
>> > the web, text files, etc.) or data that is being newly generated
>> > (e.g. as a default encoding of popular mail software).
>>
>> People are told *NOT* to publish files in Big5-UAO on the web (even
>> the creator of UAO appeals to people not to publish files in
>> Big5-UAO on the web), but there are still some files in archived
>> form (like I said before, for example cue-sheet files of CD-ROM
>> images, etc.).
>> But for local data processing, UAO does facilitate file management
>> for Windows users.
>
> Based on this, I think:
>
> (1) It's reasonable to omit UAO for now, and
> (2) Support for iconv to load user-defined character mappings would be
> a worthwhile feature to work on post-1.0.
>

That is good. But I have a few feature requests about this:
- user-defined mappings can be overlaid on another encoding, just like HKSCS does with Big5.
- user-defined mappings can be embedded in statically linked binaries.
And conversion from Unicode to the CJK legacy encodings is a must (I hope it is available before musl 1.0).

> My reasoning is that the goal of iconv in musl, at least for the
> built-in character set conversions, is to facilitate information
> interchange, particularly reading of data that may be received in
> email, as documents published on the web, via IRC or IM protocols,
> etc. An encoding whose creators specifically request that it NOT be
> used for publishing/interchange is well outside this scope.

Yeah, it is not encouraged for publishing since it is not a standard, and people are not encouraged to install UAO blindly, but people do use it for private interchange (like sending files via FTP or instant messaging).

>
> I agree with your examples (CD-ROM cue sheets, archived text files,
> that telnet BBS, etc.) that there is a need by some users to
> process/import data encoded in UAO, but most of these usages do not
> seem to require general applications, treating charsets in an
> abstract, MIME-style manner, to be able to handle it. For many of the
> examples, a command-line conversion utility (BTW, there are ones much
> more powerful than iconv out there) would be the logical choice. For
> the BBS, my understanding is that most of its users are using special
> telnet/terminal apps with the conversion built-in.
>
>> >- That it's necessary to represent linguistic content in languages
>> > used in Taiwan, not just as a substitute for Unicode to represent
>> > foreign languages.
>>
>> It does; some Chinese ideographs used in personal names, like "喆"
>> and "堃", are not in the CP950 mapping.
>
> How do these users send email or enter their names in web-based apps?
> My guess would be that the email clients switch to UTF-8 when
> encountering a character they can't encode in Big5, and that,
> nowadays, most web apps are built on CMS that are Unicode-based. Is
> this correct?
>

Yes, most popular web apps are using UTF-8 nowadays. In the past, people wrote 堃 as (方方土) and 喆 as (吉吉), and some installed ChinaSea/UAO/etc. charset extensions to get 堃 and 喆 as well.

>> >- That failure to support it would put musl's iconv in a worse
>> > position of compatibility than other iconv implementations or
>> > software-specific (e.g. in-browser) character set conversions.
>>
>> Since people have made unofficial Big5-UAO patches for libiconv and
>> glibc (gconv) to meet their needs, an optional Big5-UAO mapping in
>> musl libc would be an advantage for people in Taiwan.
>
> *nod*
>
> For what it's worth, how do those patches handle it? Do they add a new
> "Big5-UAO" charset name to iconv, or do they modify the existing Big5
> to treat it as UAO?

The original patches by Tiberius Teng modify the existing Big5 with Big5-UAO mappings. I'm trying to reach Tiberius and get the patch, if it is still available.
There is also another libiconv patch that adds a separate big5-uao encoding instead:
http://ku.myftp.org/goods/libiconv-1.11-uao.patch.bz2

>
> My feeling for now is to increase the priority of adding custom local
> charmap files to iconv after musl 1.0 is released. My main reason is
> that "intended for information interchange" vs "intended only for
> local use" seems to be the best guideline for whether an encoding is
> appropriate to include built-in.
>
> Rich