Message-ID: <op.w1dsyje7dyj81a@monster.itedn32a.localdomain>
Date: Tue, 06 Aug 2013 14:14:33 +0800
From: Roy <roytam@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: iconv Korean and Traditional Chinese research so far

Tue, 06 Aug 2013 03:12:47 +0800, Rich Felker <dalias@...ifal.cx> wrote:

> On Mon, Aug 05, 2013 at 04:28:32PM +0800, Roy wrote:
>> Since I'm a Traditional Chinese and Japanese legacy encoding user, I
>> think I can say something here.
>> [...]
>> There is another Big5 extension called Big5-UAO, which is used by
>> the world's largest telnet-based BBS, "ptt.cc".
>>
>> It has two tables, one for Big5-UAO to Unicode and another for
>> Unicode to Big5-UAO.
>> http://moztw.org/docs/big5/table/uao250-b2u.txt
>> http://moztw.org/docs/big5/table/uao250-u2b.txt
>>
>> It extends the DBCS lead byte range down to 0x81.
>
> OK, I've been trying to do some research on this and I turned up:
>
> http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/0061.html
> http://lists.gnu.org/archive/html/bug-gnu-libiconv/2010-11/msg00007.html
>
> My impression (please correct me if I'm wrong) is that you can't use
> Big5-UAO as the system encoding on modern versions of Windows (just
> ancient ones where you install unmaintained third-party software that
> hacks the system charset tables)

It doesn't "hack" the NLS file; it replaces it with a UAO-enabled CP950
NLS file.
The executable (setup program) is generated with NSIS (Nullsoft
Scriptable Install System).
Since the NLS file format hasn't changed from NT 3.1 in 1993 up to
NT 6.3 (i.e. Win 8.1 "Blue"), the UAO-enabled CP950 NLS file will keep
working in newer versions of Windows unless MS replaces the NLS file
format with something different.

> and that it's not supported in GNU
> libiconv. If this is the case, and especially if Big5-UAO's main use
> is on a telnet-based BBS where everybody is using special telnet
> clients that have their own Big5-UAO converters,

GNU libiconv doesn't even support IBM EBCDIC (neither SBCS nor stateful
SBCS+DBCS)!
So does it matter whether GNU libiconv supports a given encoding? (And
yes, glibc iconv, i.e. its gconv modules, does support both IBM EBCDIC
SBCS and stateful SBCS+DBCS encodings.)

> I'd find it really
> hard to justify trying to support this. But I'm open to hearing
> arguments on why we should, if you believe it's important.

I think it would be nice to have a build-time or link-time option for
those "unpopular" encodings.

>> For static linking, can we have conditional linking like QT does?
>
> My feeling is that it's a tradeoff, and probably has more pros than
> cons. Unlike QT, musl's iconv is extremely small.

I would add "right now" here. As more encodings are added later, the
iconv module will grow, and people will need a way to conditionally
compile in only the encodings they need (for both dynamic and static
linking).

> Even with all the
> above, the size of iconv.o will be under 130k, maybe closer to 110k.
> If you actually use iconv in your program, this is a small price to
> pay for having it fully functional. On the other hand, if linking it
> is conditional, you have to consider who makes the decision, and when.
> If it's at link time for each application, that's probably too much of
> a musl-specific version.

Since statically linking libc's iconv is a new area (other libcs don't
touch this topic much), I think we can create a standard for statically
linking selected encoding tables at link time.
(This is also a reason why libc should provide a unique identifier via
a preprocessor define.)

> If it's at build time for musl, then is it
> your device vendor deciding for you what languages you need? One of
> the biggest headaches of uClibc-based systems is finding that the
> system libc was built with important options you need turned off and
> that you need to hack in a replacement to get something working...
>
> I think the cost of getting stuck with broken binaries where charsets
> were omitted is sufficiently greater than the cost of adding a few
> tens of kb to static binaries using iconv, that we should only
> consider a build time option if embedded users are actively reporting
> size problems.

>
> Rich
