Message-ID: <20220128194733.GA1960@voyager>
Date: Fri, 28 Jan 2022 20:47:33 +0100
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: Re: A journey of weird file sorting and desktop systems

On Fri, Jan 28, 2022 at 01:01:04PM -0500, Rich Felker wrote:
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.
>

But basic functionality *is* missing from libc, and by design. By the standard. For example, toupper and towupper can only return a single code point. That doesn't work with German's ß character, which has the capital form SS. If you were transforming some general German word group into block capitals for a headline or something, that is the transformation you would use. Now, some people have invented a capital version of ß, which is still new enough to make blocks appear in many programs (test your mail program here: ẞ), but that letter is not widely used. Also, many applications expect towupper and towlower to be inverse functions of each other, but here, not every instance of SS ought to be transformed to ß when passed through towlower, even if the interface did support such a thing.

My point is that the development of interfaces that deal with internationalization might be better put into a library with an interface less rigid than libc, where any adjustment moves at the glacial pace of the Austin Group or WG14, and in any case, breaking changes are completely out of the question. That is also why we still have gets() and strchr().

Whether ICU is a suitable library for that purpose I lack the expertise to say. However, all I have heard about it so far is either that one should use it to cure all i18n ills, or that it is an abomination unto the Lord. But even the people in the second camp fail to recommend a superior alternative. So I'm guessing there isn't one.

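To make the towupper limitation above concrete, here is a minimal sketch of a string-level helper that can grow its output. The name upper_german and its buffer convention are hypothetical and not part of libc or any standard. It also assumes wchar_t holds Unicode code points, as it does on musl and glibc.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not part of libc: uppercase src into dst,
 * expanding U+00DF (ß) to "SS". Assumes wchar_t holds Unicode code
 * points. Returns the number of wide characters written, excluding
 * the terminator, or (size_t)-1 if dst (dstlen elements) is too small.
 */
size_t upper_german(wchar_t *dst, size_t dstlen, const wchar_t *src)
{
    size_t n = 0;

    for (; *src; src++) {
        if (*src == 0x00DF) {
            /* One code point in, two out: towupper cannot express this. */
            if (n + 2 >= dstlen)
                return (size_t)-1;
            dst[n++] = L'S';
            dst[n++] = L'S';
        } else {
            /* Everything else goes through the one-to-one libc mapping. */
            if (n + 1 >= dstlen)
                return (size_t)-1;
            dst[n++] = (wchar_t)towupper((wint_t)*src);
        }
    }
    if (n >= dstlen)
        return (size_t)-1;
    dst[n] = L'\0';
    return n;
}
```

Called on L"Stra\u00DFe" with a large enough buffer, this produces L"STRASSE", one character longer than the input. A per-character towupper loop cannot do that, because its signature takes one wint_t and returns one wint_t. The sketch hard-codes a single special case; a real implementation would need the full Unicode special-casing data, which is exactly the sort of thing that does not fit the libc interface.
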
As to the actual function in question: Simply having a possibility to switch strcoll to be the same as strcasecmp instead of strcmp would probably already be the 80% solution for most European languages. Yeah, it won't work with umlauts, but we Germans are used to that. "It is <current year> and we still can't do umlauts" is a common curse levelled at information technology, and for the most part it is apt.

I routinely counsel against using umlauts in file names or pass phrases, because you never know what character set it gets saved in or transmitted later, and it just causes avoidable problems. I really doubt this issue will ever be solved within my lifetime.

JM2C,
Markus