Message-ID: <20171122032524.GO1627@brightrain.aerifal.cx>
Date: Tue, 21 Nov 2017 22:25:24 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: cp437 issue with bad mapping at least for one char

On Wed, Nov 22, 2017 at 03:50:48AM +0100, Jacob Thrane Lund wrote:
>
> Hi musl devs,
>
> I experienced a test failing when building the latest version of gammu for Alpine Linux.
>
> After reporting the issue to the gammu developer the reached conclusion was the issue is with musl -
> https://github.com/gammu/gammu/issues/303#issuecomment-345258460
>
> I have checked the log for
> https://git.musl-libc.org/cgit/musl/commit/src/locale/codepages.h
> and Rich Felker pushed a commit 8 days ago. As of yet I have not had
> the chance to verify if this also resolves this issue. Dealing with
> charsets at this level is for me totally new territory..
>
> I was hoping you could confirm/deny if Rich’s commit indeed also resolves my issue?

It does. Here is how CP437 decodes, before:

Çüéâäàåç
êëèïîìÄÅ
ÉæÆôöòûù
ÿÖÜ¢£¥₧ƒ
¡¢£¤¥¦§
¨©ª«¬®¯
░▒▓│┤╡╢╖
╕╣║╗╝╜╛┐
└┴┬├─┼╞╟
╚╔╩╦╠═╬╧
╨╤╥╙╘╒╓╫
╪┘┌█▄▌▐▀
αáΓπΣσæτ
ΦΘΩδìφεï
ðñ≥≤⌠⌡÷≈
°∙·√ü²■

and after:

Çüéâäàåç
êëèïîìÄÅ
ÉæÆôöòûù
ÿÖÜ¢£¥₧ƒ
áíóúñѪº
¿⌐¬½¼¡«»
░▒▓│┤╡╢╖
╕╣║╗╝╜╛┐
└┴┬├─┼╞╟
╚╔╩╦╠═╬╧
╨╤╥╙╘╒╓╫
╪┘┌█▄▌▐▀
αßΓπΣσµτ
ΦΘΩδ∞φε∩
≡±≥≤⌠⌡÷≈
°∙·√ⁿ²■

The problem (silently fixed) was that the table generation code for
legacychars.h ignored entries in the Unicode charmap files that used
lowercase a-f in the hex, _and_ omitted characters that appeared in
the same slot as their Unicode codepoint (in all the ISO-8859
encodings containing í, it appears in "its own" slot), since these
previously got a special encoding. If not for the latter, this
character would have been included in the legacychars.h map already
due to being in Latin-1, where the charmap file used uppercase.
Somehow when the character was missing in legacychars.h, the mapping
tables ended up containing nonsense.

Rich
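
For readers who want to see how the two filtering mistakes described above combine to drop í, here is a minimal sketch in Python. It is not musl's actual table generator: the function name `collect`, the two regexes, and the sample lines (written in the style of the Unicode charmap files) are all assumptions made for illustration.

```python
import re

# Hypothetical sample lines in the style of Unicode charmap files:
# byte value, Unicode codepoint, comment. Illustrative only.
ISO_STYLE_LINES = [
    "0xA1\t0x0104\t# LATIN CAPITAL LETTER A WITH OGONEK",
    "0xED\t0x00ED\t# LATIN SMALL LETTER I WITH ACUTE",  # "its own" slot
]
CP437_STYLE_LINES = [
    "0xa1\t0x00ed\t# LATIN SMALL LETTER I WITH ACUTE",  # lowercase hex
]

# Assumed shape of the bug: a pattern that only accepts uppercase A-F.
UPPER_ONLY = re.compile(r"^0x([0-9A-F]+)\s+0x([0-9A-F]+)(?:\s|$)")
# Assumed shape of the fix: accept either case.
ANY_CASE = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)(?:\s|$)")


def collect(lines, pattern, skip_identity):
    """Return the set of characters these charmap lines contribute."""
    found = set()
    for line in lines:
        m = pattern.match(line)
        if m is None:
            continue  # line silently ignored, e.g. lowercase hex
        byte, codepoint = int(m.group(1), 16), int(m.group(2), 16)
        if skip_identity and byte == codepoint:
            continue  # character sits in the slot equal to its codepoint
        found.add(chr(codepoint))
    return found


all_lines = ISO_STYLE_LINES + CP437_STYLE_LINES
buggy = collect(all_lines, UPPER_ONLY, skip_identity=True)
fixed = collect(all_lines, ANY_CASE, skip_identity=False)

# í is lost only because both filters apply at once.
missing = fixed - buggy

# Python's own cp437 codec agrees with the "after" table for byte 0xA1.
decoded_a1 = bytes([0xA1]).decode("cp437")
```

In this sketch, relaxing either filter on its own is enough to bring í back, which matches the remark that the character would already have been included via Latin-1 if identity slots had not been omitted.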