john-dev - Re: Character encoding 'how-to' and patch 0009

Message-ID: <BLU0-SMTP208F90679775B80CB02C612FD330@phx.gbl>
Date: Mon, 25 Jul 2011 22:39:11 +0200
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Character encoding 'how-to' and patch 0009

On 25.07.2011 16:26, JimF wrote:
> For a simple '8-bit' fixed size character encoding (wide char
> encodings are not covered in this howto):
>
> 1. Build arrays of to-upper and to-lower values in rules.c. These
> arrays have to be the upper and matching lower case values, listed in
> the same order. If there are upper case only, or lower case only
> letters, then build a separate array for them.

I assume you mean characters which don't have a corresponding upper or
lower case character within the code page in question.
E.g., Ÿ (Unicode code point U+0178) is the upper case character for ÿ
(Unicode code point U+00FF), but only ÿ (latin small letter y with
diaeresis) is part of ISO Latin-1 (ISO-8859-1).
For me, it is not clear whether or not ÿ should be converted to Ÿ when
applying rule u.
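
To make the question concrete, here is a minimal sketch of how I understand
step 1, using a handful of ISO-8859-1 letters. All names (cp_upper,
cp_lower, cp_lower_only, init_case_tables) are made up for illustration and
are not taken from rules.c. The sketch assumes the "leave it unchanged"
answer for letters whose counterpart lies outside the code page, which is
only one of the possible choices.

```c
#include <assert.h>
#include <stddef.h>

/* Paired letters: cp_upper[i] is the upper case form of cp_lower[i].
 * Only a few ISO-8859-1 letters are listed to keep the sketch short. */
static const unsigned char cp_upper[] = { 0xC0, 0xC4, 0xD6, 0xDC }; /* À Ä Ö Ü */
static const unsigned char cp_lower[] = { 0xE0, 0xE4, 0xF6, 0xFC }; /* à ä ö ü */

/* Lower case letters with no upper case partner inside ISO-8859-1. */
static const unsigned char cp_lower_only[] = { 0xDF, 0xFF };        /* ß ÿ */

static unsigned char to_upper_tab[256];
static unsigned char to_lower_tab[256];
static unsigned char is_lower_tab[256];

/* Hypothetical helper: builds 256-entry lookup tables from the arrays. */
static void init_case_tables(void)
{
    size_t i;

    /* The paired arrays must list the same letters in the same order. */
    assert(sizeof(cp_upper) == sizeof(cp_lower));

    /* Start with the identity mapping for every byte value. */
    for (i = 0; i < 256; i++) {
        to_upper_tab[i] = to_lower_tab[i] = (unsigned char)i;
        is_lower_tab[i] = 0;
    }

    /* Plain ASCII letters. */
    for (i = 'a'; i <= 'z'; i++) {
        to_upper_tab[i] = (unsigned char)(i - 'a' + 'A');
        to_lower_tab[i - 'a' + 'A'] = (unsigned char)i;
        is_lower_tab[i] = 1;
    }

    /* Letters that have both forms within the code page. */
    for (i = 0; i < sizeof(cp_upper); i++) {
        to_upper_tab[cp_lower[i]] = cp_upper[i];
        to_lower_tab[cp_upper[i]] = cp_lower[i];
        is_lower_tab[cp_lower[i]] = 1;
    }

    /* Lower-case-only letters: classified as lower case, but the
     * upper case mapping stays the identity (assumption of this sketch). */
    for (i = 0; i < sizeof(cp_lower_only); i++)
        is_lower_tab[cp_lower_only[i]] = 1;
}
```

With these tables, rule u would turn ä (0xE4) into Ä (0xC4), but would leave
ÿ (0xFF) and ß (0xDF) as they are, because Ÿ (U+0178) cannot be stored in a
single ISO-8859-1 byte.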

Another example: ß (U+00DF, latin small letter sharp s, aka German
Eszett) is a lower case character which doesn't have an upper case
version in common use.
Even though ẞ (U+1E9E, latin capital letter sharp s) was added recently
(in Unicode version 5.1), hardly any user knows that this letter
exists, let alone how to enter such a character.
As far as I know, this character is meant either for small caps fonts,
or for writing EVERYTHING IN UPPER CASE...
(With a German keyboard layout, you cannot enter this character by
pressing <shift>-<ß>.)

> 5. within unicode.c, add code into utf16toplain() to handle the
> conversion from utf16 back into the 8 bit character set.
>

What about Unicode characters which don't have a representation in the
single-byte code page?
(Maybe I would find out by just reading the source code...)
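
For illustration, one common way to handle this is to substitute a
placeholder for every code point that the target code page cannot
represent. The sketch below does that for ISO-8859-1, where the byte value
equals the code point for U+0000 to U+00FF. The function name
utf16_to_latin1 and the '?' placeholder are my own assumptions; I have not
checked whether utf16toplain() in unicode.c behaves like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNMAPPABLE '?'  /* assumed placeholder for unrepresentable characters */

/* Hypothetical helper: converts a NUL-terminated UTF-16 string (host byte
 * order) to ISO-8859-1. The output is always NUL-terminated when
 * dst_size > 0. Returns the number of characters that were replaced. */
static size_t utf16_to_latin1(const uint16_t *src, unsigned char *dst,
                              size_t dst_size)
{
    size_t out = 0, lost = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return 0;

    while (*src && out + 1 < dst_size) {
        uint16_t c = *src++;

        if (c <= 0xFF) {
            /* ISO-8859-1 is the first 256 Unicode code points. */
            dst[out++] = (unsigned char)c;
        } else {
            /* A surrogate pair is one character: skip its second half
             * so that it produces a single placeholder. */
            if (c >= 0xD800 && c <= 0xDBFF &&
                *src >= 0xDC00 && *src <= 0xDFFF)
                src++;
            dst[out++] = UNMAPPABLE;
            lost++;
        }
    }
    dst[out] = 0;
    return lost;
}
```

Here ÿ (U+00FF) survives the conversion, while Ÿ (U+0178) and ẞ (U+1E9E)
both become '?'. The return value lets the caller decide whether a lossy
conversion is acceptable or whether the candidate should be skipped.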

Frank
