Message-ID: <00fc01cc5939$89ea66b0$9dbf3410$@net>
Date: Fri, 12 Aug 2011 16:48:14 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Patch 0007 Codepage enhancements
I have added a new patch (0007) to the wiki page. This patch adds numerous
new code page encodings to the Unicode.c file and to rules. It also
adds 2 'types' of Unicode casing data: one from Unicode.org, and the other
from observed M$ Windows and MSSQL behavior. Also, a couple
of strange bugs showed up in the mscash1 and NT formats when loading the
character U+0080.
There is also a new version of cmpt_cp.pl. This script requires the
UnicodeData.txt file to be located in ./src/unused, and the script has to
be run from ./src. It now detects MANY things other than simple upcase /
downcase mappings, such as control characters, numbers, white space, etc.
These can then get loaded into rules.c when that code page is selected.
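
As a rough illustration of the kind of per-code-page classification described
above, here is a minimal Python sketch. It is not cmpt_cp.pl (which is Perl)
and does not reproduce its output format. The function names
(parse_unicode_data, classify, codepage_classes) are hypothetical, a few
UnicodeData.txt rows are embedded instead of reading ./src/unused, and cp1252
is just an example code page. The class names are a simplification based only
on the Unicode general category field.

```python
# A few rows in UnicodeData.txt format (15 semicolon-separated fields).
# Field 2 is the general category; fields 12 and 13 are the simple
# uppercase and lowercase mappings.
SAMPLE = """\
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
"""


def parse_unicode_data(text):
    """Map code point -> category and simple case mappings."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed rows
        table[int(fields[0], 16)] = {
            "category": fields[2],
            "upper": int(fields[12], 16) if fields[12] else None,
            "lower": int(fields[13], 16) if fields[13] else None,
        }
    return table


def classify(category):
    """Collapse a general category into a coarse class (assumed naming)."""
    if category == "Cc":
        return "control"
    if category == "Nd":
        return "digit"
    if category.startswith("Z"):
        return "whitespace"
    if category == "Lu":
        return "upper"
    if category == "Ll":
        return "lower"
    return "other"


def codepage_classes(table, encoding):
    """Classify each byte of a single-byte code page that has table data."""
    classes = {}
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        entry = table.get(ord(ch))
        if entry is not None:
            classes[byte] = classify(entry["category"])
    return classes


if __name__ == "__main__":
    table = parse_unicode_data(SAMPLE)
    for byte, cls in sorted(codepage_classes(table, "cp1252").items()):
        print(f"0x{byte:02X} {cls}")
```

With the full UnicodeData.txt in place of SAMPLE, the same loop would cover
every byte of the chosen code page that decodes to a listed character.
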
I still have a little work to do on the test suite. The current mssql-old
format will likely have problems with the 'existing' test suite, because
the test suite's data is wrong. I DO have a large set of test files
which were 100% generated BY mssql, so I am taking them as golden, much more
so than fake hashes generated by a perl script. The mssql-old-fmt-plug.c
format works 100% with these new test suite files.
Jim.