Message-ID: <4E458793.4030005@bredband.net>
Date: Fri, 12 Aug 2011 22:05:39 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: Unicode, casing, obtaining data, and some real-world MSSQL (2000) data.

On 2011-08-12 20:34, jfoug wrote:
> Well, getting that 100% workable, and being able to do things like properly
> collate things such as "MASSE" and "Maße" is not the real ‘purpose’ we need
> in john.

This depends on where/why we uppercase. If we uppercase within a format,
like LM or MSSQL, we should obviously uppercase just like the native
format would (maße -> MAßE in all cases we've seen). But if I feed a
German lowercase wordlist to john, attacking a case-significant format
and using rules for permutations, I would want maße -> MASSE because
that is how a German would likely write it.

> What I found here, is several things. First, if the _wsetlocale() was not
> called, then the only upcasing/lowcasing was A..Z<-> a..z Then, if
> _wsetlocale() was called (with a valid locale), then the exact same casing
> was happening, NO MATTER WHAT locale is used. Remember, we are in Unicode,
> so the OS simply turns on the above 0x7F casing rules, but they are the same
> for the OS.

Are you saying that if you set a locale it would go from just a-z to
complete Unicode - BUT using the system locale instead of the one you
specified? That's weird, and kind of defeats the whole purpose of
_wsetlocale().

> Thus, when I do release this, it will likely be an initial release, and need
> some work tweaking it. Also, I had some problems with magnums recent UTF-32
> changes. I need to work through some of that with him, as I do not fully
> understand all of that code.

Do you mean the reinstated "third case" in utf8towcs()? It does not
convert to UTF-32 but to UTF-16 with surrogate pairs. I expect Windows
UTF-16 hashes to be just like that, but I haven't confirmed it with
empirical data. I tested it against Perl (pass_gen.pl).
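
To make "UTF-16 with surrogate pairs" concrete, here is a minimal Python sketch of the encoding step for code points above U+FFFF. It is an illustration only, not the utf8towcs() code from john; the function name utf8_to_utf16_units is made up for this example, and it leans on Python's own UTF-8 decoder rather than decoding bytes by hand.

```python
def utf8_to_utf16_units(data):
    """Return the UTF-16 code units for a UTF-8 byte string.

    Hypothetical helper for illustration; not john's utf8towcs().
    """
    units = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code point: one 16-bit unit, same value as in UTF-32.
            units.append(cp)
        else:
            # Supplementary code point: split the 20-bit offset into a
            # high (lead) and low (trail) surrogate.
            offset = cp - 0x10000
            units.append(0xD800 | (offset >> 10))
            units.append(0xDC00 | (offset & 0x3FF))
    return units


if __name__ == "__main__":
    sample = "ma\u00dfe\U0001F600".encode("utf-8")
    print(" ".join("%04X" % u for u in utf8_to_utf16_units(sample)))
```

In UTF-32 the last character of the sample would be the single value 0x1F600; in UTF-16 it becomes the pair 0xD83D 0xDE00.
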
At some point we will need conversions to UTF-32 (no surrogate pairs)
too, but I won't touch that until I see a format that hashes UTF-32.

magnum
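
The two uppercasing behaviours discussed in this message (maße -> MAßE inside a format, maße -> MASSE for wordlist rules) can be illustrated with a small Python sketch. The helper names upper_simple and upper_full are hypothetical. upper_simple only approximates a one-to-one case mapping by leaving alone any character whose uppercase form is longer than one character; it is not claimed to reproduce what LM or MSSQL do for every character.

```python
def upper_simple(text):
    """Approximate one-to-one uppercasing (hypothetical helper).

    A character is replaced only if its uppercase form is exactly one
    character long; otherwise it is kept as-is, so 'ß' stays 'ß'.
    """
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_full(text):
    """Full Unicode uppercasing as done by Python's str.upper().

    This may change the string length, e.g. 'ß' becomes 'SS'.
    """
    return text.upper()


if __name__ == "__main__":
    word = "ma\u00dfe"
    print(upper_simple(word))  # MAßE
    print(upper_full(word))    # MASSE
```

A format that must match the native hash would want something like the first behaviour, while a wordlist rule aimed at how people actually type would want the second.
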