Message-ID: <4E41134F.1040103@bredband.net>
Date: Tue, 09 Aug 2011 13:00:31 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: "valid character" class

On 2011-08-05 01:41, Solar Designer wrote:
> On Fri, Aug 05, 2011 at 01:32:41AM +0200, magnum wrote:
>> What is the ?z (any character) class used for? Is it used anywhere, by
>> anyone? Its current meaning is indeed *any* character, valid or not.
> [...]
>> Maybe it was meant for PP stuff, much like the ':'.
>
> Exactly. It's a no-op produced by some preprocessor expressions for
> some of the expanded rules. I have a to-do item to have JtR optimize
> out such no-ops just like it does for ':' lately.

OK, I think we'll go for ?y for 'valid' then.

Question to *all*: There are some characters that are truly invalid for
a codepage, like 0x98 in cp1251. There are also characters that are not
really invalid per the Unicode spec but are control characters. For
example, in most (all?) ISO-8859-xx codepages, the characters 0x80..0x9F.
Should we treat the latter as invalid? There are pros and cons.

My personal vote is that we should treat them as invalid, i.e. the rule
!?Y would drop any candidate that contains 0x80..0x9F if we're using
--enc=iso-8859-1, but only candidates that contain 0x98 if we're using
--enc=cp1251.

One effect of doing so is the ability to reject/accept any UTF-8 encoded
words (from a mixed wordlist like RockYou.txt) using such rules, because
*all* non-ASCII characters in UTF-8 contain octets in that range. Of
course, this could also be achieved with another new, UTF-8 specific,
character class.

magnum
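
To make the proposal concrete, here is a minimal Python sketch of what a !?Y-style rejection could look like. It is not JtR code: the table INVALID_BYTES and the helpers is_valid() and reject_invalid() are hypothetical names, and the byte sets only encode the two cases mentioned above (0x80..0x9F for iso-8859-1, 0x98 for cp1251).

```python
# Hypothetical per-encoding table of bytes treated as "invalid".
# Only the two cases from the message are covered; this is an assumption,
# not the table JtR actually uses.
INVALID_BYTES = {
    # C1 control range, treated as invalid under the proposal
    "iso-8859-1": frozenset(range(0x80, 0xA0)),
    # the single byte with no assigned character in cp1251
    "cp1251": frozenset({0x98}),
}


def is_valid(candidate: bytes, encoding: str) -> bool:
    """Return True if no byte of the candidate is invalid for the encoding."""
    invalid = INVALID_BYTES[encoding]
    return not any(byte in invalid for byte in candidate)


def reject_invalid(words, encoding):
    """Mimic a '!?Y'-style rule: drop every word containing an invalid byte."""
    return [word for word in words if is_valid(word, encoding)]


if __name__ == "__main__":
    words = [
        b"password",
        "\u00c0".encode("utf-8"),  # bytes C3 80
        "\u00e9".encode("utf-8"),  # bytes C3 A9
        b"\x98abc",
    ]
    print(reject_invalid(words, "iso-8859-1"))
    print(reject_invalid(words, "cp1251"))
```

One caveat the sketch makes visible: UTF-8 continuation bytes span 0x80..0xBF, so under these assumptions a word such as C3 80 is dropped for iso-8859-1 while C3 A9 passes. A check on 0x80..0x9F alone would therefore catch only some UTF-8 encoded words, which may be an argument for the separate UTF-8 specific class.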