Message-ID: <20101116232016.GA23967@openwall.com>
Date: Wed, 17 Nov 2010 02:20:16 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Wordlist Mangling Rule

On Sat, Nov 13, 2010 at 10:23:50AM +1300, Al Grant wrote:
> I have tried from the FAQ rule page to decrypt how the rules you have
> written work.

I'm not sure what page you refer to. There's one documenting the rules syntax, but it's not a FAQ:

http://www.openwall.com/john/doc/RULES.shtml

> Would you mind breaking it down? Ie [c:c] does what etc?

Let's start with a simpler line:

    <B >7 [clu]

The square brackets trigger preprocessor expansion. So this line gets expanded into 3 separate rules:

    <B >7 c
    <B >7 l
    <B >7 u

Each rule is individually applied to all words from your wordlist. Let's look at the first one of these rules:

    <B >7 c

It contains three rule commands. Unlike separate rules (above), the rule commands in the same rule (on the same line post-expansion) are applied one after another - that is, the second command is applied to the result of the first (not to the original word), the third one is applied to the result of the second, etc. Also, if one of the commands rejects the input word, further commands are not used for that word; the entire rule (one line above) produces no output for such a word.

The first command above is "<B". The "<" character is the command code. It is documented in doc/RULES as:

    <N    reject the word unless it is less than N characters long

The "B" character corresponds to the "N" placeholder in the documentation - that is, it is the position code. These are also documented in doc/RULES:

"Numeric constants may be specified and variables referred to with the following characters:

    0...9    for 0...9
    A...Z    for 10...35
    [...]"

According to this, "B" specifies the number 11. Thus, the command "<B" will reject its input word (and not let it be processed with further commands on the same line) "unless it is less than 11 characters long".
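The position-code lookup and the "<N" check just described can be modeled in a few lines of Python. This is only an illustrative sketch, not John's own code (John is written in C); the names POSITION_CODES, decode_position and passes_less_than are hypothetical, and only the digit and letter rows of the doc/RULES table are modeled.

```python
import string

# Hypothetical names for illustration. Only the "0...9" and "A...Z" rows of
# the doc/RULES table are modeled; other codes it lists (such as "z") are not.
POSITION_CODES = {ch: n for n, ch in enumerate(string.digits + string.ascii_uppercase)}


def decode_position(code: str) -> int:
    """Return the number a position/length character stands for, e.g. 'B' -> 11."""
    return POSITION_CODES[code]


def passes_less_than(word: str, code: str) -> bool:
    """Model of the "<N" command: True if the word survives, False if rejected."""
    return len(word) < decode_position(code)


if __name__ == "__main__":
    for word in ["password", "letmein1234", "abcdefghij"]:
        verdict = "kept" if passes_less_than(word, "B") else "rejected"
        print(f"<B on {word!r} ({len(word)} chars): {verdict}")
```

With "B" decoding to 11, the 10-character word passes "<B" and the 11-character one is rejected.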
In other words, "<B" will insist that words be no longer than 10 characters - that's one of the requirements you had mentioned for words that we're not going to append digits to.

The next command is ">7". (This one is only reached if "<B" did not reject the word.) Similarly, this one insists that words be no shorter than 8 characters (8 being the smallest number that is "greater than 7").

Finally, the last command in that rule is "c". It is documented as:

    c    capitalize

Thus, the entire "<B >7 c" rule will capitalize words that are 8 to 10 characters long, but it will reject others.

The next two rules:

    <B >7 l
    <B >7 u

are similar, except they will "convert to lowercase" and "convert to uppercase", respectively.

That's all for the simple line discussed so far:

    <B >7 [clu]

Now let's see what the next line does:

    <8 >6 [clu] $[0-9]

This one gets expanded into as many as 30 rules:

    <8 >6 c $0
    <8 >6 c $1
    [...]
    <8 >6 c $9
    <8 >6 l $0
    [...]
    <8 >6 l $9
    <8 >6 u $0
    [...]
    <8 >6 u $8
    <8 >6 u $9

(I've omitted many of them above.) So that's 30 rules, each consisting of 4 commands. The first 3 of the commands were already discussed above (although the length limits are different now: "<8 >6" accepts only words that are exactly 7 characters long). The fourth one appends a digit:

    $X    append character X to the word

(where a specific digit is substituted for the "X" placeholder mentioned in the documentation).

The next ruleset lines may be:

    <7 >5 [clu] Az"[0-9][0-9]"
    <6 >4 [clu] Az"[0-9][0-9][0-9]"
    <5 >3 [clu] Az"[0-9][0-9][0-9][0-9]"

The last one of these is expanded into as many as 30,000 rules:

    <5 >3 c Az"0000"
    <5 >3 c Az"0001"
    [...]
    <5 >3 u Az"9998"
    <5 >3 u Az"9999"

Each of the above rules consists of 4 commands, the first 3 of which we've already discussed. The fourth is:

    AN"STR"    insert string STR into the word at position N

The documentation also says:

"To append a string, specify "z" for the position."
which is also documented in its proper section:

    z    "infinite" position or length (beyond end of word)

So we're inserting the "string STR" beyond the end of the word - or in other words, we're indeed appending the string. In each of the 30,000 rules (produced for us by the preprocessor on the fly), only one specific string to append is specified (e.g., only "0000" initially).

Now let's consider these more complicated ruleset lines:

    -\r[c:c] <B >7 \p[clu]
    -\r[c:c] <8 >6 \p[clu] $[0-9]
    -\r[c:c] <7 >5 \p[clu] Az"[0-9][0-9]"
    -\r[c:c] <6 >4 \p[clu] Az"[0-9][0-9][0-9]"
    -\r[c:c] <5 >3 \p[clu] Az"[0-9][0-9][0-9][0-9]"

These differ from those we've discussed so far by the addition of "-\r[c:c]" to the beginning and "\p" into the middle. Let's see what these achieve.

First, "[c:c]", with its non-escaped use of square brackets, is indeed a preprocessor expression, much like "[clu]" and "[0-9]", which we've discussed above. "\r" and "\p" are "magic escape sequences" to the preprocessor. These are documented closer to the end of doc/RULES:

"Finally, the preprocessor supports some magic escape sequences. These start with a backslash and use characters that you would not normally need to escape. [...] "\p" before a range to have that range processed "in parallel" with preceding ranges [...] "\r" to allow the range to produce repeated characters."

Thus, this line:

    -\r[c:c] <B >7 \p[clu]

is expanded into three rules:

    -c <B >7 c
    -: <B >7 l
    -c <B >7 u

We needed "\r" because we have two instances of the "c" character in "[c:c]" and we wanted to preserve both (without "\r", the preprocessor would drop the repeated "c", leaving only "c" and ":"; see below for why we want both). We needed "\p" to have the two character lists - "[c:c]" and "[clu]" - processed "in parallel". In other words, we wanted only the three lines above to be produced, not 9 lines for all combinations, which is what we would get from the preprocessor by default (and which we relied upon when appending digits, above).

Now, what does "-c" at the start of a rule do?
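Before getting to that: the expansion behaviour described so far (plain ranges combine into all combinations, "\p" makes a range advance in step with an earlier one, and repeated characters are dropped unless "\r" is given) can be modeled in a few lines of Python. This is a rough sketch, not John's preprocessor: it understands only the subset of syntax used in this thread, the function names are made up, and two behaviours are assumptions of the sketch, namely that a "\p" range pairs with the range immediately before it and that parallel ranges must have equal length.

```python
import itertools


def parse_range(body):
    """Expand the inside of a [...] range, e.g. 'clu' or '0-9', into characters."""
    chars, i = [], 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # "a-z" style span, inclusive on both ends
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def parse_template(template):
    """Split a ruleset line into ("lit", text) and ("range", chars, parallel) segments."""
    segments, literal, i = [], "", 0
    repeat = parallel = False
    while i < len(template):
        if template.startswith("\\r", i):
            repeat, i = True, i + 2
        elif template.startswith("\\p", i):
            parallel, i = True, i + 2
        elif template[i] == "[":
            end = template.index("]", i)
            chars = parse_range(template[i + 1:end])
            if not repeat:
                chars = list(dict.fromkeys(chars))  # drop repeats, keep first occurrence
            if literal:
                segments.append(("lit", literal))
                literal = ""
            segments.append(("range", chars, parallel))
            repeat = parallel = False
            i = end + 1
        else:
            literal += template[i]
            i += 1
    if literal:
        segments.append(("lit", literal))
    return segments


def expand(template):
    """Return every rule this simplified preprocessor model produces, in order."""
    segments = parse_template(template)
    ranges = [seg[1] for seg in segments if seg[0] == "range"]
    flags = [seg[2] for seg in segments if seg[0] == "range"]
    # Assumption of this sketch: a "\p" range joins the group of the range
    # before it, and members of a group advance in lockstep.
    groups = []
    for idx, is_parallel in enumerate(flags):
        if is_parallel and groups:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    sizes = []
    for group in groups:
        lengths = {len(ranges[idx]) for idx in group}
        if len(lengths) != 1:
            raise ValueError("this sketch requires parallel ranges of equal length")
        sizes.append(lengths.pop())
    rules = []
    # Rightmost group varies fastest, matching the order shown in the message.
    for picks in itertools.product(*(range(n) for n in sizes)):
        choice = {}
        for group, pick in zip(groups, picks):
            for idx in group:
                choice[idx] = ranges[idx][pick]
        out, r = [], 0
        for seg in segments:
            if seg[0] == "lit":
                out.append(seg[1])
            else:
                out.append(choice[r])
                r += 1
        rules.append("".join(out))
    return rules


if __name__ == "__main__":
    for rule in expand(r"-\r[c:c] <B >7 \p[clu]"):
        print(rule)
```

In this sketch, dropping "\p" from that template yields the 9 combinations mentioned above, and dropping "\r" as well leaves "[c:c]" as just "c" and ":", so only 6 rules remain.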
This is a "rule reject flag", documented as:

    -c    reject this rule unless current hash type is case-sensitive

Note that unlike "<B" and other "rule commands", which reject individual input words, the "rule reject flags" reject entire rules. Thus, if the current hash type is case-insensitive - which pretty much means LM hashes in practice - the entire rule (which is "<B >7 c") will be rejected. Indeed, with a case-insensitive hash there's no point in capitalizing words when we're going to try them in lowercase as well (by the next rule). If we did not reject the rule, then effectively duplicate candidate passwords would be generated and hashed, thereby wasting time.

The next rule is:

    -: <B >7 l

This one uses a rule reject flag too, but a dummy one:

    -:    no-op: don't reject

The only reason this rule has it, and why this flag is even supported, is to allow for our use of the preprocessor: the "[c:c]" range has to produce some character for the middle rule as well, and ":" is one that has no effect. These flags have almost no performance cost anyway - they're applied per-rule, not per-word. As you can see in the log, the rules being applied per-word have their rule reject flags, if any, already removed from them.

Finally, we have:

    -c <B >7 u

which is similar to the first one of these three rules - it is applied to case-sensitive hashes only.

As to the rest of the original ruleset lines:

    -\r[c:c] <8 >6 \p[clu] $[0-9]
    -\r[c:c] <7 >5 \p[clu] Az"[0-9][0-9]"
    -\r[c:c] <6 >4 \p[clu] Az"[0-9][0-9][0-9]"
    -\r[c:c] <5 >3 \p[clu] Az"[0-9][0-9][0-9][0-9]"

these are expanded into larger numbers of rules. The last one of these is expanded into 30,000 rules like:

    -c <5 >3 c Az"0000"
    -c <5 >3 c Az"0001"
    [...]
    -: <5 >3 l Az"0000"
    [...]
    -c <5 >3 u Az"9999"

...and we've already discussed the meaning and the rationale of the individual rule reject flags and rule commands in use by these rules.

Whew, looks like that's all. This is simple stuff for me, but I see how it can be complicated for others given that explaining it takes a while.

Does this help?

Alexander