Message-ID: <20090726171611.GA13686@openwall.com>
Date: Sun, 26 Jul 2009 21:16:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Work (optimization) in progress, and some ideas

Jim,

I've skipped your comments on my points 1-3, just not to spend too much
time on this discussion.  We're mostly thinking the same.

I initially wrote the rules.c code when JtR only supported iterated and
salted hashes, the fastest of which was DES-based crypt(3).  Then, when
faster and saltless hashes "started to appear", I was still quite
satisfied with the speed of that code, because the default rulesets
would complete quickly.  For typical runs, the total overhead was maybe
on the order of 1 minute, which was insignificant if the total session
duration could be hours, days, or weeks (mostly in "incremental" mode).

Now, I realize that people have started to (ab)use the rules to a
greater extent, coming up with thousands of wordlist rules and applying
them to fast saltless hashes.  ("Normally", JtR would only use almost
this many rules when in "single crack" mode.)  Given this change in
"use patterns", I agree that it may be time to rework the rules.c code.

On Sat, Jul 25, 2009 at 08:06:02PM -0500, JimF wrote:
> > 4. Introduce string append/prepend/insert commands, like:
> >
> > A"string" - append
> > B"string" - prepend
> > CN"string" - insert in position N
>
> These would be WONDERFUL additions.  I have about 12000 rules, which
> were gleaned off running cracked words against other dictionaries,
> looking for prefix and postfix strings, then sorting based upon how
> often a prefix is seen, then keeping the ones that were seen 3 or
> more times (in sorted order, of course).  I ended up with 12k rules.
> I built another program to convert those prefix and postfix strings
> into 'proper' rules for john (knowing that the prefix has to be in
> reverse order, etc.), and in its output, it alternated prefix and
> postfix rules (there were more postfix rules, btw).  That set of
> rules, when run, cracks a LOT of passwords once the 'normal' john
> default rules have run, but it runs quite a bit slower, due to so
> much string moving in john.  If we had string append, that would
> speed things up a lot.  Appending with existing rules means one time
> through all of the logic of a rule for each letter, vs. a single run
> through the rule logic.  No matter how much the prepend and append
> logic for one letter is optimized, it will not be faster than
> appending a word (string).

That's true.

Speaking of your specific intended use described above, I think a "next
generation" version of JtR should introduce some multi-word or
passphrase cracking mode.  This is a topic we could discuss separately
and at a later time.

> So, your rule would be:
>
> A!string! where ! is any unused character within the string?  That
> type of logic works fine for me.

That's correct.

> > Unfortunately, the 'P' and 'I' command characters are already taken.
>
> Do they have to be 1 letter?

For optimal performance, yes, as long as we don't introduce a compiler
into some internal representation (which is something to consider when
we're ready for a major rework).  Also, single-letter commands allow for
using the preprocessor on them:

:[AB]"string"

At a later time the preprocessor may be enhanced to also support
expressions like {string1,string2}, but we're not there yet.

> It could be something like:
> $.!string! and ^.!string! (again, ! being any char not used in the
> string)
>
> This would require $\. or ^\. (or $[.] ^[.] for the pre-processor) to
> append just a period.

I thought of this before, and decided against this approach for the
following reasons:

1. This would break some existing valid rules (those appending/
prepending the magic character or the escape character).

2. The checks for the magic character and the escape character would
have some performance cost.

Also, your preprocessor example is not correct.  The preprocessor is
separate from the rest of the rules engine, and I'd like to keep it that
way, so we can't reasonably "teach" the preprocessor to treat a
character specially.  Thus, the rule command syntax you proposed would
be incompatible with having a preprocessor expression for appending/
prepending an arbitrary character from a list (which might include the
magic character).  For example, instead of:

$[abc.123]

one would have to write:

$[abc] $\\. $[123]

In fact, since the backslash would also become special in that context,
it would not be possible to list it as well.  So instead of:

$[abc\\123]

one would have to write:

$[abc] $\\\ $[123]

Really not great.  (The backslash is already a special character for
the preprocessor, which is why it has to be doubled now and would have
to be tripled.)

> This way, the same 'prepend' 'append' rule semantics are used, keeping
> the 'learning' of new rules to a minimum.

Yes, but the drawbacks outweigh that.

> I am not sure if preprocessing rules would need to be messed with, as
> I don't think pre-generated strings would really benefit from the
> pre-processor.

There's currently no way to turn the preprocessor off, and I don't
think we want to change that.  It would be a source of confusion.

Some other thoughts on the string commands:

1. "prepend" is redundant when we have "insert", yet we could want to
implement it separately for clarity and because we don't have an
optimizing compiler of the rules into an internal representation.  We
already have this sort of redundancy for the single-letter commands.

2. "prepend" and "insert" could be made faster if we knew the length of
the string to be prepended/inserted in advance.
This may be achieved by changing the input syntax to require the user
to specify the length (inconvenient), by introducing a rules compiler
into an internal representation, or by introducing some sort of
per-rule-command caching (there may be multiple string commands per
rule, so the lengths would need to be cached separately for each).

> Speaking of rules, I have found the rejection rules to be a little on
> the slow side.  Part of this is due to the rejection rule having to
> operate on each word, one at a time, and if you have many rules with
> the same rejection rule, john does not know that this word was
> rejected already, so checks it again.
[...]
> I am not sure we can optimize john a whole lot for this type of
> situation.  Does anyone have any other ideas?

I think this problem can be generalized as follows:

Right now, JtR tries the entire wordlist against each rule before it
advances to the next rule.  If the next rule has the same initial few
commands (not only rejections, but any commands), those will be applied
to all entries in the wordlist for a second time.

To deal with this, as well as to provide a way to temporarily swap the
loops for some other reason (and I can think of at least one other
valid reason), I've been thinking of introducing a way to declare "rule
blocks" - some sort of BEGIN and END directives within a ruleset.  JtR
would then swap the loops while working on such a block.

When going through the rules for the first time, JtR would need to
detect common initial substrings between adjacent rules (or better but
trickier - detect common initial sets of commands) and record the
character position (or command number) of the first difference into the
rule struct (to be introduced; we'd have to keep the preprocessor
output in memory).
Then, when actually working at full speed, the rules engine would use
this "first difference position" to determine when to cache the
intermediate result, when to recall it, and how many initial characters
or rule commands to skip.

If a rejection occurs at or before the caching position, that fact
should be cached instead of the intermediate result (string).  Then the
following rules with the same or larger caching position should be
skipped.  The pointer to the next rule to process in such a case could
be cached too.

To make this even more optimal, two caching positions could be kept
track of: one as described above and the other at the end of the last
rejection command within the common initial set of commands.

> Sorry for the ramble.  Jim

Hey, that's what this mailing list is for. ;-)  Well, not only that -
I have appreciated your postings so far.

Thanks,

Alexander

-- 
To unsubscribe, e-mail john-users-unsubscribe@...ts.openwall.com and
reply to the automated confirmation request that will be sent to you.