john-users - Re: Work (optimization) in progress, and some ideas

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20090730192305.GA1519@openwall.com>
Date: Thu, 30 Jul 2009 23:23:05 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Work (optimization) in progress, and some ideas

On Sun, Jul 26, 2009 at 03:35:52PM -0500, JimF wrote:
> >>It could be something like:
> >>$.!string! and ^.!string!      (again ! being any char not used in string)
> >>
> >>this would require $\. or ^\.  (or $[.] ^[.] for the pre-processor) to
> >>append just a period.
> >
> >I thought of this before, and decided against this approach for the
> >following reasons:
> >
> >1. This would break some existing valid rules (those appending/prepending
> >the magic character or the escape character).
> 
> ?? Not sure of this.  The rule change I list is $charvalue or $.strvalue. 
> Thus, the 'only' value you could not currently do, is $.  To do that you
> would have to escape it.

Yes, and also "$\\" because the backslash character would become special
under this rule command.

This means precisely what I wrote - existing valid rule commands, namely
"$." and "$\\", would break.

> So, $\. would prepend a period  $\\ would prepend the escape.

(You meant "append".)

Yes, but you'd have to write these as "$\\." and "$\\\" because of the
backslash character being special to the preprocessor.

> >2. The checks for the magic character and the escape character would have
> >some performance cost.
> 
> Fully agreed.  But is there not a check now for $\\   ?

The backslash is checked for in the preprocessor, so it comes at "no
cost" in the current implementation.  It's out of the per-word loop.

> Speaking of checking the char after, what different logic does these 2 rules
> get?
> 
> $a
> $[a]

They're equivalent.  The latter is a weird way to write the former,
abusing the preprocessor for nothing.

> It appears to me that '[' is looked at.  Or is this 'rule' removed and 
> handled by the preprocessor?

The preprocessor translates "$[a]" into just "$a".  As I wrote above,
this is currently done out of the per-word loop, so there's "no cost"
(as long as your wordlist is of non-negligible size), but it's a weird
way to express a simple rule.  I am puzzled as to why you and Minga seem
to prefer this weird syntax. ;-)

> >Right now, JtR tries the entire wordlist against each rule before it
> >advances to the next rule.  If the next rule has the same initial few
> >commands (not only rejections, but any commands), those will be applied
> >to all entries in the wordlist for a second time.
> >
> >To deal with this, as well as to provide a way to temporarily swap the
> >loops for some other reason (and I can think of at least one other valid
> >reason), I've been thinking of introducing a way to declare "rule
> >blocks" - some sort of BEGIN and END directives within a ruleset.
> >JtR would then swap the loops while working on such a block.
> 
> This sounds exactly like what would do the 'trick'.  Changing from a
> depth first search to a bredth first search, so that each word gets the
> common rules done once, and then gets the 'specialized' rule work
> done if and only if the word is still there, would obtain 99.9..% of the
> speed gain (probably faster, since no interaction grepping data files)
> 
> This
> 'BEGINBLOCK=common_code'
> rule1
> rule2
> rule3
> 'ENDBLOCK'
> method certain far improves what I was doing.  I really see that the
> 'common_code' in this instance works great as 'pre work' prior to
> doing rule1, rule2, rule3 on the word, but could there be instances
> where common 'post' processing would be needed?   I at this time
> can't think of any, because I have all of this 'pre' processing view
> of this new feature, but others might see useful post rules.

I think people could want common post-processing as well, but it would
not provide any performance improvement.

> >When going through the rules for the first time, JtR would need to
> >detect common initial substrings between adjacent rules (or better but
> >trickier - detect common initial sets of commands) and record the
> >character position (or command number) of the first difference into the
> >rule struct (to be introduced, and we'd have to keep the preprocessor
> >output in memory).  Then, when actually working at full speed, the rules
> >engine would use this "first difference position" to determine when to
> >cache the intermediate result, when to recall it, and how many initial
> >characters or rule commands to skip.
> 
> Automatic is one way, but could this also be left up to the user to provide?
> If this sort of behavior is wanted, the user would be pretty 'sure' what 
> he/she wanted.

Yes, this makes sense.  I was trying to make this easy and almost
transparent for the users, but maybe we want to keep the code in JtR
simpler instead.

The automatic approach would allow JtR to optimize things when one
"blindly" switches to "breadth first" (in your words) for a pre-existing
rule block (perhaps even written by someone else).  I was even thinking
of allowing the switch from the command-line, in addition to via the
block directives.

> Thus, instead of having
> !?d-8-c<8/?Lrule1
> !?d-8-c<8/?Lrule2
> !?d-8-c<8/?Lrule3
> !?d-8-c<8/?Lrule4
> ...
> 
> where john have to pre-scan, find and 'remove' all of the !?d-8-c<8/?L 
> values, simply allowing the user to pre-inform john that this situatation
> is happening
> like
> STARTBLOCK=!?d-8-c<8/?L
> rule1
> rule2
> rule3
> ....
> ENDBLOCK

(By the way, the specific rules you used in the above examples are weird.
The rule reject flags, such as "-8", are only valid before the first
rule command.  "!?d" is a rule command, not a rule reject flag.  The
difference between these kinds of rejects is that rule reject flags
reject entire rules, whereas the rejection rule commands only reject
specific input "words".)

Yes, "explicit" syntax like the above is an option.  We could also allow
for recursive blocks.

> and when john is pre-scanning rules, it would 'spot' these blocks, and
> of course not load them, but track where they start, where they end,
> and what prelim work is done.  That way, once that rule number is
> hit, john could start working breadthfirst search over the rules, applying
> the common stuff, then running the result (if any) over each of the rules
> within the group.

I think that you're confused about "not loading", but other than that I
agree that this is a valid option.

Alexander

-- 
To unsubscribe, e-mail john-users-unsubscribe@...ts.openwall.com and reply
to the automated confirmation request that will be sent to you.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.