john-users - Re: Work (optimization) in progress, and some ideas

Message-ID: <E3F15F91E46C4495B234EBFC88DF75CB@ath64dual>
Date: Sun, 26 Jul 2009 15:35:52 -0500
From: "JimF" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: Re: Work (optimization) in progress, and some ideas

> That's true.  Speaking of your specific intended use described above,
> I think a "next generation" version of JtR should introduce some
> multi-word or passphrase cracking mode.  This is a topic we could
> discuss separately and at a later time.

This is exactly the cracking mode I was looking for.  Most cracks were
multi-word, but there was also common prefix/suffix work.  John already has
some 'common' append rules, like English plurals, -ing, -ly, and others.
Those are great rules, and much of what was done here is along those lines.
But finding many arbitrary prefixes and suffixes for a language, using those
as the rules, and running them against a larger dictionary (or even running
BACK against just the language dictionary), cracked a lot.
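A minimal C sketch of the prefix/suffix idea, assuming the affix lists have already been extracted (for example with grep/sort/cut filters): every dictionary word is combined with every prefix and every suffix. `expand_word` and `build_candidate` are hypothetical helpers for illustration, not JtR functions, and the 128-byte buffer is an arbitrary limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write prefix + word + suffix into out.
   Returns 0 on success, -1 if the result would not fit in outsz bytes. */
static int build_candidate(char *out, size_t outsz, const char *prefix,
                           const char *word, const char *suffix)
{
    assert(outsz > 0);
    int n = snprintf(out, outsz, "%s%s%s", prefix, word, suffix);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Print every prefix/suffix combination of `word` to `out`, one per line.
   Put "" first in both affix lists so the bare word, prefix-only and
   suffix-only forms are produced too. Over-long candidates are skipped.
   Returns the number of candidates printed. */
static size_t expand_word(const char *word,
                          const char *const *prefixes, size_t nprefixes,
                          const char *const *suffixes, size_t nsuffixes,
                          FILE *out)
{
    char buf[128];
    size_t count = 0;

    for (size_t p = 0; p < nprefixes; p++)
        for (size_t s = 0; s < nsuffixes; s++)
            if (build_candidate(buf, sizeof buf, prefixes[p], word,
                                suffixes[s]) == 0) {
                fprintf(out, "%s\n", buf);
                count++;
            }
    return count;
}
```

With prefixes {"", "un"} and suffixes {"", "ing", "ly"}, the word "do" yields six candidates, in this order: do, doing, doly, undo, undoing, undoly.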

Simply put, all I did was try to figure out a way to do something like this
with the existing john.  The 'solution' I have come to is certainly far from
optimal, and it really beats on the rule processor (and the preprocessor, the
way I have implemented it), but it certainly DID work, and it shows that this
POC method does crack a lot of words.  Making john support this type of
searching better at the 'core' level could only improve john.

Note, I will put together a 'howto' of what I have done to get this to work.
It was a few specialized pieces of code, and some 'standard' *nix filters
(grep, sort, cut, etc).  Even if john gets enhanced to handle prepend, append,
insert and multi-word packing in a more 'native' manner than I did, this way
of building words may help in getting more and better ideas, and help lead
to a better enhancement to john.

>> Do they have to be 1 letter?
>
> For optimal performance, yes, as long as we don't introduce a compiler
> into some internal representation (which is something to consider when
> we're ready for a major rework).
>
> Also, single-letter commands allow for using the preprocessor on them:

Ok, noted.

>> It could be something like:
>> $.!string! and ^.!string!      (again ! being any char not used in string)
>>
>> this would require $\. or ^\.  (or $[.] ^[.] for the pre-processor) to
>> append just a period.
>
> I thought of this before, and decided against this approach for the
> following reasons:
>
> 1. This would break some existing valid rules (those appending/prepending
> the magic character or the escape character).

?? Not sure of this.  The rule change I list is $charvalue or $.strvalue.
Thus, the 'only' thing you could no longer do directly is $. (append a
period); to do that you would have to escape it.  So $\. would append a
period, and $\\ would append the escape character.
Am I missing something here?  I think not, but sometimes I miss things that
I think I understand, but whose inner behavior I truly do not know.  It is
possible that the code handling the append would need to be patched to
handle $\. properly, but I think if it did this, then using the period char
after the $ would signal the rules processor to expect a string (format
being $.!string!, with both ! being the same char, and that char being
unused within string).
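A sketch of how the `$` argument could be parsed under the syntax proposed above. This is the proposal from this message, not current JtR syntax, and `parse_append_arg` is a hypothetical helper; the extra branches for `.` and `\` are exactly the added per-command checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parses the argument of a '$' command under the PROPOSED syntax
   (not current JtR syntax):
     $c          append the single character c
     $\c         append c literally (needed for '.' and '\')
     $.!str!     append str; '!' stands for any delimiter absent from str
   `p` points just past the '$'. On success the text to append is copied
   into `out` (NUL-terminated), *next is set to the first unparsed rule
   character, and the text length is returned. Returns -1 on a malformed
   or overlong argument. */
static int parse_append_arg(const char *p, char *out, size_t outsz,
                            const char **next)
{
    assert(outsz >= 2);
    if (*p == '\0')
        return -1;                      /* '$' at end of rule */
    if (*p == '\\') {                   /* escaped single character */
        if (p[1] == '\0')
            return -1;
        out[0] = p[1];
        out[1] = '\0';
        *next = p + 2;
        return 1;
    }
    if (*p != '.') {                    /* ordinary single character */
        out[0] = *p;
        out[1] = '\0';
        *next = p + 1;
        return 1;
    }
    /* String form: p[1] is the delimiter; find its closing twin. */
    char delim = p[1];
    if (delim == '\0')
        return -1;
    const char *start = p + 2;
    const char *close = strchr(start, delim);
    if (close == NULL)
        return -1;                      /* unterminated string */
    size_t len = (size_t)(close - start);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    *next = close + 1;
    return (int)len;
}
```

For example, `$.!ing!` would append "ing", `$\.` a single period, and `$a` still appends just "a".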

> 2. The checks for the magic character and the escape character would have
> some performance cost.

Fully agreed.  But is there not a check now for $\\ ?  If not, then at
this time you could not append the \ char anyway.

Speaking of checking the char after, what different logic do these 2
rules get?

$a
$[a]

It appears to me that '[' is looked at.  Or is this 'rule' removed and
handled by the preprocessor?  This question is due to my lack of 'overall'
understanding of the code.  I understand some of the 'micro' flow of
processing, but still lack some of the 'overall' processing and code flow
at the macro functionality level.
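A simplified sketch of bracket expansion, assuming the preprocessor expands a bracketed character list into one rule per character before the rules engine sees anything, so `$[a]` would reach the engine as plain `$a`. `expand_brackets` is hypothetical and handles only the first `[...]` group plus `x-y` ranges; the real preprocessor handles more (several groups, escapes).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand the first "[...]" group in `rule` into one rule per character,
   supporting "x-y" ranges, and print each generated rule to `out`.
   A rule without brackets is passed through unchanged.
   Returns the number of rules generated, or -1 on an unterminated group. */
static int expand_brackets(const char *rule, FILE *out)
{
    assert(rule != NULL && out != NULL);
    const char *open = strchr(rule, '[');
    if (open == NULL) {
        fprintf(out, "%s\n", rule);
        return 1;
    }
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    int prefix_len = (int)(open - rule);
    int count = 0;
    for (const char *c = open + 1; c < close; c++) {
        unsigned char first = (unsigned char)*c, last = first;
        if (c + 2 < close && c[1] == '-') {     /* range such as 0-9 */
            last = (unsigned char)c[2];
            c += 2;
        }
        for (unsigned ch = first; ch <= last; ch++) {
            /* text before '[', the chosen char, text after ']' */
            fprintf(out, "%.*s%c%s\n", prefix_len, rule, (int)ch, close + 1);
            count++;
        }
    }
    return count;
}
```

Under that assumption `$[a]` and `$a` both come out as the single rule `$a`, so the engine would apply the same logic to both.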

> Also, your preprocessor example is not correct.  The preprocessor is
> ... Big block of explanation

I will have to read this carefully, get the debugger out, and do some
stepping. Thank you VERY much for this information. It will help me
get a better grasp of wtf is going on within this code, and allow me
to make more knowledgeable choices in asking questions, or in
coding with the rules / ppr's.

>> Speaking of rules, I have found the rejection rules to be a little on
>
> I think this problem can be generalized as follows:
>
> Right now, JtR tries the entire wordlist against each rule before it
> advances to the next rule.  If the next rule has the same initial few
> commands (not only rejections, but any commands), those will be applied
> to all entries in the wordlist for a second time.
>
> To deal with this, as well as to provide a way to temporarily swap the
> loops for some other reason (and I can think of at least one other valid
> reason), I've been thinking of introducing a way to declare "rule
> blocks" - some sort of BEGIN and END directives within a ruleset.
> JtR would then swap the loops while working on such a block.

This sounds exactly like what would do the 'trick'.  Changing from a
depth-first search to a breadth-first search, so that each word gets the
common rules done once, and then gets the 'specialized' rule work
done if and only if the word is still there, would obtain 99.9..% of the
speed gain (probably even faster, since no grepping of data files is
involved).
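A sketch of the loop swap, assuming each rule in a block is the block's shared commands followed by its own tail, modeled here as plain C functions (`common_prefix`, `append_s` and `append_ly` are toy stand-ins, not JtR rule commands). The `common_calls` counter shows the saving: in rule-major order the shared part runs once per word per rule, in word-major order once per word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXLEN 64

/* Toy rule step: writes its result to `out`, or returns 0 to reject. */
typedef int (*step_fn)(const char *in, char *out);

static unsigned long common_calls;   /* times the shared part has run */

/* Shared commands of the block: reject words over 8 chars, then lowercase. */
static int common_prefix(const char *in, char *out)
{
    common_calls++;
    size_t n = strlen(in);
    if (n > 8)
        return 0;
    for (size_t i = 0; i <= n; i++)          /* copies the NUL too */
        out[i] = (char)tolower((unsigned char)in[i]);
    return 1;
}

/* Per-rule tails. */
static int append_s(const char *in, char *out)  { snprintf(out, MAXLEN, "%ss", in);  return 1; }
static int append_ly(const char *in, char *out) { snprintf(out, MAXLEN, "%sly", in); return 1; }

/* Current order: each rule (shared part + tail) runs over the whole list. */
static size_t rule_major(const char *const *words, size_t nwords,
                         const step_fn *tails, size_t ntails)
{
    char mid[MAXLEN], cand[MAXLEN];
    size_t tried = 0;

    for (size_t r = 0; r < ntails; r++)
        for (size_t w = 0; w < nwords; w++)
            if (common_prefix(words[w], mid) && tails[r](mid, cand))
                tried++;             /* a cracker would hash `cand` here */
    return tried;
}

/* Swapped loops: the shared part runs once per word, and a word it rejects
   never reaches any of the block's rules. */
static size_t word_major(const char *const *words, size_t nwords,
                         const step_fn *tails, size_t ntails)
{
    char mid[MAXLEN], cand[MAXLEN];
    size_t tried = 0;

    for (size_t w = 0; w < nwords; w++) {
        if (!common_prefix(words[w], mid))
            continue;
        for (size_t r = 0; r < ntails; r++)
            if (tails[r](mid, cand))
                tried++;
    }
    return tried;
}
```

With the words Apple, Banana and strawberries and the two tails, both orders produce the same four candidates, but the shared part runs 6 times rule-major versus 3 times word-major.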

This method

'BEGINBLOCK=common_code'
rule1
rule2
rule3
'ENDBLOCK'

is a big improvement over what I was doing.  I really see the
'common_code' in this instance working great as 'pre work' prior to
doing rule1, rule2, rule3 on the word, but could there be instances
where common 'post' processing would be needed?  At this time I
can't think of any, because I have a purely 'pre'-processing view
of this new feature, but others might see useful post rules.

> When going through the rules for the first time, JtR would need to
> detect common initial substrings between adjacent rules (or better but
> trickier - detect common initial sets of commands) and record the
> character position (or command number) of the first difference into the
> rule struct (to be introduced, and we'd have to keep the preprocessor
> output in memory).  Then, when actually working at full speed, the rules
> engine would use this "first difference position" to determine when to
> cache the intermediate result, when to recall it, and how many initial
> characters or rule commands to skip.
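A sketch of the automatic detection step, assuming a toy command-length table that covers only the commands used in the example rules below (`!` and `/` take a character or a `?C` class, `$ ^ < > -` take one character, anything else is a one-character command). The real rules engine knows many more commands; `toy_cmd_len` and `mark_first_diff` are hypothetical. Comparing whole commands matters: a plain character comparison can stop in the middle of a command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy command-length table (an assumption, not JtR's real parser). */
static size_t toy_cmd_len(const char *p)
{
    assert(*p != '\0');
    if (*p == '!' || *p == '/') {            /* char or ?C class argument */
        if (p[1] == '?' && p[2] != '\0')
            return 3;
        return p[1] != '\0' ? 2 : 1;
    }
    if (strchr("$^<>-", *p) != NULL)         /* one-character argument */
        return p[1] != '\0' ? 2 : 1;
    return 1;                                /* e.g. l, u, c, r, d */
}

/* Length, in characters, of the longest run of whole commands that
   rules a and b both start with. */
static size_t common_cmd_prefix_len(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0') {
        size_t n = toy_cmd_len(a + i);
        if (strncmp(a + i, b + i, n) != 0)
            break;
        i += n;
    }
    return i;
}

/* first_diff[i]: where rule i first differs from rule i-1 (0 for rule 0).
   This is the position at which an intermediate result would be cached. */
static void mark_first_diff(const char *const *rules, size_t n,
                            size_t *first_diff)
{
    for (size_t i = 0; i < n; i++)
        first_diff[i] = (i == 0) ? 0
                                 : common_cmd_prefix_len(rules[i - 1], rules[i]);
}
```

For the rules `!?d-8-c<8/?L$a` and `!?d-8-c<8/?L$b` this gives 12, the length of `!?d-8-c<8/?L`; a plain character comparison would give 13, splitting `$` from its argument.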

Automatic is one way, but could this also be left up to the user to provide?
If this sort of behavior is wanted, the user would be pretty 'sure' what
he/she wanted.  Thus, instead of having
!?d-8-c<8/?Lrule1
!?d-8-c<8/?Lrule2
!?d-8-c<8/?Lrule3
!?d-8-c<8/?Lrule4
...

where john has to pre-scan, find and 'remove' all of the !?d-8-c<8/?L
values, simply allow the user to pre-inform john that this situation is
happening, like
STARTBLOCK=!?d-8-c<8/?L
rule1
rule2
rule3
....
ENDBLOCK

and when john is pre-scanning rules, it would 'spot' these blocks and
(of course) not load the directives as rules, but track where the blocks
start, where they end, and what preliminary work is done.  That way, once
that rule number is hit, john could start working breadth-first over the
rules, applying the common stuff once, then running the result (if any)
over each of the rules within the group.
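A sketch of the pre-scan for the user-declared blocks proposed above. The `STARTBLOCK=` / `ENDBLOCK` directives are the syntax suggested in this message, not something JtR currently accepts, and `parse_ruleset` / `struct parsed_rule` are hypothetical. Each rule records its block's shared commands, so the cracking loop could run those once per word and then try each tail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule after the pre-scan. */
struct parsed_rule {
    const char *common;   /* shared commands from STARTBLOCK=, or "" */
    const char *tail;     /* the rule's own text */
    int block_id;         /* -1 outside any block */
};

/* Pre-scan `lines` for the proposed STARTBLOCK=<commands> ... ENDBLOCK
   directives. The directive lines are not stored as rules, and each rule
   points into `lines` (nothing is copied). Returns the number of rules,
   or -1 on nesting errors, an unterminated block, or overflow. */
static int parse_ruleset(const char *const *lines, size_t nlines,
                         struct parsed_rule *out, size_t maxout)
{
    static const char start_kw[] = "STARTBLOCK=";
    const size_t kwlen = sizeof start_kw - 1;
    const char *common = NULL;     /* non-NULL while inside a block */
    int block_id = -1, next_block = 0;
    size_t n = 0;

    assert(lines != NULL || nlines == 0);
    for (size_t i = 0; i < nlines; i++) {
        const char *l = lines[i];
        if (strncmp(l, start_kw, kwlen) == 0) {
            if (common != NULL)
                return -1;             /* nested blocks not supported */
            common = l + kwlen;
            block_id = next_block++;
        } else if (strcmp(l, "ENDBLOCK") == 0) {
            if (common == NULL)
                return -1;             /* ENDBLOCK without STARTBLOCK */
            common = NULL;
            block_id = -1;
        } else if (l[0] != '\0') {     /* skip empty lines */
            if (n == maxout)
                return -1;
            out[n].common = common != NULL ? common : "";
            out[n].tail = l;
            out[n].block_id = block_id;
            n++;
        }
    }
    return common == NULL ? (int)n : -1;
}
```

A rule inside a block is then equivalent to `common` followed by `tail`; the saving comes from running `common` only once per word for the whole block.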


> If a rejection occurs at or before the caching position, that fact
> should be cached instead of the intermediate result (string).  Then the
> following rules with the same or larger caching position should be
> skipped.  The pointer to the next rule to process in such a case could
> have been cached too.  To make this even more optimal, two caching
> positions could be kept track of: one as described above and the other
> at the end of the last rejection command within the common initial set
> of commands.
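A sketch of caching intermediate results and rejections per word, under several assumptions: commands are toy C functions rather than JtR's rule commands, each rule's `shared` count (leading commands identical to the previous rule) has already been computed, and instead of the one or two caching positions described above it simply keeps every intermediate result of the last rule it ran. That variant costs a small fixed buffer per word but keeps the skip logic easy to check. All names here are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAXLEN  64
#define MAXCMDS 16

/* Toy command: writes its result to `out`, or returns 0 to reject. */
typedef int (*cmd_fn)(const char *in, char *out);

struct rule {
    const cmd_fn *cmds;
    size_t ncmds;
    size_t shared;   /* leading commands identical to the previous rule */
};

static unsigned long cmds_run;   /* executed commands, for comparison */

static int reject_digit(const char *in, char *out)
{
    for (const char *p = in; *p; p++)
        if (isdigit((unsigned char)*p))
            return 0;
    snprintf(out, MAXLEN, "%s", in);
    return 1;
}

static int lower(const char *in, char *out)
{
    size_t i = 0;
    for (; in[i] && i < MAXLEN - 1; i++)
        out[i] = (char)tolower((unsigned char)in[i]);
    out[i] = '\0';
    return 1;
}

static int append_1(const char *in, char *out) { snprintf(out, MAXLEN, "%s1", in); return 1; }
static int append_2(const char *in, char *out) { snprintf(out, MAXLEN, "%s2", in); return 1; }

/* Apply `rules` in order to one word. state[k] holds the result after k
   commands of the last rule actually run, so a rule sharing its leading
   commands resumes from there; if that run rejected inside the shared
   part, the rule is skipped without running anything.
   Calls emit() (if non-NULL) per candidate; returns the candidate count. */
static size_t apply_rules_cached(const char *word, const struct rule *rules,
                                 size_t nrules, int (*emit)(const char *))
{
    char state[MAXCMDS + 1][MAXLEN];
    int rejected = 0;
    size_t reject_at = 0;   /* index of the rejecting command, if rejected */
    size_t carry = 0;       /* commands the cache shares with the previous rule */
    size_t produced = 0;

    snprintf(state[0], MAXLEN, "%s", word);
    for (size_t r = 0; r < nrules; r++) {
        const struct rule *rule = &rules[r];
        assert(rule->ncmds <= MAXCMDS);
        size_t agree = (r == 0) ? 0
                     : (carry < rule->shared ? carry : rule->shared);

        if (rejected && reject_at < agree) {
            carry = agree;      /* same rejecting prefix: skip this rule */
            continue;
        }
        rejected = 0;
        for (size_t c = agree; c < rule->ncmds; c++) {
            cmds_run++;
            if (!rule->cmds[c](state[c], state[c + 1])) {
                rejected = 1;
                reject_at = c;
                break;
            }
        }
        if (!rejected) {
            produced++;
            if (emit)
                emit(state[rule->ncmds]);
        }
        carry = rule->ncmds;
    }
    return produced;
}
```

With rules [reject_digit, lower, append_1], [reject_digit, lower, append_2] (sharing 2 commands) and [lower, append_1], the word Pass9 is rejected by the first rule, the second is skipped outright, and only the third does work: 3 commands instead of 4. For Pass, the second rule resumes from the cached lowercase result and runs only its append: 6 commands instead of 8. Passing `puts` as `emit` prints the candidates.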

I will have to re-read that a few times before it 'sinks in' to my brain,
which can be slow to grasp things at times.

Jim. 

