john-users - Re: Need help for understanding rule preprocessor

Message-ID: <20170601135438.GA2537@openwall.com>
Date: Thu, 1 Jun 2017 15:54:39 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Need help for understanding rule preprocessor

Hi Alex,

On Wed, May 31, 2017 at 11:40:11PM -0500, Alex Liu wrote:
> I am a computer science student at The University of Chicago. I am using
> john (version:  1.7.9-jumbo-7_omp [macosx-x86-64]) and got confused by the
> preprocessor command. I wonder what the following two commands mean:
> 
> -[:c] <* !?A \p1[lc] p
> -[:c] l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?\p1[za] \p1[:c]
> 
> Specifically, what does [:c] and -[:c] mean? What does \p1[za] and \p1[lc]
> mean here?

Since you correctly refer to these pieces as preprocessor commands, it
appears that you already found they're documented in doc/RULES - but I
guess the documentation is hard to understand.

The preprocessor expands each original ruleset line into one or more
rules.  In this case, 2 lines are expanded into 4, which you can verify
as follows:

[user@...t run]$ cat john-local.conf
[List.Rules:test]
-[:c] <* !?A \p1[lc] p
-[:c] l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?\p1[za] \p1[:c]
[user@...t run]$ echo -n test | md5sum | cut -d' ' -f1 > pw
[user@...t run]$ echo test | ./john --pipe --rules=test --session=test --verbosity=4 --format=raw-md5 pw
Using default input encoding: UTF-8
Loaded 1 password hash (Raw-MD5 [MD5 128/128 AVX 4x3])
Warning: no OpenMP support for this hash type, consider --fork=32
Press Ctrl-C to abort, or send SIGUSR1 to john process for status
0g 0:00:00:00  0g/s 13.33p/s 13.33c/s 13.33C/s tests..Tests
Session completed
[user@...t run]$ fgrep -w Rule test.log
0:00:00:00 - Rule #1: '-: <* !?A l p' accepted as '<*!?Alp'
0:00:00:00 - Rule #2: '-c <* !?A c p' accepted as '<*!?Acp'
0:00:00:00 - Rule #3: '-: l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?z :' accepted as 'l/a/e/l/o/ssa4se3sl1so0ss$(?z'
0:00:00:00 - Rule #4: '-c l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?a c' accepted as 'l/a/e/l/o/ssa4se3sl1so0ss$(?ac'

This is with recent bleeding-jumbo.  The much older 1.7.9* might not
have needed the "--verbosity=4" option for this, but otherwise contains
this logging functionality too.

Here you can see the resulting 4 rules after preprocessor expansion
(left) and also after processing of the rule reject flags and nop
squeezing (right).

Now I'll explain what these specific pieces do:

The dashes at the start of lines indicate rule reject flags, quoting
doc/RULES:

	Rule reject flags.

-:	no-op: don't reject
-c	reject this rule unless current hash type is case-sensitive

We want the "<* !?A l p" rule to be used unconditionally, but the
"<* !?A c p" rule used on case-sensitive hash types only.  That's
because using the latter on a case-insensitive hash type would be
redundant, since the difference between the "l" (lowercase) and "c"
(capitalize) commands would be lost.  For this optimization, we need
the "-c" rule reject flag on the latter rule.

We also don't want to spend two separate lines on these similar rules,
so we compact them into one line using the preprocessor.  To do this, we
introduce the "-:" no-op rule reject flag into the first rule and then
combine the two rule reject flags as "-[:c]".  Then we also need to
combine the "l" and "c" commands, but we want them to be applied along
with the corresponding rule reject flags.  We do this with "\p1[lc]",
where the "\p1" requests processing "in parallel" with list or range
"number one", which in this case is our two rule reject flags that are
the first preprocessor expansion found on this line.
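
To make the "in parallel" idea concrete, here is a rough Python sketch of the expansion.  It is only a simplified model for illustration, not how John implements its preprocessor: the function name expand() is made up, it handles only plain character lists (no ranges, no escaped brackets, no other preprocessor commands), it assumes "\pN" counts only the non-parallel lists, and it assumes a parallel list is as long as the list it refers to.

```python
import itertools
import re

# Optional "\pN" prefix followed by a bracketed character list such as [lc].
# Simplification: no ranges like a-z and no escaped brackets.
_LIST = re.compile(r'(\\p(\d))?\[([^\]]+)\]')

def expand(line):
    """Expand one ruleset line into rules (simplified, hypothetical model)."""
    groups = []   # character lists of the independent (non-\p) groups
    pieces = []   # literal strings and (group_index, chars) references
    pos = 0
    for m in _LIST.finditer(line):
        pieces.append(line[pos:m.start()])
        if m.group(1):
            # \pN[...]: walks in step with independent group N (1-based)
            pieces.append((int(m.group(2)) - 1, m.group(3)))
        else:
            # plain [...]: starts a new independent group
            groups.append(m.group(3))
            pieces.append((len(groups) - 1, m.group(3)))
        pos = m.end()
    pieces.append(line[pos:])

    rules = []
    # One output rule per combination of positions in the independent groups
    for idx in itertools.product(*(range(len(g)) for g in groups)):
        out = []
        for p in pieces:
            if isinstance(p, str):
                out.append(p)
            else:
                group, chars = p
                out.append(chars[idx[group]])
        rules.append(''.join(out))
    return rules

for rule in expand(r'-[:c] <* !?A \p1[lc] p'):
    print(rule)
```

Because "\p1[lc]" is tied to the first list rather than being a list of its own, the line yields 2 rules and not 2 x 2 = 4.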

In the second original line, we also use a no-op rule command (on top of
a no-op rule reject flag), for a similar reason.  As you can see, this
no-op command gets squeezed out and does not affect runtime performance.
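
The squeezing seen in the right-hand column of the log can be imitated with a few lines of Python.  This is a toy model only: the function name squeeze() is made up, and splitting on whitespace is an assumption that happens to work for these four rules (real rules need no spaces between commands, and a ":" can also appear as a parameter of another command).

```python
def squeeze(rule):
    """Drop the leading reject flag, whitespace, and standalone ':' no-ops.

    Toy model: assumes commands are separated by whitespace, which holds
    for the rules discussed here but is not required by the rule syntax.
    """
    tokens = rule.split()
    if tokens and tokens[0].startswith('-'):
        tokens = tokens[1:]          # the reject flag is not a rule command
    return ''.join(t for t in tokens if t != ':')

print(squeeze('-: l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?z :'))
```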

Yes, this is a lot of complexity for saving two lines.  But once you get
used to these concepts and use them a lot, they end up making many
rulesets many times shorter, and easier to maintain too: with less
duplication, changing something common to multiple resulting rules
requires fewer edits.

The below isn't about the preprocessor anymore, but now let's see how
the rule reject flags actually reject rules:

[user@...t run]$ rm test.log 
[user@...t run]$ echo test | ./john --pipe --rules=test --session=test --verbosity=4 --format=LM pw
Using default input encoding: UTF-8
Using default target encoding: CP850
Loaded 2 password hashes with no different salts (LM [DES 128/128 AVX-16])
Warning: poor OpenMP scalability for this hash type, consider --fork=32
Will run 32 OpenMP threads
Press Ctrl-C to abort, or send SIGUSR1 to john process for status
0g 0:00:00:00  0g/s 6.666p/s 6.666c/s 13.33C/s TESTS
Session completed
[user@...t run]$ fgrep -w Rule test.log
0:00:00:00 - Rule #1: '-: <* !?A l p' accepted as '<*!?Alp'
0:00:00:00 - Rule #2: '-c <* !?A c p' rejected
0:00:00:00 - Rule #3: '-: l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?z :' accepted as 'l/a/e/l/o/ssa4se3sl1so0ss$(?z'
0:00:00:00 - Rule #4: '-c l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?a c' rejected

We requested treating our hash(es) as LM, a case-insensitive hash type.
As a result, the "-c" rule reject flags rejected two of the four rules,
leaving only the two "-:" rules in use.
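
The effect of the two reject flags used here can be summarized in a small Python sketch.  It covers only "-:" and "-c" as quoted from doc/RULES above, the function name is_accepted() is made up, and the case_sensitive argument stands in for whatever John knows about the current hash type.

```python
def is_accepted(rule, case_sensitive):
    """Apply the '-:' and '-c' reject flags to one expanded rule.

    Toy model: only these two flags are handled, and the flag is assumed
    to be the first whitespace-separated token of the rule.
    """
    flag = rule.split()[0]
    if flag == '-c':
        return case_sensitive    # reject unless the hash type is case-sensitive
    return True                  # '-:' (or no flag): don't reject

rules = [
    '-: <* !?A l p',
    '-c <* !?A c p',
    '-: l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?z :',
    '-c l /a /e /l /o /s sa4 se3 sl1 so0 ss$ (?a c',
]

# LM is case-insensitive, so only the '-:' rules survive
for rule in rules:
    print('accepted' if is_accepted(rule, case_sensitive=False) else 'rejected', rule)
```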

Alexander