Message-ID: <CAGSLPCapWcC+Zq6+gFX_C2P4gT3UDeSuqu632Y4S7=VFM5ZE-A@mail.gmail.com>
Date: Thu, 27 Mar 2025 14:53:08 +0530
From: Pentester LAB <pentesterlab3@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized Wordlists in John the Ripper

Thank you for your detailed response and for clarifying the correct
approach to applying rules to a tokenized wordlist in JtR. Your explanation
helped me understand how the tokenizer is intended to be used and why my
previous attempt was incorrect.

I appreciate the step-by-step breakdown and the examples you provided. I'll
go through them carefully and experiment with the tokenizer along with the
suggested approaches.

If I have any further questions or doubts in the future, I will reach
out and ask.

Thanks again for your guidance and for taking the time to explain this in
detail!
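
To make sure I follow, I condensed the tokenization step from your
example into a self-contained snippet. The token codes \x01..\x06 and
the substrings they stand for are the ones your generated
john-local.conf assigned in the example; a real run derives them from
the training wordlist. GNU sed is assumed, since it supports \xHH
escapes:

```shell
# Build the 3-line test wordlist from the example (10 bytes).
printf 'abc\n@\n123\n' > test.txt

# Tokenize: drop lines with non-printable characters, then replace
# frequent substrings with one-byte token codes, longest-first so
# "123" is consumed before "12" or "23" can match.
sed '/[^ -~]/d; s/123/\x01/g; s/abc/\x02/g; s/ab/\x03/g; s/23/\x04/g; s/bc/\x05/g; s/12/\x06/g' \
    test.txt > test-tokenized.txt

# Compare sizes: "abc" and "123" each collapse to one token code,
# while "@" is left as-is, so 10 bytes shrink to 6.
wc -c test.txt test-tokenized.txt
```

This reproduces the 10-byte vs. 6-byte comparison from your wc output
without needing john itself.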


On Thu, Mar 27, 2025 at 8:38 AM Solar Designer <solar@...nwall.com> wrote:

> A correction inline, and addition below:
>
> On Thu, Mar 27, 2025 at 03:30:48AM +0100, Solar Designer wrote:
> > Trying to repair your weird attempts above using unmodified tokenize.pl:
> >
> > $ cat test.txt
> > abc
> > @
> > 123
> > $ perl tokenize.pl test.txt > john-local.conf
> > $ sed '/[^ -~]/d; s/123/\x1/g; s/abc/\x2/g; s/ab/\x3/g; s/23/\x4/g; s/bc/\x5/g; s/12/\x6/g' test.txt > test-tokenized.txt
> > $ ./john --wordlist=test-tokenized.txt --external=Untokenize --stdout
> > Using default input encoding: UTF-8
> > abc
> > @
> > 123
> > 3p 0:00:00:00 100.00% (2025-03-27 03:01) 60.00p/s 123
> > $ ./john --wordlist=test-tokenized.txt --rules=Best64 --external=Untokenize --stdout | head
> > Using default input encoding: UTF-8
> > Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> > Enabling duplicate candidate password suppressor using 256 MiB
> > 124p 0:00:00:00 100.00% (2025-03-27 03:01) 1033p/s 123123123123123123
> > abc
> > @
> > 123
> > abc0
> > @0
> > 1230
> > abc1
> > @1
> > 1231
> > abc2
> > $ wc test.txt test-tokenized.txt
> >  3  3 10 test.txt
> >  3  1  6 test-tokenized.txt
> >
> > Where I took the "sed" command from the generated john-local.conf, but
> > removed the final part where it had "; s/^/:/" as that part was there
> > for producing fake pot files (for incremental mode training) rather than
> > wordlists.
> >
> > As you can see, --external=Untokenize was able to correctly restore the
> > wordlist from its tokenized or compressed form (original test.txt was 10
> > bytes, but tokenized test-tokenized.txt only 6 bytes).  And the rules
> > are applied if you request them.
> >
> > Moreover, you can see that they're applied differently and their effect
> > is different than if you used the same rules on the original wordlist:
> >
> > $ ./john --wordlist=test.txt --rules=Best64 --external=Untokenize --stdout | head
> > Using default input encoding: UTF-8
> > Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> > Enabling duplicate candidate password suppressor using 256 MiB
> > 152p 0:00:00:00 100.00% (2025-03-27 03:07) 1013p/s 123121
> > abc
> > @
> > 123
> > cba
> > 321
> > ABC
> > Abc
> > abc0
> > @0
> > 1230
>
> Oops, I forgot to remove --external=Untokenize from the above command
> line.  Luckily, it didn't affect anything this time because test.txt has
> no token codes in it.  But the correct command here would be simply:
>
> $ ./john --wordlist=test.txt --rules=Best64 --stdout | head
> Using default input encoding: UTF-8
> Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> Enabling duplicate candidate password suppressor using 256 MiB
> 152p 0:00:00:00 100.00% (2025-03-27 03:46) 1688p/s 123121
> abc
> @
> 123
> cba
> 321
> ABC
> Abc
> abc0
> @0
> 1230
>
> > The generated password candidates are different and their number is also
> > different (152 original vs. 124 when rules are applied to tokenized
> > wordlist prior to --external=Untokenize).  That's the point of my idea
> > number 13, so thank you for making me try it out.
>
> To more fully test my idea, we need to see whether and how many
> different candidate passwords the rules+Untokenize run adds on top of a
> simple rules run.
>
> In the above tests, the simple run produces 152 unique candidates.
> They're unique due to our dupe suppressor, as otherwise Best64 would
> tend to produce lots of dupes.  The rules+Untokenize run produces 124,
> but the output from this run has 125 lines out of which 123 are unique.
> There are 3 instances of the empty line.  I'm actually puzzled by that
> (we could want to investigate it in case it's a bug).
>
> Anyway, combining those 152 and 123, I get 165 unique.  So, yes, this
> weird trick does add 13 unique candidate passwords.  They are:
>
> TAB
> 123123123
> 123123123123
> 123123123123123
> 123123123123123123
> 123123123123123123123123
> 23
> abcabcabc
> abcabcabcabc
> abcabcabcabcabc
> abcabcabcabcabcabc
> abcabcabcabcabcabcabcabc
> bc
>
> where TAB is the control character (which puzzles me a bit).
>
> Alexander
>
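
To check my understanding of the restore step, the Untokenize external
mode can be sketched as a simple code-to-substring expansion. The token
table below is just the one from the example above; the real table is
the one tokenize.pl writes into john-local.conf:

```python
# Inverse of the tokenization sed command: expand each one-byte token
# code back into the substring it stands for, leaving other characters
# unchanged. (Token table taken from the example; a real table has
# many more codes, derived from the training wordlist.)
tokens = {"\x01": "123", "\x02": "abc", "\x03": "ab",
          "\x04": "23", "\x05": "bc", "\x06": "12"}

def untokenize(word: str) -> str:
    return "".join(tokens.get(ch, ch) for ch in word)

# Round trip: the tokenized lines from the sed step restore to the
# original wordlist entries.
print([untokenize(w) for w in ["\x02", "@", "\x01"]])  # ['abc', '@', '123']

# This also shows where the extra candidates come from: a rule that
# duplicates the tokenized word produces repeated substrings after
# expansion, e.g. three copies of code \x01 become "123123123".
print(untokenize("\x01\x01\x01"))  # 123123123
```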
