Message-ID: <20250327030742.GB10625@openwall.com>
Date: Thu, 27 Mar 2025 04:07:42 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized in John the Ripper

A correction inline, and an addition below:

On Thu, Mar 27, 2025 at 03:30:48AM +0100, Solar Designer wrote:
> Trying to repair your weird attempts above using unmodified tokenize.pl:
> 
> $ cat test.txt 
> abc
> @
> 123
> $ perl tokenize.pl test.txt > john-local.conf 
> $ sed '/[^ -~]/d; s/123/\x1/g; s/abc/\x2/g; s/ab/\x3/g; s/23/\x4/g; s/bc/\x5/g; s/12/\x6/g' test.txt > test-tokenized.txt 
> $ ./john --wordlist=test-tokenized.txt --external=Untokenize --stdout
> Using default input encoding: UTF-8
> abc
> @
> 123
> 3p 0:00:00:00 100.00% (2025-03-27 03:01) 60.00p/s 123
> $ ./john --wordlist=test-tokenized.txt --rules=Best64 --external=Untokenize --stdout | head
> Using default input encoding: UTF-8
> Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> Enabling duplicate candidate password suppressor using 256 MiB
> 124p 0:00:00:00 100.00% (2025-03-27 03:01) 1033p/s 123123123123123123
> abc
> @
> 123
> abc0
> @0
> 1230
> abc1
> @1
> 1231
> abc2
> $ wc test.txt test-tokenized.txt 
>  3  3 10 test.txt
>  3  1  6 test-tokenized.txt
> 
> Where I took the "sed" command from the generated john-local.conf, but
> removed the final part where it had "; s/^/:/" as that part was there
> for producing fake pot files (for incremental mode training) rather than
> wordlists.
> 
> As you can see, --external=Untokenize was able to correctly restore the
> wordlist from its tokenized or compressed form (original test.txt was 10
> bytes, but tokenized test-tokenized.txt only 6 bytes).  And the rules
> are applied if you request them.
> 
> Moreover, you can see that they're applied differently and their effect
> is different than if you used the same rules on the original wordlist:
> 
> $ ./john --wordlist=test.txt --rules=Best64 --external=Untokenize --stdout | head
> Using default input encoding: UTF-8
> Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> Enabling duplicate candidate password suppressor using 256 MiB
> 152p 0:00:00:00 100.00% (2025-03-27 03:07) 1013p/s 123121
> abc
> @
> 123
> cba
> 321
> ABC
> Abc
> abc0
> @0
> 1230

Oops, I forgot to remove --external=Untokenize from the above command
line.  Luckily, it didn't affect anything this time because test.txt has
no token codes in it.  But the correct command here would be simply:

$ ./john --wordlist=test.txt --rules=Best64 --stdout | head
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor using 256 MiB
152p 0:00:00:00 100.00% (2025-03-27 03:46) 1688p/s 123121
abc
@
123
cba
321
ABC
Abc
abc0
@0
1230

> The generated password candidates are different and their number is also
> different (152 original vs. 124 when rules are applied to tokenized
> wordlist prior to --external=Untokenize).  That's the point of my idea
> number 13, so thank you for making me try it out.

To more fully test my idea, we need to see whether and how many
different candidate passwords the rules+Untokenize run adds on top of a
simple rules run.

In the above tests, the simple run produces 152 unique candidates.
They're unique thanks to our dupe suppressor; without it, Best64 would
tend to produce lots of dupes.  The rules+Untokenize run reports 124
candidates, but its output has 125 lines, of which 123 are unique: the
empty line appears 3 times.  I'm puzzled by that, and we may want to
investigate it in case it's a bug.
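
For anyone wanting to reproduce such counts, here is a minimal sketch.
The helper name count_lines and the file name rules-untok.txt are
placeholders of mine, not anything john provides:

```shell
# Hypothetical helper: report total, unique, and empty line counts for a
# file of candidate passwords (one candidate per line).
count_lines() {
    total=$(wc -l < "$1")
    # LC_ALL=C makes sort compare raw bytes, so control characters
    # (such as token codes) are not folded together by locale rules.
    unique=$(LC_ALL=C sort -u "$1" | wc -l)
    # grep -c prints 0 but exits non-zero when nothing matches.
    empty=$(grep -c '^$' "$1" || true)
    # $((...)) strips the padding some wc implementations add.
    echo "total=$((total)) unique=$((unique)) empty=$((empty))"
}
```

To use it, redirect the --stdout output of the rules+Untokenize command
shown earlier into rules-untok.txt and run "count_lines rules-untok.txt".
If the figures above hold, it should report 125 total, 123 unique, and
3 empty lines.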

Anyway, combining those 152 and 123, I get 165 unique.  So, yes, this
weird trick does add 13 unique candidate passwords.  They are:

TAB
123123123
123123123123
123123123123123
123123123123123123
123123123123123123123123
23
abcabcabc
abcabcabcabc
abcabcabcabcabc
abcabcabcabcabcabc
abcabcabcabcabcabcabcabc
bc

where TAB stands for the actual tab control character (which puzzles me
a bit).
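
The comparison itself can be done with sort and comm.  The following is
only a sketch: added_by is a hypothetical helper name, and rules.txt and
rules-untok.txt are assumed to hold the saved --stdout output of the
simple run and of the rules+Untokenize run, respectively:

```shell
# Hypothetical helper: print the lines found in file $2 but not in file $1.
# comm requires both inputs sorted in the same order, hence LC_ALL=C
# everywhere; sort -u also drops duplicates such as repeated empty lines.
added_by() {
    a=$(mktemp)
    b=$(mktemp)
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # -1 hides lines only in $a, -3 hides lines common to both.
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

Piping the result through "sed -n l", as in "added_by rules.txt
rules-untok.txt | sed -n l", makes control characters visible (a tab is
shown as \t and each line end as $), which is handy for entries like the
TAB one above.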

Alexander
