Message-ID: <20250327030742.GB10625@openwall.com>
Date: Thu, 27 Mar 2025 04:07:42 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized in John the Ripper

A correction inline, and addition below:

On Thu, Mar 27, 2025 at 03:30:48AM +0100, Solar Designer wrote:
> Trying to repair your weird attempts above using unmodified tokenize.pl:
>
> $ cat test.txt
> abc
> @
> 123
> $ perl tokenize.pl test.txt > john-local.conf
> $ sed '/[^ -~]/d; s/123/\x1/g; s/abc/\x2/g; s/ab/\x3/g; s/23/\x4/g; s/bc/\x5/g; s/12/\x6/g' test.txt > test-tokenized.txt
> $ ./john --wordlist=test-tokenized.txt --external=Untokenize --stdout
> Using default input encoding: UTF-8
> abc
> @
> 123
> 3p 0:00:00:00 100.00% (2025-03-27 03:01) 60.00p/s 123
> $ ./john --wordlist=test-tokenized.txt --rules=Best64 --external=Untokenize --stdout | head
> Using default input encoding: UTF-8
> Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> Enabling duplicate candidate password suppressor using 256 MiB
> 124p 0:00:00:00 100.00% (2025-03-27 03:01) 1033p/s 123123123123123123
> abc
> @
> 123
> abc0
> @0
> 1230
> abc1
> @1
> 1231
> abc2
> $ wc test.txt test-tokenized.txt
>  3  3 10 test.txt
>  3  1  6 test-tokenized.txt
>
> Where I took the "sed" command from the generated john-local.conf, but
> removed the final part where it had "; s/^/:/" as that part was there
> for producing fake pot files (for incremental mode training) rather than
> wordlists.
>
> As you can see, --external=Untokenize was able to correctly restore the
> wordlist from its tokenized or compressed form (original test.txt was 10
> bytes, but tokenized test-tokenized.txt only 6 bytes). And the rules
> are applied if you request them.
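The round trip quoted above can be mimicked outside of john. What follows is
only a minimal Python sketch, not the code that tokenize.pl generates: the
token table is copied from the quoted sed command, while the function names
and the assumption that Untokenize simply expands each code back to its
substring are mine.

```python
# Token table copied from the quoted sed command: (code, substring).
# Order matters because sed applies the substitutions left to right.
TOKENS = [(1, "123"), (2, "abc"), (3, "ab"), (4, "23"), (5, "bc"), (6, "12")]


def tokenize(word):
    """Mimic the sed script; return None for lines that sed would delete."""
    # '/[^ -~]/d' drops any line containing a non-printable-ASCII character.
    if any(not (" " <= ch <= "~") for ch in word):
        return None
    for code, text in TOKENS:
        word = word.replace(text, chr(code))
    return word


def untokenize(word):
    """Assumed inverse: expand every token code back to its substring."""
    table = {chr(code): text for code, text in TOKENS}
    return "".join(table.get(ch, ch) for ch in word)


words = ["abc", "@", "123"]
tokenized = [tokenize(w) for w in words]
restored = [untokenize(t) for t in tokenized]

# Sizes as wc would report them: each word plus its trailing newline.
size_plain = sum(len(w) + 1 for w in words)
size_tokenized = sum(len(t) + 1 for t in tokenized)
print(restored, size_plain, size_tokenized)
```

With the three words from test.txt this reproduces the 10 vs. 6 byte sizes
reported by wc, since "abc" and "123" each collapse to a single code.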
>
> Moreover, you can see that they're applied differently and their effect
> is different than if you used the same rules on the original wordlist:
>
> $ ./john --wordlist=test.txt --rules=Best64 --external=Untokenize --stdout | head
> Using default input encoding: UTF-8
> Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> Enabling duplicate candidate password suppressor using 256 MiB
> 152p 0:00:00:00 100.00% (2025-03-27 03:07) 1013p/s 123121
> abc
> @
> 123
> cba
> 321
> ABC
> Abc
> abc0
> @0
> 1230

Oops, I forgot to remove --external=Untokenize from the above command
line. Luckily, it didn't affect anything this time because test.txt has
no token codes in it. But the correct command here would be simply:

$ ./john --wordlist=test.txt --rules=Best64 --stdout | head
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor using 256 MiB
152p 0:00:00:00 100.00% (2025-03-27 03:46) 1688p/s 123121
abc
@
123
cba
321
ABC
Abc
abc0
@0
1230

> The generated password candidates are different and their number is also
> different (152 original vs. 124 when rules are applied to tokenized
> wordlist prior to --external=Untokenize). That's the point of my idea
> number 13, so thank you for making me try it out.

To more fully test my idea, we need to see whether and how many
different candidate passwords the rules+Untokenize run adds on top of a
simple rules run.

In the above tests, the simple run produces 152 unique candidates.
They're unique due to our dupe suppressor, as otherwise Best64 would
tend to produce lots of dupes. The rules+Untokenize run produces 124,
but the output from this run has 125 lines out of which 123 are unique.
There are 3 instances of the empty line. I'm actually puzzled by that
(we could want to investigate it in case it's a bug). Anyway, combining
those 152 and 123, I get 165 unique.
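The combining step amounts to a set difference between the two candidate
streams. A minimal Python sketch of that bookkeeping follows; the function
names are hypothetical and the two lists are toy data, not real john output.

```python
def unique_in_order(lines):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(lines))


def added_by(second_run, first_run):
    """Candidates that occur in second_run but never in first_run."""
    seen = set(first_run)
    return [c for c in unique_in_order(second_run) if c not in seen]


# Toy stand-ins for the two --stdout outputs (illustrative only).
plain = ["abc", "cba", "abc0"]
tokenized_then_rules = ["abc", "abc0", "abc0", "abcabcabc", "bc"]

extra = added_by(tokenized_then_rules, plain)
combined = unique_in_order(plain + tokenized_then_rules)

print(extra)
# The growth of the combined set equals the number of added candidates.
print(len(combined) - len(unique_in_order(plain)) == len(extra))
```

Applied to the real outputs, the same arithmetic links the counts quoted
above: going from 152 unique candidates to 165 combined means the second run
contributed 165 - 152 = 13 candidates that the first one lacked.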
So, yes, this weird trick does add 13 unique candidate passwords. They
are:

TAB
123123123
123123123123
123123123123123
123123123123123123
123123123123123123123123
23
abcabcabc
abcabcabcabc
abcabcabcabcabc
abcabcabcabcabcabc
abcabcabcabcabcabcabcabc
bc

where TAB is the control character (which puzzles me a bit).

Alexander
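One way to see why rules behave differently on tokenized input is that a
rule sees a single character where the original word had three. The Python
sketch below illustrates only that mechanism. The two toy rules (reverse,
and repeat the last character) are my own and are not claimed to be the
Best64 rules responsible for the list above; the token table is a subset of
the one in the sed command, and the expansion step is an assumption about
what Untokenize does.

```python
# Subset of the token table from the sed command (code -> substring).
TOKENS = {"\x01": "123", "\x02": "abc"}


def untokenize(word):
    """Assumed Untokenize behaviour: expand token codes, keep other chars."""
    return "".join(TOKENS.get(ch, ch) for ch in word)


def reverse(word):
    """Toy rule: reverse the word."""
    return word[::-1]


def repeat_last(word, n):
    """Toy rule: append n more copies of the last character."""
    return word + word[-1] * n if word else word


plain, token = "abc", "\x02"

# Rule applied to the plain word vs. to its one-character tokenized form.
results = {
    "reverse plain": reverse(plain),
    "reverse tokenized": untokenize(reverse(token)),
    "repeat plain": repeat_last(plain, 2),
    "repeat tokenized": untokenize(repeat_last(token, 2)),
}
for name, candidate in results.items():
    print(name, candidate)
```

Reversing a one-character word is a no-op, which is consistent with "cba"
and "321" showing up only in the plain run, whereas repeating that one
character yields whole-substring repeats such as "abcabcabc" after
expansion.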