Message-ID: <20250327023048.GA9191@openwall.com>
Date: Thu, 27 Mar 2025 03:30:48 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized in John the Ripper

Hi,

On Thu, Mar 27, 2025 at 05:08:16AM +0530, Pentester LAB wrote:
> I am reaching out to seek assistance regarding an issue I encountered while
> attempting to apply rules to a tokenized using John the Ripper (JtR).

I assume you mean to a tokenized wordlist.

The tokenize.pl script was intended for use with probabilistic models,
where the input wordlist (or cracked passwords list) would be for
training only, and then the model would generate candidates.

I did in fact have an afterthought to also try misusing it with
wordlist rules, which I mentioned as (currently) idea number 13 at:

https://github.com/openwall/john/issues/5597

"The tokenizer can also be useful along with wordlist mode, both to
produce different candidate passwords (by applying wordlist rules
prior to token expansion) and simply as a compression algorithm. We
could want to experiment with this and document useful usage patterns
and have this in mind if/when integrating the functionality into john
proper."

However, what you're doing looks very different, so what is your goal?

I'll proceed to reply step-by-step, but chances are you actually
wanted something much simpler, which I describe at the end of this
message.

> Steps Taken:
>
> 1.
>
> I created a test input file named test.txt with the following content:
>
> abc
> @
> 123

That's way too little content for intended use of the tokenizer.
You'd normally train it on the same large wordlist that you'd use for
training the probabilistic model. However, since we're talking
experiments with unintended uses, let's proceed.

> 2.
>
> I used JtR's default tokenizer to process the file:
>
> perl tokenize.pl test.txt > test_token.txt

Looks good so far, as preparation for whatever experiment comes next.

> 3.
>
> The content of test_token.txt is as follows:
>
> # sed '/[^ -~]/d; s/123/\x1/g; s/abc/\x2/g; s/12/\x3/g; s/bc/\x4/g;
> s/23/\x5/g; s/ab/\x6/g; s/a/\x7/g; s/1/\x8/g; s/b/\x9/g; s/2/\xb/g;
> s/@/\xc/g; s/c/\xe/g; s/3/\xf/g; s/^/:/'
>
> [List.External:Untokenize]
> int mod[0x100];
>
> void init() {
>     for (int i = 0; i < 0x100; ++i) mod[i] = i;
>     mod[1] = 0x333231; // "123"
>     mod[2] = 0x636261; // "abc"
>     mod[3] = 0x3231; // "12"
>     mod[4] = 0x6362; // "bc"
>     mod[5] = 0x3332; // "23"
>     mod[6] = 0x6261; // "ab"
>     mod[7] = 0x61; // "a"
>     mod[8] = 0x31; // "1"
>     mod[9] = 0x62; // "b"
>     mod[11] = 0x32; // "2"
>     mod[12] = 0x40; // "@"
>     mod[14] = 0x63; // "c"
>     mod[15] = 0x33; // "3"
> }
>
> void filter() {
>     int i = 0, j = 0, k = 0, save[0x80];
>     while (save[i] = word[i]) i++;
>     while (int m = mod[save[j++]]) {
>         word[k++] = m;
>         while (m >>= 8) word[k++] = m;
>     }
>     word[k] = 0;
> }

There's no way tokenize.pl as ever released by our project would
produce exactly the above output. I guess you modified it in many
ways, which made it produce subtly broken output. I see at least two
errors in there: it's tokenizing even single characters (which is at
best unneeded), and it tries to use a "for" loop (which our external
mode compiler does not support).
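For comparison, the same identity initialization can be expressed with
a "while" loop, which other bundled external modes do use. This is
only a sketch of the shape such code could take (assuming, as in those
modes, local variables declared without initializers), not the exact
code that unmodified tokenize.pl emits:

void init() {
    int i;

    i = 0;
    while (i < 0x100) {
        mod[i] = i; // identity: each byte initially expands to itself
        i++;
    }
    mod[1] = 0x333231; // "123"
    // ... remaining multi-character token assignments as quoted above ...
}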
However, none of this matters when you don't even use this file
correctly next:

> 4.
>
> I attempted to crack the hash using the following command:
>
> john --format=raw-md5 --wordlist=test_token.txt
> --rules=KoreLogic,Best64 md5.hash

This makes no sense. You use the programs output by the tokenizer as
a wordlist, but they're not useful as a wordlist.

> Issue Observed:
>
> -
>
> JtR correctly loaded the tokenized wordlist,

You had no "tokenized wordlist", so it couldn't possibly be "correctly
loaded". What you had is a text file with two programs (to perform
tokenization and its reverse), which you instead misused as a
wordlist.

> but it appears that the
> selected rules (KoreLogic, Best64) were not applied during the cracking
> attempt.

They probably were, but it doesn't help much when the wordlist doesn't
contain anything resembling passwords (has program code instead).

> -
>
> The session completed without any successful cracks, and no rule-based
> transformations seemed to have been executed on the tokenized input.
>
> Request for Assistance:
>
> I would appreciate guidance on:
>
> -
>
> Ensuring that rules are correctly applied to tokenized.

This is irrelevant.

> -
>
> Identifying if there are any misconfigurations or additional parameters
> needed.

This whole project is about trying out a misconfiguration because the
tokenizer was not intended for such misuse, but we may try that
anyway. However, worst of all, the final command you ran is certainly
not what you intended.

I recommend that you first learn and practice with intended use of the
tokenizer along with incremental mode, as given in comments at the
start of tokenize.pl. After you're familiar with that, you can
proceed to try weird things if you want to.
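In outline, that intended use looks roughly like this (a sketch only;
see the actual comments in tokenize.pl for the authoritative commands.
The file names here are illustrative, and --incremental=custom assumes
a matching [Incremental:Custom] config section whose File setting
points at the generated custom.chr):

$ perl tokenize.pl wordlist.txt > john-local.conf
$ sed '<the sed command from john-local.conf, keeping "; s/^/:/">' wordlist.txt > fake.pot
$ ./john --pot=fake.pot --make-charset=custom.chr
$ ./john --incremental=custom --external=Untokenize --stdout

That is, the sed command converts the training wordlist into a fake
pot file of tokenized "plaintexts", --make-charset trains incremental
mode on those, and the Untokenize external mode expands the generated
token strings into actual candidate passwords.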
Trying to repair your weird attempts above using unmodified
tokenize.pl:

$ cat test.txt
abc
@
123
$ perl tokenize.pl test.txt > john-local.conf
$ sed '/[^ -~]/d; s/123/\x1/g; s/abc/\x2/g; s/ab/\x3/g; s/23/\x4/g; s/bc/\x5/g; s/12/\x6/g' test.txt > test-tokenized.txt
$ ./john --wordlist=test-tokenized.txt --external=Untokenize --stdout
Using default input encoding: UTF-8
abc
@
123
3p 0:00:00:00 100.00% (2025-03-27 03:01) 60.00p/s 123
$ ./john --wordlist=test-tokenized.txt --rules=Best64 --external=Untokenize --stdout | head
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor using 256 MiB
124p 0:00:00:00 100.00% (2025-03-27 03:01) 1033p/s 123123123123123123
abc
@
123
abc0
@0
1230
abc1
@1
1231
abc2
$ wc test.txt test-tokenized.txt
 3  3 10 test.txt
 3  1  6 test-tokenized.txt

Where I took the "sed" command from the generated john-local.conf, but
removed the final part where it had "; s/^/:/" as that part was there
for producing fake pot files (for incremental mode training) rather
than wordlists.

As you can see, --external=Untokenize was able to correctly restore
the wordlist from its tokenized or compressed form (original test.txt
was 10 bytes, but tokenized test-tokenized.txt only 6 bytes). And the
rules are applied if you request them. Moreover, you can see that
they're applied differently and their effect is different than if you
used the same rules on the original wordlist:

$ ./john --wordlist=test.txt --rules=Best64 --external=Untokenize --stdout | head
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor using 256 MiB
152p 0:00:00:00 100.00% (2025-03-27 03:07) 1013p/s 123121
abc
@
123
cba
321
ABC
Abc
abc0
@0
1230

The generated password candidates are different and their number is
also different (152 original vs. 124 when rules are applied to
tokenized wordlist prior to --external=Untokenize). This is because
rules that reorder or change the case of characters act on the token
bytes, where they're often no-ops: e.g., reversing the single byte
that stands for "abc" yields that same byte, which expands back to
"abc" rather than "cba", and the duplicate candidate suppressor then
drops such repeats. That's the point of my idea number 13, so thank
you for making me try it out.

With all that said, maybe you actually wanted something completely
different. Maybe you didn't need the tokenizer at all. Maybe you
wanted to explicitly list your tokens and then have them mixed up, and
then rules applied? You'd do that with PRINCE mode:

$ ./john --prince=test.txt --rules=Best64 --stdout | head
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor using 256 MiB
@@@
@@@0
@@@1
@@@2
@@@3
@@@4
@@@5
@@@6
@@@7
@@@8
$ ./john --prince=test.txt --rules=Best64 --stdout | tail
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor using 256 MiB
134147p 0:00:00:00 100.00% (2025-03-27 03:15) 838418p/s 4123123123123@
1223123123123@
12323123123123@
12313123123123@
131231231231
3123123@...312
1231231231231@
22312312312312
23@...1231231231
923123123123123@
4123123123123@

Alexander