Message-ID: <20250402023234.GA28722@openwall.com>
Date: Wed, 2 Apr 2025 04:32:34 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized in John the Ripper

On Wed, Apr 02, 2025 at 12:26:06AM +0530, Pentester LAB wrote:
> I followed the article Running JtR's Tokenizer Attack
> <https://reusablesec.blogspot.com/2024/10/running-jtrs-tokenizer-attack.html>
> and tried to generate a modified wordlist using sed and ./tokenize.pl.
> 
> My original wordlist (TRAINING_PASSWORDS.txt):
> 
> tamil
> @
> king
> 2005

This may be fine, but please note that this usage is still very
different from what was intended and from what's in the blog post above.
Normally, you'd train the tokenizer and then incremental mode on a large
number of passwords, not on a few individual tokens.  What you do here
is also a fine and fun thing to try, so feel free to continue posting
about it in here, but please be aware that e.g. --prince mode or maybe
--external=Combinator --rules-stack=Phrase may be more appropriate for
your needs.

> I first ran the following command:
> 
> ./tokenize.pl TRAINING_PASSWORDS.txt
> # sed '/[^ -~]/d; s/tami/\x1/g; s/king/\x2/g; s/2005/\x3/g;
> s/amil/\x4/g; s/kin/\x5/g; s/ing/\x6/g; s/200/\x7/g; s/ami/\x8/g;
> s/005/\x9/g; s/mil/\xb/g; s/tam/\xc/g; s/in/\xe/g; s/05/\xf/g;
> s/ki/\x10/g; s/am/\x11/g; s/ng/\x12/g; s/mi/\x13/g; s/ta/\x14/g;
> s/il/\x15/g; s/20/\x16/g; s/00/\x17/g; s/^/:/'
> 
> After getting the output, I used this command:
> 
> cat TRAINING_PASSWORDS.txt | sed '/[^ -~]/d; s/tami/\x1/g;
> s/king/\x2/g; s/2005/\x3/g; s/amil/\x4/g; s/kin/\x5/g; s/ing/\x6/g;
> s/200/\x7/g; s/ami/\x8/g; s/005/\x9/g; s/mil/\xb/g; s/tam/\xc/g;
> s/in/\xe/g; s/05/\xf/g; s/ki/\x10/g; s/am/\x11/g; s/ng/\x12/g;
> s/mi/\x13/g; s/ta/\x14/g; s/il/\x15/g; s/20/\x16/g; s/00/\x17/g;
> s/^/:/' > new_training.txt
> 
> However, when I checked new_training.txt, the output was incorrect:
> 
> :@
> 
> Why is my sed command producing an incorrect output, and how can I fix it?

My best guess is that you don't actually see most of what's in
new_training.txt because the token codes do not correspond to printable
characters.  I don't know how you displayed that file's contents.  You
may want to try the commands "less" and "od -tx1".  If you simply "cat"
the file on a terminal, you may see only partial information, and those
control codes may also end up reconfiguring the terminal emulator
program.
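
As a rough illustration of the point above (a sketch, not something from
the original report): the commands below build a small stand-in file
with printf and then dump it with od so that the control bytes stay
visible.  The file name sample.txt and the byte values 0x01 to 0x03 are
made up for this example; the real token codes depend on the sed command
that tokenize.pl printed.

```shell
# Hypothetical stand-in for new_training.txt: four lines, three of which
# hold only a single control byte after the leading colon.
printf ':\001\n:@\n:\002\n:\003\n' > sample.txt

# Hex dump, one byte per column and no offset column; the control bytes
# appear as 01, 02 and 03 even though a terminal would not render them.
od -An -tx1 sample.txt

# Count the lines to confirm that all four are present in the file.
wc -l < sample.txt
```

Running the same od command on the real file should likewise show
whether lines other than ":@" are present.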

That file is a fake pot file, so a more fitting name for it would be
e.g. fake.pot.  You would then use it with:

john --pot=fake.pot --make-charset=custom.chr

and after that run something like:

john --incremental=custom --external=Untokenize --format=raw-md5 hashes.txt

provided you had redirected the output of tokenize.pl into
john-local.conf.

Alexander
