john-users - Re: Issue Applying Rules to Tokenized in John the Ripper

Message-ID: <CAGSLPCYHn53NPajh9dk+c=Ag+TTDdDmVsMkN0TJqCJric_+tJQ@mail.gmail.com>
Date: Wed, 2 Apr 2025 00:26:06 +0530
From: Pentester LAB <pentesterlab3@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized in John the Ripper

I followed the article Running JtR's Tokenizer Attack
<https://reusablesec.blogspot.com/2024/10/running-jtrs-tokenizer-attack.html>
and tried to generate a modified wordlist using sed and ./tokenize.pl.

My original wordlist (TRAINING_PASSWORDS.txt):

tamil
@
king
2005

I first ran tokenize.pl on this file. It printed the sed command shown
on the lines starting with #:

./tokenize.pl TRAINING_PASSWORDS.txt
# sed '/[^ -~]/d; s/tami/\x1/g; s/king/\x2/g; s/2005/\x3/g;
s/amil/\x4/g; s/kin/\x5/g; s/ing/\x6/g; s/200/\x7/g; s/ami/\x8/g;
s/005/\x9/g; s/mil/\xb/g; s/tam/\xc/g; s/in/\xe/g; s/05/\xf/g;
s/ki/\x10/g; s/am/\x11/g; s/ng/\x12/g; s/mi/\x13/g; s/ta/\x14/g;
s/il/\x15/g; s/20/\x16/g; s/00/\x17/g; s/^/:/'

After getting the output, I used this command:

cat TRAINING_PASSWORDS.txt | sed '/[^ -~]/d; s/tami/\x1/g;
s/king/\x2/g; s/2005/\x3/g; s/amil/\x4/g; s/kin/\x5/g; s/ing/\x6/g;
s/200/\x7/g; s/ami/\x8/g; s/005/\x9/g; s/mil/\xb/g; s/tam/\xc/g;
s/in/\xe/g; s/05/\xf/g; s/ki/\x10/g; s/am/\x11/g; s/ng/\x12/g;
s/mi/\x13/g; s/ta/\x14/g; s/il/\x15/g; s/20/\x16/g; s/00/\x17/g;
s/^/:/' > new_training.txt


However, when I checked new_training.txt, the output was incorrect:

:@

Why is my sed command producing incorrect output, and how can I fix it?
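
One way to check what the command actually wrote is to mimic it on the four words and print each result with its control characters made visible. The Python below is only a sketch under stated assumptions, not a confirmed diagnosis: it assumes GNU sed reads \x1 as the byte 0x01 (and so on for the other codes), and the names SUBS and tokenize_line are made up for this illustration.

```python
import re

# Substitution pairs copied from the sed command above, in the same order.
# Each token code is the hex value that follows \x in the sed replacement.
SUBS = [
    ("tami", 0x01), ("king", 0x02), ("2005", 0x03), ("amil", 0x04),
    ("kin", 0x05), ("ing", 0x06), ("200", 0x07), ("ami", 0x08),
    ("005", 0x09), ("mil", 0x0B), ("tam", 0x0C), ("in", 0x0E),
    ("05", 0x0F), ("ki", 0x10), ("am", 0x11), ("ng", 0x12),
    ("mi", 0x13), ("ta", 0x14), ("il", 0x15), ("20", 0x16), ("00", 0x17),
]


def tokenize_line(line):
    """Mimic the sed script for one line; return None if sed would delete it."""
    if re.search(r"[^ -~]", line):    # same test as sed's /[^ -~]/d
        return None
    for text, code in SUBS:           # same order as the s/.../.../g commands
        line = line.replace(text, chr(code))
    return ":" + line                 # same as s/^/:/


words = ["tamil", "@", "king", "2005"]
result = [tokenize_line(w) for w in words]

# repr() shows control characters as escapes instead of printing them raw.
for r in result:
    print(repr(r))
```

In this sketch, three of the four results are a colon followed by a control character, which most terminals and mail clients do not display. If the real file looks the same, the missing lines may be present but invisible; a byte-level dump such as `od -c new_training.txt` would show whether that is the case.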


On Mon, Mar 31, 2025 at 5:34 AM Solar Designer <solar@...nwall.com> wrote:

> On Thu, Mar 27, 2025 at 04:07:42AM +0100, Solar Designer wrote:
> > On Thu, Mar 27, 2025 at 03:30:48AM +0100, Solar Designer wrote:
> > > The generated password candidates are different and their number is
> > > also different (152 original vs. 124 when rules are applied to
> > > tokenized wordlist prior to --external=Untokenize).  That's the point
> > > of my idea number 13, so thank you for making me try it out.
> >
> > To more fully test my idea, we need to see whether and how many
> > different candidate passwords the rules+Untokenize run adds on top of a
> > simple rules run.
> >
> > In the above tests, the simple run produces 152 unique candidates.
> > They're unique due to our dupe suppressor, as otherwise Best64 would
> > tend to produce lots of dupes.  The rules+Untokenize run produces 124,
> > but the output from this run has 125 lines out of which 123 are unique.
> > There are 3 instances of the empty line.  I'm actually puzzled by that
> > (we could want to investigate it in case it's a bug).
> >
> > Anyway, combining those 152 and 123, I get 165 unique.  So, yes, this
> > weird trick does add 13 unique candidate passwords.  They are:
> >
> > TAB
> > 123123123
> > 123123123123
> > 123123123123123
> > 123123123123123123
> > 123123123123123123123123
> > 23
> > abcabcabc
> > abcabcabcabc
> > abcabcabcabcabc
> > abcabcabcabcabcabc
> > abcabcabcabcabcabcabcabc
> > bc
> >
> > where TAB is the control character (which puzzles me a bit).
>
> I investigated the puzzling 3 instances of the empty line and TAB.  No
> bug there.  It's just how the best64 rules work, especially hashcat's
> "+" command, which increments the ASCII code.  (This ruleset was meant
> for hashcat, and we run it in our hashcat compatibility mode.)  When
> applied to tokens, which are themselves non-printable characters, this
> may produce other non-printable characters, including controls.  In this
> tiny test case, we only have token codes 1 to 6:
>
>         mod[1] = 0x333231; // "123" 3
>         mod[2] = 0x636261; // "abc" 3
>         mod[3] = 0x3332; // "23" 2
>         mod[4] = 0x6362; // "bc" 2
>         mod[5] = 0x3231; // "12" 2
>         mod[6] = 0x6261; // "ab" 2
>
> A few increments of these bring them to TAB (ASCII 9) and LF (ASCII 10).
> Since these are higher than 6, they're not further modified by
> --external=Untokenize - there's no string to replace them "back" to.
>
> When the LF character is printed, it becomes two LFs at once - one is LF
> itself and the other is LF added after this line - so two empty lines.
>
> Some other rules result in a proper empty string, which the suppressor
> includes only once, but it's distinct from the LF string.  So we get 3
> empty lines in total.  Also, one of them is correctly not counted
> towards the number of candidate passwords since it's inside a candidate.
>
> Alexander
>
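
The quoted explanation of the TAB and the three empty lines can be sketched in Python. This is an illustration only: the mod[] values come from the quoted message, but the helper names (token_string, increment, untokenize), the choice of token 6 as the starting point, and the number of increments are assumptions made for the example, not what best64 or the real Untokenize external mode does step by step.

```python
# Token table from the quoted message: code -> (packed value, length).
# The values store the string's bytes in little-endian order.
MOD = {
    1: (0x333231, 3), 2: (0x636261, 3), 3: (0x3332, 2),
    4: (0x6362, 2), 5: (0x3231, 2), 6: (0x6261, 2),
}


def token_string(code):
    """Decode a packed mod[] value back into the string it stands for."""
    value, length = MOD[code]
    return value.to_bytes(length, "little").decode("ascii")


def increment(candidate, pos):
    """Sketch of hashcat's '+N' rule: add 1 to the byte at position pos."""
    data = bytearray(candidate)
    if pos < len(data):
        data[pos] = (data[pos] + 1) % 256  # wrap-around is an assumption
    return bytes(data)


def untokenize(candidate):
    """Expand token codes 1..6; any other byte passes through unchanged."""
    out = bytearray()
    for b in candidate:
        if b in MOD:
            out += token_string(b).encode("ascii")
        else:
            out.append(b)
    return bytes(out)


# Start from token 6 and increment it four times (hypothetical rule sequence).
cand = b"\x06"
steps = []
for _ in range(4):
    cand = increment(cand, 0)
    steps.append(cand)
tab, lf = steps[2], steps[3]   # byte values 9 (TAB) and 10 (LF)

# Print an empty candidate and the LF candidate, one per line.
printed = b"".join(untokenize(c) + b"\n" for c in [b"", lf])
empty_lines = printed.split(b"\n")[:-1].count(b"")
print(repr(untokenize(tab)), repr(printed), empty_lines)
```

Here the two candidates (the empty string and the lone LF) come out as three empty lines, and the TAB byte passes through untokenize unchanged because 9 is not a token code.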