john-users - Re: Issue Applying Rules to Tokenized in John the Ripper

Message-ID: <20250331000317.GA2293@openwall.com>
Date: Mon, 31 Mar 2025 02:03:17 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Issue Applying Rules to Tokenized in John the Ripper

On Thu, Mar 27, 2025 at 04:07:42AM +0100, Solar Designer wrote:
> On Thu, Mar 27, 2025 at 03:30:48AM +0100, Solar Designer wrote:
> > The generated password candidates are different and their number is also
> > different (152 original vs. 124 when rules are applied to tokenized
> > wordlist prior to --external=Untokenize).  That's the point of my idea
> > number 13, so thank you for making me try it out.
> 
> To more fully test my idea, we need to see whether and how many
> different candidate passwords the rules+Untokenize run adds on top of a
> simple rules run.
> 
> In the above tests, the simple run produces 152 unique candidates.
> They're unique due to our dupe suppressor, as otherwise Best64 would
> tend to produce lots of dupes.  The rules+Untokenize run produces 124,
> but the output from this run has 125 lines out of which 123 are unique.
> There are 3 instances of the empty line.  I'm actually puzzled by that
> (we could want to investigate it in case it's a bug).
> 
> Anyway, combining those 152 and 123, I get 165 unique.  So, yes, this
> weird trick does add 13 unique candidate passwords.  They are:
> 
> TAB
> 123123123
> 123123123123
> 123123123123123
> 123123123123123123
> 123123123123123123123123
> 23
> abcabcabc
> abcabcabcabc
> abcabcabcabcabc
> abcabcabcabcabcabc
> abcabcabcabcabcabcabcabc
> bc
> 
> where TAB is the control character (which puzzles me a bit).

I investigated the puzzling 3 instances of the empty line and TAB.  No
bug there.  It's just how the best64 rules work, especially hashcat's
"+" command, which increments the ASCII code.  (This ruleset was meant
for hashcat, and we run it in our hashcat compatibility mode.)  When
applied to tokens, which are themselves non-printable characters, this
may produce other non-printable characters, including controls.  In this
tiny test case, we only have token codes 1 to 6:

	mod[1] = 0x333231; // "123" 3
	mod[2] = 0x636261; // "abc" 3
	mod[3] = 0x3332; // "23" 2
	mod[4] = 0x6362; // "bc" 2
	mod[5] = 0x3231; // "12" 2
	mod[6] = 0x6261; // "ab" 2
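
For reference, the packed values above can be unpacked back into their strings. This is only an illustrative Python sketch, not code from John the Ripper; the `token_string` helper and the `(value, length)` pairing are made up for this example, with the lengths taken from the trailing numbers in the comments above.

```python
# Token table copied from the listing above: code -> (packed value, length).
# The first character of each string sits in the lowest byte of the value.
mod = {
    1: (0x333231, 3),
    2: (0x636261, 3),
    3: (0x3332, 2),
    4: (0x6362, 2),
    5: (0x3231, 2),
    6: (0x6261, 2),
}


def token_string(value: int, length: int) -> str:
    """Unpack a little-endian packed token value (hypothetical helper)."""
    return value.to_bytes(length, "little").decode("ascii")


tokens = {code: token_string(value, length) for code, (value, length) in mod.items()}

for code, text in tokens.items():
    print(code, repr(text))
```

Reading the bytes low to high is what turns 0x333231 into "123" rather than "321".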

A few increments of these bring them to TAB (ASCII 9) and LF (ASCII 10).
Since these are higher than 6, they're not further modified by
--external=Untokenize - there's no string to replace them "back" to.
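
The effect can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not John's or hashcat's actual code: `increment_at` stands in for hashcat's "+N" rule command, `untokenize` stands in for what --external=Untokenize does to each byte, and both names are invented for this illustration.

```python
# Token codes 1..6 and the strings they stand for (from the table above).
TOKENS = {1: b"123", 2: b"abc", 3: b"23", 4: b"bc", 5: b"12", 6: b"ab"}


def increment_at(word: bytes, pos: int) -> bytes:
    """Model of a '+N' style rule: add 1 to the byte at position pos.

    Out-of-range positions leave the word unchanged. The wrap at 0xFF is
    an assumption of this sketch and is not exercised here.
    """
    if pos >= len(word):
        return word
    out = bytearray(word)
    out[pos] = (out[pos] + 1) & 0xFF
    return bytes(out)


def untokenize(word: bytes) -> bytes:
    """Expand known token codes; pass every other byte through unchanged."""
    return b"".join(TOKENS.get(b, bytes([b])) for b in word)


word = bytes([6])                 # the single token for "ab"
print(untokenize(word))           # expands to b'ab'

for _ in range(3):                # 6 -> 7 -> 8 -> 9
    word = increment_at(word, 0)
print(untokenize(word))           # code 9 has no token, stays as TAB

word = increment_at(word, 0)      # 9 -> 10
print(untokenize(word))           # code 10 has no token, stays as LF
```

In this model, three increments of token 6 yield TAB and a fourth yields LF, and neither is expanded because the table ends at 6.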

When a candidate consisting of the LF character is printed, the output
has two LFs in a row - one is the candidate itself and the other is the
LF added after every candidate - so it shows up as two empty lines.

Some other rules result in a proper empty string, which the suppressor
includes only once, but it's distinct from the LF string.  So we get 3
empty lines in total.  Also, one of them is correctly not counted
towards the number of candidate passwords since it's inside a candidate,
which is why 124 candidates come out as 125 lines.
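
The line counts can be reproduced with a small Python sketch. The candidate list here is a stand-in: only the empty string and the LF string matter, and the other 122 entries are hypothetical placeholders chosen to match the total of 124 candidates reported earlier in the thread.

```python
# 124 candidates: the empty string, the LF string, and 122 placeholders
# (the placeholder names are made up; only their count matters here).
candidates = ["", "\n"] + [f"cand{i}" for i in range(122)]

# Each candidate is written followed by one LF, as in line-oriented output.
output = "".join(c + "\n" for c in candidates)

# Split into lines, dropping the empty piece after the final LF.
lines = output.split("\n")[:-1]

print(len(candidates))      # number of candidates
print(len(lines))           # number of output lines
print(lines.count(""))      # number of empty lines
print(len(set(lines)))      # number of unique lines
```

With these inputs the sketch gives 124 candidates, 125 lines, 3 empty lines, and 123 unique lines, matching the figures quoted above.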

Alexander
