Date: Wed, 22 Jul 2020 20:13:26 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Known part of password, attempting incremental attack

Hello Alexander,

On Tue, Jul 21, 2020 at 01:30:13PM -0400, Alexander Hunt wrote:
> Hello. I am very new to JtR so please bear with me. I am helping my cousin
> who has locked the novel he has been working on for the past 5 years with a
> password that he has since forgotten. He knows the first 5 characters and
> the last 2 char. for sure. He believes there is one word (possibly two)
> between the first 5 and the last 2. He believes it is a dictionary word so
> I started with a Wordlist attack with a dictionary list I pulled off the
> internet, and the parameters he set. That didn't work

Are you confident you ran the attack correctly?  You may want to tell
us what file format, filesystem, etc. the novel is locked in, how you
processed that for use with JtR, how you ran JtR on the result, and
what speeds you got.  Ideally, include some copy-paste from your
terminal.  That way, you'll get some review of the approach you used.
As you say, you're very new to JtR, so it is conceivable you simply ran
it incorrectly.

It may also be a good idea to lock something else in the same way, but
with a known password, and make sure you're able to get that one
recovered - for practice and for software compatibility testing.

> so I would like to set
> up an incremental attack to determine the 5-10 characters in between the
> characters he knows. Is this possible?

To answer your question directly, you can do something like:

john -inc=lower -mask='known?w12' hash.txt

You can also limit the lengths:

john --incremental=lower --mask='known?w12' --min-length=12 --max-length=17 hash.txt

I recommend not setting a minimum length, though.

The ?w expands to whatever the previous cracking mode generates, in
this case incremental mode.  For example, when incremental mode
produces the candidate "word", the mask turns that into "knownword12".
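
As an aside, the same effect can be had outside of mask mode by
generating full candidates externally and feeding them to john with
--stdin.  A minimal Perl sketch (untested; "known" and "12" again stand
in for the real known prefix and suffix, like in the commands above):

#!/usr/bin/perl
# wrap.pl (hypothetical) - wrap each wordlist entry with the known
# prefix and suffix and print the resulting full candidate
use strict;
use warnings;

my ($prefix, $suffix) = ('known', '12');  # replace with the real known parts

while (my $word = <STDIN>) {
    chomp $word;
    print "$prefix$word$suffix\n";
}

Something like:

./wrap.pl < words.txt | john --stdin hash.txt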

I appreciate Albert's help in this thread, but I have some comments:

On Tue, Jul 21, 2020 at 11:21:54PM +0200, Albert Veli wrote:
> And apply rules to the big wordlist and remove duplicates. Removing
> duplicates can be done with the unique command, from john. Creating all
> combinations of two words can be done in many ways. For instance using
> combinator from https://github.com/hashcat/hashcat-utils. So it would then
> be something like this:
> 
> ./combinator.bin words.txt words.txt | unique double.txt

FWIW, I just use trivial Perl scripts for this - see attached.
double.pl is for using the same wordlist twice (like in the above
example), mix.pl is for using two possibly different wordlists.

> To get a good wordlist, try
> https://github.com/first20hours/google-10000-english if it is common
> english words. 10000 is too much to double, try to extract the maybe 3000
> first words and hope both your words are among those. The words are in
> order with the most common first.

I wish this were the case.  Last year, I actually considered using this
wordlist as a basis for a replacement wordlist in passwdqc's
"passphrase" generator.  Unfortunately, the wordlist turned out to be
unsuitable for that purpose, not even as one of many inputs for manual
processing.  It may still be OK for password cracking when 10000 isn't
too many and some non-words mixed in are acceptable, but it is no good
when you need to extract a smaller set of common words from it.

Here are some better wordlists for when you need fewer common words:

http://www.ef.edu/english-resources/english-vocabulary/top-100-words/
http://www.ef.edu/english-resources/english-vocabulary/top-1000-words/
http://www.ef.edu/english-resources/english-vocabulary/top-3000-words/

These are not frequency-sorted, but they're readily provided in 3 sizes.

To see how the first20hours/google-10000-english list compares, try e.g.:

$ fgrep -xnf top100eng google-10000-english.txt | head
1:the
2:of
3:and
4:to
5:a
6:in
7:for
9:on
10:that
11:by

$ fgrep -xnf top100eng google-10000-english.txt | tail
270:even
321:him
325:think
413:man
446:look
496:say
504:come
555:give
723:tell
823:thing

So it starts reasonably well, but becomes unreasonable (not matching
English word frequencies) within the first 100 words, and then this only
gets worse.  The word "thing" is on the top 100 list above, but is only
number 823 on that google-10000-english list.  In my own processing of
1962 books from Project Gutenberg Australia (thus, biased to older
books), it is number 165.  I find it hard to believe it'd be as low as
823 in any reasonable corpus.  So whatever corpus was used to build that
list is unreasonable.

Even weirder is "him", somehow number 321 on that list.  On my Project
Gutenberg Australia list, it's number 24.

$ fgrep -xnf top1000eng google-10000-english.txt | tail
6263:laugh
6301:weapon
6588:participant
6821:admit
6843:relate
6848:suffer
6924:scientist
7080:argue
7124:reveal
8150:shake

The word "laugh" is 6263 on google-10000-english, but is 683 on my list.
The word "shake" is 8150 on google-10000-english, but is 2068 on my list
(OK, that is a smaller discrepancy).

Hmm, I should probably release that Project Gutenberg Australia list,
rather than only use it in my work on passwdqc as I do now.
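
For reference, building a frequency-sorted list like that from a text
corpus can be as simple as the sketch below (just the basic idea, not
necessarily the exact processing I used on those books):

#!/usr/bin/perl
# count word frequencies in the text files given on the command line
# and print the words most frequent first
use strict;
use warnings;

my %count;
while (my $line = <>) {
    # lowercase and split on anything that isn't a letter
    foreach my $word (split /[^a-z]+/, lc $line) {
        $count{$word}++ if length $word;
    }
}

foreach my $word (sort { $count{$b} <=> $count{$a} } keys %count) {
    print "$word\n";
}

e.g. ./freq.pl books/*.txt > wordlist.txt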

Alexander

View attachment "double.pl" of type "text/plain" (119 bytes)

View attachment "mix.pl" of type "text/plain" (328 bytes)
