Message-ID: <20160709214614.GA31350@openwall.com>
Date: Sun, 10 Jul 2016 00:46:14 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Loading a large password hash file

On Thu, Jul 07, 2016 at 12:38:05AM -0400, Matt Weir wrote:
> More of a general question, but what should the default behavior of JtR be
> when you give it an unreasonably large password hash file to crack?

It doesn't know what's reasonable and what's not on a given system and
for a given use case, so it just keeps trying.  Do you feel the default
should be different?

> For example, let's say you give it 270 million Sha1 hashes?

This isn't necessarily unreasonable.  It should load those if memory
permits.  I guess this is related to:

http://reusablesec.blogspot.com/2016/07/cracking-myspace-list-first-impressions.html

In that blog post, you write that after "sort -u" you had an 8 GB file,
which means about 200 million unique SHA-1 hashes.  So I just generated
a fake password hash file using:

perl -e 'use Digest::SHA1 qw(sha1_hex); for ($i = 0; $i < 200000000; $i++) { print sha1_hex($i), "\n"; }'

which is 8200000000 bytes.  On a machine with enough RAM, JtR loaded it
in 6 minutes, and the running "john" process uses 13 GB.

I guess the loading time could be reduced by commenting out "#define
REVERSE_STEPS" in rawSHA1_fmt_plug.c and rebuilding, but I haven't tried
that.  Maybe we should optimize a few things in that format to speed up
the loading.
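
If you want to try that, it would be something along these lines
(untested, and assuming a bleeding-jumbo tree where the #define sits at
the start of a line in src/rawSHA1_fmt_plug.c):

cd src
sed -i 's|^#define REVERSE_STEPS|//#define REVERSE_STEPS|' rawSHA1_fmt_plug.c
make -s clean && make -sj4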

> Currently if I
> leave it running for a day or two it just hangs trying to process the file.

That's unreasonable.

> This was with bleeding-jumbo.
> 
> Aka I realize the hash file was way too big. Heck the file was large enough
> I couldn't fit the whole thing in RAM on the machine I was using.

Clearly, you need more RAM, or you could probably load half that file at
a time.
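
A rough sketch of that two-pass approach (the filename is just a
placeholder, and 100 million lines per chunk is arbitrary):

split -l 100000000 myspace-sha1.txt chunk_
./john --format=raw-sha1 chunk_aa
./john --format=raw-sha1 chunk_ab

Cracked passwords from both runs accumulate in the same john.pot, so
you don't lose anything by splitting the input.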

There's also the --save-memory option, which may actually speed things
up when you don't have enough RAM.  But that's sub-optimal, and high
memory saving levels may hurt cracking speed a lot.  They also hurt
loading time when there would have been enough RAM to load the hashes
without memory saving.  I've just tried --save-memory=2 on the 200M
SHA-1's file, and it looks like it'll load in about 1 hour (instead of
6 minutes), consuming something like 11 GB.  So probably not worth it in
this case.
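
For reference, such a run is just the usual command line plus the
option, e.g. (filename is a placeholder):

./john --format=raw-sha1 --save-memory=2 fake-sha1.txt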

> I'm more curious about how JtR should respond to that situation.

I think the current behavior is fine.  There are many OS-specific ways
in which the memory available to a process could be limited, and indeed
the RAM vs. swap distinction is also system-specific.  It'd add quite
some complexity to try and fetch and analyze that info, and to try and
guess (possibly wrongly) what the user's preference would be.
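
If you do want a hard cap, it's easy enough to impose one yourself from
the shell before invoking john, rather than having JtR guess.  E.g., on
Linux with bash (the limit is in kilobytes; the value below, roughly
16 GB, is arbitrary, and the filename is a placeholder):

ulimit -v 16000000
./john --format=raw-sha1 hashes.txt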

Alexander
