Message-ID: <20240815170241.GA14897@openwall.com>
Date: Thu, 15 Aug 2024 19:02:42 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: problem running out of memory - john 20240608.1

On Thu, Aug 15, 2024 at 03:32:29PM +0200, Solar Designer wrote:
> One way to approach this problem is to not split that file.  Try loading
> them all at once, but don't use --fork at first.  Maybe this will
> fit in memory for you.  As most hashes will get cracked after a while,
> you'll be able to gradually increase the --fork count for further attacks
> (but those should be different attacks).
> 
> I've just tested on Linux, and 100M of unique NTLM hashes take 11 GB RAM
> with default settings, taking about 1 minute to load.  So HIBPv8 847M
> will probably fit in 128 GB RAM.

I've just tested loading the whole HIBPv8 on Linux, and it works:

time ./john --format=nt --verbosity=1 --wordlist --rules --dupe=0 --no-loader-dupe-check pp8
No dupe-checking performed when loading hashes.
Using default input encoding: UTF-8
Loaded 847223402 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])

One important option that I keep forgetting to use at first is
--no-loader-dupe-check.  With this option, the above took 8 minutes to
load.  Since it's known that HIBPv8 has no duplicate hashes, there's no
need to waste time checking for duplicates at load time.  It looks like
our duplicate hash checks scale really well to 100M, but stop scaling
nearly as well at hundreds of millions.
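
If you are ever unsure whether a given hash file is duplicate-free, one
simple way to check from the shell (slow on files this large, but
reliable) is to compare the raw and deduplicated line counts:

wc -l < pp8
LC_ALL=C sort -u pp8 | wc -l

If the two counts match, --no-loader-dupe-check is safe to use on that
file.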

Anyway, the above uses 46 GB RAM.  It cracked 2M hashes in the first 6
seconds, 5M in 14 seconds, 50M in 3.5 minutes, 100M in 13 minutes, 130M
in 25 minutes, 140M in 32 minutes, and 146M by attack completion in 38
minutes (the ~8 minutes of loading account for the longer wall-clock
time below).  So even on one CPU core (in this case, of an old Xeon
E5-2670), you can eliminate lots of hashes in under an hour:

146633482g 0:00:38:09 DONE (2024-08-15 17:46) 64055g/s 2171Kp/s 2171Kc/s 1587TC/s Robyn2638..Sambarock38
Session completed.

real    46m11.607s
user    44m26.731s
sys     1m16.396s

Looks like a moderate --fork count could be used right away, but it's
hard to tell exactly how high.  On a 128 GB RAM machine, at least 2
would certainly work right away, and probably more.  The loaded hashes
are initially shared between the forked processes via copy-on-write,
but as more hashes get cracked the in-memory "databases" diverge
between the processes, which then uses more memory.
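
For example, a follow-up run with a moderate fork count might look like
this, where other.lst is just a placeholder for whatever different
wordlist the next attack uses:

time ./john --fork=2 --format=nt --verbosity=1 --no-loader-dupe-check --wordlist=other.lst --rules pp8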

Another option you could use, but which I forgot to use above, is
--no-log.  I got a 7 GB john.pot and a 3 GB john.log.  The latter could
be avoided if not needed, which would also speed things up a bit.
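
In the sketch above, that simply means adding --no-log:

time ./john --fork=2 --no-log --format=nt --verbosity=1 --no-loader-dupe-check --wordlist=other.lst --rules pp8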

Another relevant option is --save-memory, but I don't recommend using
it in this case since it'd likely slow things down a lot per process,
while probably allowing only a moderately higher process count.

Now, to proceed further with the remaining hashes, you could simply
load the whole file again, and the checks against john.pot would
exclude the already cracked hashes from further cracking.  However,
those hashes would still waste memory during loading.  So you could
once in a while use --show=left to obtain the remaining hash list:

./john --format=nt --show=left pp8 > pp8-left

(This will prefix each hash with "?:$NT$", that is, a username
placeholder and the hash type identifier, but that should not matter.)
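
For instance, if the well-known NT hash of "password" were among the
remaining ones, its line in pp8-left would look like:

?:$NT$8846f7eaee8fb117ad06bdd830b7586c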

However, --show=left after the above appeared to take ages, and it
wouldn't accept --no-loader-dupe-check on the command line.  What helped
was setting:

NoLoaderDupeCheck = Y

in john.conf (just don't forget to revert this edit when you work on
smaller hash lists that may have duplicates).  With this, it completed
in 46 minutes, using 44 GB of RAM.  I must admit that's rather long.

146633482 password hashes cracked, 700589920 left

real    46m1.249s
user    37m59.769s
sys     8m1.868s

As expected, pp8-left has 847223402-146633482 = 700589920 lines.
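
Both numbers are easy to double-check from the shell:

echo $((847223402 - 146633482))
wc -l < pp8-left

Both commands should print 700589920.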

Since this file already has the cracked hashes removed, you don't need
the checks against the existing john.pot when loading it for further
attacks.  You can use the --pot option to specify an alternative pot
file name, which will speed up loading.
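
A sketch of such a follow-up run, with pp8-left.pot as an arbitrary new
pot file name:

time ./john --fork=2 --format=nt --verbosity=1 --no-log --no-loader-dupe-check --pot=pp8-left.pot --wordlist=other.lst --rules pp8-left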

Alexander
