john-dev - Re: Judy array

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <061D3811-75D5-45E8-96EE-8502D43F8681@gmail.com>
Date: Mon, 14 Sep 2015 18:20:12 -0700
From: Fred Wang <waffle.contest@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array


On Sep 14, 2015, at 2:32 PM, Solar Designer <solar@...nwall.com> wrote:

>> 
>> My focus has always been large number of hashes, and thus my statements. By large, I mean > 1,000,000 unsolved hashes (I typically run with about 80,000,000 unsolved).
> 
> Sure, but how large is the optimal Bloom filter for those hash counts?
> Isn't it much larger than L3 cache anyway?  If so, I don't see why it'd
> work faster than an even larger bitmap.


I use a fixed-size bloom filter currently, set at 134 megabytes - easy enough to change, but this gave me good results with my 80M data set.  Of course, it does even better on smaller data sets.

Let's have a look:

On a E5-2680 v2 @ 2.80GHz, using the hacked best64 we talked about, processing the 29m hash file:

mdxfind take 21 seconds (13 seconds to read hashes, 7.84 to hash) doing 831,312,959 hashes to get 1,710,650 finds.
mdxfind used about 1.5 gigabytes of ram.

			-t [cores]
		1	4	8	16	32
no rules	0:21	0:15	0:14	0:14	0:14
best64		1:55	0:41	0:28	0:22	0:21	


1.8.0.6-jumbo-1-bleeding:
john takes about 18 seconds to read hashes, and used about 3 gigabytes of ram.  The same rules on John only got 1,481,414 finds.  I don't know why.


			    --fork=
 		1	4	8	16	32
no rules: 	0:25	0:21	0:20	0:32	1:04
Best64:		3:46	2:05	1:55	2:37	4:02




> 
>> By "as soon as I can", I mean as I finish the first 32 bits of the MD5, for example.
> 
> Oh, do you proceed to compute more than 32 bits of MD5 even before you
> know you'd need more than 32 bits?  That's weird if so, since you have
> other work to do there (on other candidate passwords) that you're sure
> you'd make use of.

Yes, I always complete the hashes, for every kind of hash.  Part of the reason for that is that I decouple the search function from the hash function.  This is due to the fact that I have to be able to process rotated, truncated, substring, and other manipulated hashes, and I can't do that if I don't have the completed hash.

That doesn't stop me from giving the Bloom filter a helping hand, via prefetch.  It costs nothing, and gives great benefit.


> 
>> If you are considering only a (very) small number of hashes, then yes, bitmaps could fit in cache and thus be faster.  Because I have sufficient work to do before I check the bloom filter results, however, there appears to be little penalty.
> 
> You shouldn't want to initiate more than one Bloom filter lookup for a
> hash computed until you know that one lookup would happen to be
> insufficient... or are you sure you'd always have spare bandwidth to do
> so (and only latency matters, which you deal with in the way you
> described)?
> 


That's not true.  For example while processing rules (or md5(md5($pass).$salt) for example), I build 4 plaintexts, then run one SIMD MD5, giving me 4 results.  As I am completing the MD5, I issue prefetch for all 4 prior to completing the rest of the summation for the MD5.  I won't be able to check it until later, but the prefetches are queued, and this seriously helps the lookup time.

And the proof is in the numbers.  No pathological performance degradation as the number of cores increases, even though I am making 32x4 prefetch requests to the main memory.



Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.