Message-ID: <168e182.5148.1396fef8583.Webtop.0@cox.net>
Date: Tue, 28 Aug 2012 21:12:11 -0400 (EDT)
From: jfoug@....net
To: john-users@...ts.openwall.com
Subject: Re: JtR vs. hashcat on /r/crypto




On Tue, Aug 28, 2012 at 5:41 PM, Solar Designer wrote:

> On Tue, Aug 28, 2012 at 02:21:07PM -0500, jfoug wrote:
>>> From: Solar Designer [mailto:solar@...nwall.com]
>>>
>>> 2. It turns out (was news to me) that hashcat added SunMD5 support
>>> recently (on CPU).  According to atom, it does not use SIMD, yet is
>>> faster than ours with SIMD (JimF's unreleased code in magnum-jumbo).
>>> I've asked atom for specific speed numbers, but we might want to do
>>> our own benchmarks as well (Jim?), if we don't mind running the
>>> closed-source hashcat for that. ;-)
>>
>> I have a strong belief the coin flip logic we have (the original sun
>> logic), is where the speedup can be found. Yes, we did remove a %5 in
>> one of the loops.  But there still has to be a LOT of optimization
>> left. There is a lot of temp memory usage, and memory movement.  It
>> 'could' be some other factor, but I really think not.  This is why I
>> was surprised by only a 3.5x improvement when going to SSE2 code.  I
>> expected a much higher rate, since we modify that large buffer so
>> little.
>
> Well, it's 4.4x with XOP, but I wouldn't be surprised by a higher
> speedup (over the original Sun code or equivalent) with further
> optimizations on top of SIMD usage.  What surprises me is that atom
> says he achieved greater speed "by not using SIMD".

The SIMD version was 'hard' to write, because it has to find, load,
process and later unwind the 2 differently sized input buffers. However,
since only a 16-byte block is modified, this wind/unwind is actually
very cheap to do (CPU cost wise).
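
A minimal sketch of such a wind/unwind step, assuming 4 SIMD lanes of
32-bit words (as with SSE2), a little-endian host, and an interleaved
layout where word w of lane l sits at index w * LANES + l. The names
wind_digest, unwind_digest, LANES and DIGEST_WORDS are made up for
illustration; this is not the magnum-jumbo code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4         /* assumed: 4 x 32-bit words per SIMD register */
#define DIGEST_WORDS 4  /* a 16-byte MD5 digest is 4 x 32-bit words */

/* Wind: scatter one candidate's 16-byte block into its lane of the
 * interleaved buffer (word w of lane l lives at w * LANES + l). */
static void wind_digest(uint32_t simd_buf[DIGEST_WORDS * LANES],
                        const uint8_t digest[16], unsigned lane)
{
    assert(lane < LANES);
    for (unsigned w = 0; w < DIGEST_WORDS; w++) {
        uint32_t word;
        memcpy(&word, digest + 4 * w, sizeof word); /* alignment-safe load */
        simd_buf[w * LANES + lane] = word;
    }
}

/* Unwind: gather the same lane back into a flat 16-byte block. */
static void unwind_digest(uint8_t digest[16],
                          const uint32_t simd_buf[DIGEST_WORDS * LANES],
                          unsigned lane)
{
    assert(lane < LANES);
    for (unsigned w = 0; w < DIGEST_WORDS; w++) {
        uint32_t word = simd_buf[w * LANES + lane];
        memcpy(digest + 4 * w, &word, sizeof word);
    }
}
```

Under these assumptions each direction is only four word copies per
candidate, which is small next to the cost of an MD5 block.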


>> Possibly there is something Atom was able to find, that busted the
>> coinflip, and found some way to compute it in a deterministic (or
>> nearly deterministic) manner.
>
> Even if so, I don't see how that alone would provide more than a ~4x
> speedup without going SIMD.  Didn't the original Sun code use the
> non-SIMD MD5 code fairly optimally (well, except for wasting a little
> bit of time on the modulo division and such)?

Not in my opinion.  There is a lot of array loading of temp values. 
There HAS to be a much better way to work down to that 1 bit.
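
To illustrate the kind of direct bit access meant here, a rough sketch
only: it reads a single bit straight out of a 16-byte digest with
shifts and masks, without staging values in temporary arrays. The
names digest_bit and coin_toss, the bit ordering (least significant
bit of byte 0 first) and the way the two indices are supplied are all
assumptions for illustration; this is not the Sun coin flip logic
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return bit bit_num (taken modulo 128) of a 16-byte digest.
 * Assumed ordering: bit 0 is the least significant bit of byte 0. */
static int digest_bit(const uint8_t digest[16], unsigned bit_num)
{
    assert(digest != NULL);
    bit_num &= 127;  /* same as bit_num % 128, keeps the index in range */
    return (digest[bit_num >> 3] >> (bit_num & 7)) & 1;
}

/* Hypothetical coin toss: XOR of two digest bits picked by two indices.
 * How the real code derives those indices is not shown here. */
static int coin_toss(const uint8_t digest[16], unsigned idx_a, unsigned idx_b)
{
    return digest_bit(digest, idx_a) ^ digest_bit(digest, idx_b);
}
```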

> Maybe I need to take a closer look at the code myself, but for now
> I'll just wait to see the performance numbers for hashcat's SunMD5.
> Perhaps someone can try it out and post in here?
