john-dev - Re: raw-sha1-ng reduced binary size (was: asan report)



Message-ID: <mpro.m6hkxp07swycw00sk.taviso@cmpxchg8b.com>
Date: Sun, 1 Jul 2012 16:44:13 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: Re: raw-sha1-ng reduced binary size (was: asan report)

<jfoug@....net> wrote:

> 
> ---- magnum <john.magnum@...hmail.com> wrote:
> > On 2012-07-01 13:20, Tavis Ormandy wrote:
> > > I understand, I'm just not sure it's worth the performance penalty
> > > (because I can't treat it like a dqword in cmp_all).
> 
> I have not looked at the code, but would you not simply load the 4 byte
> DWORD, into a reg:
> 
> ABCDxxxxxxxxxxxx
> 
> then replicate this to the entire register
> 
> ABCDABCDABCDABCD
> 
> Then simply do comparison using that to the first register load of each
> group of 4 ?  A register load here being the first DWORD of each hash, in
> packed format.
> 
> I have not looked at the code, so I am not sure if your SSE buffers are set
> up differently from the interleaved DWORDs, but I would think it is done
> that way.

Yeah, obviously, the problem is that's 4 instructions, instead of:

MOVDQU y, x
PSHUFD y, foo

Or, with the redundant format, just one instruction:

MOVDQU y, x
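
To make the trade-off concrete, here is a minimal sketch in C with SSE2
intrinsics. It is not the raw-sha1-ng code; the function names and the
assumption that four candidate hash words sit in consecutive 32-bit lanes
are only for illustration. The compact layout needs a load plus a shuffle
to replicate the stored word before each compare, while the redundant
layout needs a single unaligned load.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Compact layout (hypothetical): one 32-bit word of the target hash is
 * stored, so it is loaded and replicated to all four lanes before each
 * comparison. */
static int match_compact(const uint32_t *stored_word, const uint32_t lanes[4])
{
    __m128i t = _mm_cvtsi32_si128((int)*stored_word);     /* MOVD   */
    t = _mm_shuffle_epi32(t, 0x00);                        /* PSHUFD */
    __m128i c = _mm_loadu_si128((const __m128i *)lanes);   /* MOVDQU */
    /* Non-zero mask means at least one lane equals the target word. */
    return _mm_movemask_epi8(_mm_cmpeq_epi32(t, c)) != 0;
}

/* Redundant layout (hypothetical): the word is stored four times, so one
 * unaligned 16-byte load yields a register that is ready to compare. */
static int match_redundant(const uint32_t stored[4], const uint32_t lanes[4])
{
    __m128i t = _mm_loadu_si128((const __m128i *)stored);  /* MOVDQU */
    __m128i c = _mm_loadu_si128((const __m128i *)lanes);   /* MOVDQU */
    return _mm_movemask_epi8(_mm_cmpeq_epi32(t, c)) != 0;
}
```

Both functions give the same answer; the only difference is whether the
replication cost is paid once when the binary is stored or on every
comparison.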

> > > I can think of a faster format if I store it redundantly, like:
> > > 
> > > SHA1  =00112233 44556677 aabbccdd eeff3344 eeaa1122 BINARY=EEAA1122
> > > EEAA1122 EEAA1122 EEAA1122
> > > 
> > > Then I only have to shuffle it once, instead of once per cmp_all.
> > > That's a saving of 4 bytes per hash, and I can still use it like a
> > > dqword, is that ok?
> > 
> > Sure, I did not realize you would end up with a slower cmp_all. There
> > should be some way around that.
> 
> the cmp_all is simply a 'better' hash check.  It 'can' be an exact check,
> (if you are testing all 20 bytes, it is an exact check), but it does not
> have to be.  There are many formats which have used the cmp_all to do a
> full compare, when really it should be written to return, as quickly as
> possible, that there is no way at all that any of the passwords were
> cracked.  Same thing for cmp_one. It should as quickly as possible state
> that 'this is not the one'.  Any candidates that do squeeze by cmp_one can
> be fully tested in cmp_exact.
> 

The problem isn't that it's slow, it's pretty damn fast. The problem is I'm
obsessive about cycles ;-) I actually have a branchless SSE4.1 comparison,
but because I buffer so many comparisons, a branch with prefetch hints
seems to work better.
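
As a rough illustration of the "branch with prefetch hints" idea, and not
the actual code, a scan over a buffer of computed hash words might look
like the sketch below. The function name, the buffer layout (groups of
four 32-bit words) and the one-group prefetch distance are all
assumptions, and it uses plain SSE2 rather than the SSE4.1 variant
mentioned above.

```c
#include <xmmintrin.h>  /* _mm_prefetch */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cmp_all-style scan. `target` holds one hash word stored
 * four times; `words` holds n_vec groups of four candidate words. */
static int any_match(const uint32_t target[4], const uint32_t *words,
                     size_t n_vec)
{
    const __m128i t = _mm_loadu_si128((const __m128i *)target);

    for (size_t i = 0; i < n_vec; i++) {
        /* Hint the next group into cache. A prefetch never faults, so
         * pointing one group past the end on the last pass is harmless. */
        _mm_prefetch((const char *)(words + 4 * (i + 1)), _MM_HINT_T0);

        __m128i c = _mm_loadu_si128((const __m128i *)(words + 4 * i));

        /* Branch out as soon as any lane matches the target word. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi32(t, c)))
            return 1;
    }
    return 0;  /* no candidate in the buffer can match */
}
```

Since matches are rare, the branch is almost never taken and should
predict well, which is one plausible reason it can compete with a
branchless version over a large buffer.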

Tavis.

-- 
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------
