Message-ID: <mpro.m6hkxp07swycw00sk.taviso@cmpxchg8b.com>
Date: Sun, 1 Jul 2012 16:44:13 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: Re: raw-sha1-ng reduced binary size (was: asan report)

<jfoug@....net> wrote:
>
> ---- magnum <john.magnum@...hmail.com> wrote:
> > On 2012-07-01 13:20, Tavis Ormandy wrote:
> > > I understand, I'm just not sure it's worth the performance penalty
> > > (because I can't treat it like a dqword in cmp_all).
>
> I have not looked at the code, but would you not simply load the 4 byte
> DWORD, into a reg:
>
> ABCDxxxxxxxxxxxx
>
> then replicate this to the entire register
>
> ABCDABCDABCDABCD
>
> Then simply do comparison using that to the first register load of each
> group of 4 ? A register load here being the first DWORD of each hash, in
> packed format.
>
> I have not looked at the code, so I am not sure if your SSE buffers setup
> differently than the interleaved DWORDS, but I would think it is done that
> way.

Yeah, obviously, the problem is that's 4 instructions, instead of:

    MOVDQU y, x
    PSHUFD y, foo

Or, with the redundant format, just one instruction:

    MOVDQU y, x

> > > I can think of a faster format if I store it redundantly, like:
> > >
> > > SHA1  =00112233 44556677 aabbccdd eeff3344 eeaa1122
> > > BINARY=EEAA1122 EEAA1122 EEAA1122 EEAA1122
> > >
> > > Then I only have to shuffle it once, instead of once per cmp_all.
> > > That's a saving of 4 bytes per hash, and I can still use it like a
> > > dqword, is that ok?
> >
> > Sure, I did not realize you would end up with a slower cmp_all. There
> > should be some way around that.
>
> the cmp_all is simply a 'better' hash check. It 'can' be an exact check,
> (if you are testing all 20 bytes, it is an exact check), but it does not
> have to be. There are many formats which have used the cmp_all to do a
> full compare, when really it should be written to as quickly as possible
> return that there is no way at all, that any of the passwords were
> cracked. Same thing for cmp_one. It should as quickly as possible state
> that 'this is not the one'. Any candidates that do squeeze by cmp_one can
> be fully tested in cmp_all.
>

The problem isn't that it's slow, it's pretty damn fast. The problem is I'm
obsessive about cycles ;-)

I actually have a branchless SSE4.1 comparison, but because I buffer so many
comparisons, a branch with prefetch hints seems to work better.

Tavis.

--
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------
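
For readers following the thread, the broadcast-and-compare idea discussed above can be sketched with SSE2 intrinsics roughly as follows. This is only an illustration under stated assumptions, not the raw-sha1-ng code: the function names `make_redundant_binary` and `cmp_all_sketch` are hypothetical, and the first 32-bit word of each computed hash is assumed to sit in one contiguous array, which may not match the real buffer layout. With the redundant BINARY format, the broadcast step would be replaced by a plain 16-byte load of the stored value.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replicate the first 32-bit word of the target
 * hash into all four lanes (ABCD -> ABCDABCDABCDABCD). Storing the
 * binary redundantly means this only has to happen once per target. */
static __m128i make_redundant_binary(uint32_t first_word)
{
    return _mm_set1_epi32((int)first_word);
}

/* Hypothetical cmp_all-style quick reject.
 * first_words: first 32-bit word of each computed hash, packed
 *              contiguously (an assumption about the layout).
 * count:       number of candidates, must be a multiple of 4.
 * Returns nonzero if any candidate might match, zero if none can. */
static int cmp_all_sketch(__m128i binary, const uint32_t *first_words,
                          size_t count)
{
    assert(count % 4 == 0);
    for (size_t i = 0; i < count; i += 4) {
        /* Unaligned 16-byte load of four candidates (MOVDQU). */
        __m128i computed =
            _mm_loadu_si128((const __m128i *)(first_words + i));
        /* Lane-wise 32-bit equality: all-ones in each matching lane. */
        __m128i eq = _mm_cmpeq_epi32(binary, computed);
        /* Any set bit in the byte mask means at least one lane matched. */
        if (_mm_movemask_epi8(eq) != 0)
            return 1;
    }
    return 0;
}
```

A nonzero result only says that some candidate shares the first 32 bits with the target, so a fuller comparison still has to follow. This sketch uses the branching variant; the branchless SSE4.1 comparison mentioned in the message is not shown.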