Message-ID: <4FDBAFCE.9010902@banquise.net>
Date: Fri, 15 Jun 2012 23:57:34 +0200
From: Simon Marechal <simon@...quise.net>
To: john-dev@...ts.openwall.com
Subject: Re: Re: [patch] optional new raw sha1 implemetation

On 06/15/2012 11:36 PM, Tavis Ormandy wrote:
> Oops, good point. I'm not sure how to tell if it's available or not (I
> think it was accidentally omitted in some gcc releases), but gcc seems
> to tolerate me writing my own, so I did that.
>
> I'll look into how to do it properly.

I just pushed a "fix" that checks whether we are using ICC. It should
also fix the x86-64.S problem.

>> > The current SSE code cracks 19.8M c/s. Taviso's is faster at 21.3M c/s,
>> > and doesn't use the register scheduling trick that is in
>> > sse-intrinsics.c. This _might_ mean it could be faster.
> Nice, that's great news! Solar also mentioned I should read this, I'll
> do that and see if there are any ideas to steal :-)

The idea is to work on N*4 32-bit values at the same time instead of
just 4, and to let the compiler schedule the register allocation so that
it hides memory latency and the penalties that come from using a
register value just after it is assigned. The first time I saw it was in
BarsWF. GCC doesn't seem to be good at it, however, and that is the
reason there is an x86-64i target with a precompiled .S file from ICC.

Other "easy" gains could be achieved by allocating larger buffers,
increasing MAX_KEY_PER_CRYPT and running the hashing function several
times per crypt_all call.
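
To make the interleaving idea above concrete, here is a minimal sketch in portable C. It is not the code from sse-intrinsics.c: the names vec4, NVEC, mix_scalar and mix_interleaved are hypothetical, vec4 only stands in for a 4-lane SSE register, and the mixing step is merely SHA-1-like. Each step is applied to all N vectors before the next step starts, so that consecutive operations do not depend on each other's results.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4            /* 32-bit values per vector, as in one SSE register */
#define NVEC  3            /* hypothetical interleave factor N */
#define K     0x5A827999u  /* SHA-1 round constant, used here only as filler */

/* Stand-in for a 4 x 32-bit SIMD register (assumption: not real intrinsics). */
typedef struct { uint32_t v[LANES]; } vec4;

static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));   /* callers pass 0 < n < 32 */
}

/* Scalar reference: a chain of dependent SHA-1-like steps on one value pair. */
static uint32_t mix_scalar(uint32_t a, uint32_t b, int rounds)
{
    for (int r = 0; r < rounds; r++) {
        uint32_t t = rol32(a, 5) + (a ^ b) + K;
        b = rol32(a, 30);
        a = t;
    }
    return a;
}

/* Interleaved version: NVEC * LANES independent values advance together. */
static void mix_interleaved(vec4 a[NVEC], vec4 b[NVEC], int rounds)
{
    assert(rounds >= 0);
    for (int r = 0; r < rounds; r++) {
        vec4 t[NVEC];

        /* Step 1 for every vector: none of these results feed each other. */
        for (int i = 0; i < NVEC; i++)
            for (int l = 0; l < LANES; l++)
                t[i].v[l] = rol32(a[i].v[l], 5) + (a[i].v[l] ^ b[i].v[l]) + K;

        /* Step 2 for every vector: t[i] was computed NVEC vectors ago. */
        for (int i = 0; i < NVEC; i++)
            for (int l = 0; l < LANES; l++) {
                b[i].v[l] = rol32(a[i].v[l], 30);
                a[i].v[l] = t[i].v[l];
            }
    }
}
```

With real intrinsics each inner lane loop would collapse into a single vector instruction, and NVEC would be tuned so that the working set still fits in the available registers. Whether the compiler then produces a good schedule is exactly the part that varies between GCC and ICC.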