Message-ID: <000301cbe40c$a6e24be0$f4a6e3a0$@net>
Date: Wed, 16 Mar 2011 14:02:09 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: Speedup of x86 .S build of raw-sha1 format

>2011-03-15 8:54 PM, magnum wrote
>>On 2011-03-15 23:28, jfoug wrote:
>> Here is about a 10% speedup. A very trivial change.
>>
>> -	memset(saved_key, 0, sizeof(saved_key));
>> +	//memset(saved_key, 0, sizeof(saved_key));
>> +	memset(saved_key, 0, 64*MMX_COEF);
>
>Good find. I suppose it could be done exactly the same in
>mysqlSHA1_fmt.c too? It even has a comment about that memset. But it
>runs x1 on x86-64

In mysqlSHA1_fmt.c I found 2 speedups (SSE). The first is the memset change
above. The second is the removal of a double byte swap. Here are the timings
on my system:

john -test=5 -for=mysql-sha1
Benchmarking: MySQL 4.1 double-SHA-1 SSE2 [mysql-sha1 SSE2]... DONE
Raw:	4145K c/s

john -test=5 -for=mysql-sha1
Benchmarking: MySQL 4.1 double-SHA-1 SSE2 [mysql-sha1 SSE2]... DONE
Raw:	4391K c/s

john -test=5 -for=mysql-sha1
Benchmarking: MySQL 4.1 double-SHA-1 SSE2 [mysql-sha1 SSE2]... DONE
Raw:	4608K c/s

The 1st is the 'starting point'. The 2nd is the simple change to the memset.
The 3rd is a change to the asm, creating a new function that does not
byteswap prior to returning (in addition to the reduced memset).

 static void mysqlsha1_crypt_all(int count) {
 #ifdef MMX_COEF
 	unsigned int i;
-	shammx((unsigned char *) crypt_key, (unsigned char *) saved_key, total_len);
-
-	for(i = 0; i < MMX_COEF*BINARY_SIZE/sizeof(unsigned); i++)
-	{
-		((unsigned*)interm_key)[i] = BYTESWAP(((unsigned*)crypt_key)[i]);
-	}
+	shammx_nofinalbyteswap((unsigned char *) crypt_key, (unsigned char *) saved_key, total_len);
+	for(i = 0; i < MMX_COEF*BINARY_SIZE/sizeof(unsigned); i++)
+		((unsigned*)interm_key)[i] = ((unsigned*)crypt_key)[i];

Simply create a new function called shammx_nofinalbyteswap(). That function
works just like shammx. However, before jumping into the shammx_noinit call,
it sets a dword=1.
Then at the bottom of shammx_noinit, right before it does a jmp to
skip_endianity, I check that dword. If it is 1, I simply set that dword back
to 0 (so the next call to shammx will work properly), and rip the 5 mmx
registers into the output buffer. If the dword was 0, the original
jmp skip_endianity is done.

What that gives is an x86-friendly dump of sha1 into the buffer, which avoids
endianness conversions later (converting to base-16, comparing against the
original base-16, etc), but it also gives a shammx function which leaves the
buffer in the PROPER endianness to be used as input data. So if there was a
function such as phpass which used sha1, this speedup would allow multiple
encryptions right in the same buffer.

Phpass is 'like' this mysql-sha1, in that the key is set up, the crypt is
done, and then the crypt is done over and over again on the binary results in
the buffer. In phpass, there is more than just the binary results (it also
appends the key, but the binary residue from the prior run is ALWAYS 16
bytes).

In phpass, we have this:
md5(binary(md5($p.$s)).$p)^2048

In mysql-sha1 we have
sha1(binary(sha1(byteswap.c($p))))

After my change, this is what the logic actually does. Before the change we
were doing
sha1(byteswap.c(byteswap.sse(binary(sha1(byteswap.c($p))))))

NOTE, I probably should have kept some of these optimizations quiet, so that
I had some ability to offset the overhead of the generic functions, yet still
be able to design subformats that can replace existing ones (like sha1-raw,
mysql-sha1, etc), keeping 'almost' equal speeds.

Jim.
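For readers who do not want to dig through the .S file, the flag trick described above can be modelled in plain C. This is only a rough sketch under assumptions, not the actual assembly: skip_final_swap, bswap32(), emit_state() and emit_state_noswap() are hypothetical names standing in for the dword, the BYTESWAP step, the tail of shammx_noinit and the shammx_nofinalbyteswap entry point, and the five words stand in for the five mmx registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dword that the new entry point sets to 1. */
static uint32_t skip_final_swap = 0;

/* Portable 32-bit byte swap (stand-in for the BYTESWAP macro in the diff). */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/*
 * Model of the end of the hash routine: write the five state words to out.
 * If the flag is set, clear it (so the next normal call behaves as before)
 * and store the words untouched; otherwise take the original path, which
 * byte swaps each word before storing it.
 */
static void emit_state(const uint32_t state[5], uint32_t out[5])
{
    size_t i;

    assert(state != NULL && out != NULL);

    if (skip_final_swap) {
        skip_final_swap = 0;
        for (i = 0; i < 5; i++)
            out[i] = state[i];
    } else {
        for (i = 0; i < 5; i++)
            out[i] = bswap32(state[i]);
    }
}

/* Model of the new entry point: set the flag, then run the normal code. */
static void emit_state_noswap(const uint32_t state[5], uint32_t out[5])
{
    skip_final_swap = 1;
    emit_state(state, out);
}
```

A caller that previously used emit_state() and then swapped every word back would call emit_state_noswap() and copy the words as they are, which is the same saving the diff to mysqlsha1_crypt_all() makes. Since bswap32(bswap32(x)) == x, dropping both swaps does not change the result.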