Message-ID: <20150908141851.GA14964@openwall.com>
Date: Tue, 8 Sep 2015 17:18:51 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: md5crypt mmxput*()
On Tue, Sep 08, 2015 at 01:17:14PM +0300, Solar Designer wrote:
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
> Raw: 231424 c/s real, 28928 c/s virtual
> I think further speedup is possible by using a switch statement to make
> the shift counts into constants (we have an if anyway, we'll just
> replace it with a switch) like cryptmd5_kernel.cl has.
I cleaned up the code and implemented the switch; patch attached.
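To illustrate the idea from the quoted paragraph, here is a minimal scalar C sketch. It is not the code from the attached diff. The names put_var(), put_switch(), PUT_SHIFTED, le32() and check_all_offsets() are hypothetical, and the real mmxput*() functions work on interleaved SIMD buffers rather than a plain uint32_t array. The sketch appends little-endian 32-bit words at an arbitrary byte offset. When the offset is not a multiple of 4, each output word combines two input words with shifts. Replacing the run-time "if" with a switch gives each alignment its own copy of the code, so every shift count becomes a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch, not JtR's mmxput*(): append `nwords` little-endian
 * words from `src` to `dst` at byte offset `off`, keeping the `off & 3`
 * bytes already present in the first destination word.
 * `dst` needs room for (off >> 2) + nwords + 1 words.
 */

/* Before: the shift count `s` is only known at run time. */
static void put_var(uint32_t *dst, const uint32_t *src,
                    unsigned nwords, unsigned off)
{
    unsigned i, s = 8 * (off & 3);

    assert(nwords > 0); /* precondition: at least one word to append */
    dst += off >> 2;
    if (s == 0) { /* aligned: plain copy */
        memcpy(dst, src, nwords * sizeof(*dst));
        return;
    }
    dst[0] = (dst[0] & ((1u << s) - 1)) | (src[0] << s);
    for (i = 1; i < nwords; i++)
        dst[i] = (src[i - 1] >> (32 - s)) | (src[i] << s);
    dst[nwords] = src[nwords - 1] >> (32 - s);
}

/* Same body with a constant shift S (8, 16 or 24).
 * Expects dst, src, nwords and i to be in scope. */
#define PUT_SHIFTED(S) do { \
    dst[0] = (dst[0] & ((1u << (S)) - 1)) | (src[0] << (S)); \
    for (i = 1; i < nwords; i++) \
        dst[i] = (src[i - 1] >> (32 - (S))) | (src[i] << (S)); \
    dst[nwords] = src[nwords - 1] >> (32 - (S)); \
} while (0)

/* After: one case per alignment, so all shift counts are constants. */
static void put_switch(uint32_t *dst, const uint32_t *src,
                       unsigned nwords, unsigned off)
{
    unsigned i;

    assert(nwords > 0);
    dst += off >> 2;
    switch (off & 3) {
    case 0: memcpy(dst, src, nwords * sizeof(*dst)); break;
    case 1: PUT_SHIFTED(8); break;
    case 2: PUT_SHIFTED(16); break;
    case 3: PUT_SHIFTED(24); break;
    }
}

/* Pack 4 bytes as a little-endian word, independent of host byte order. */
static uint32_t le32(const unsigned char *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
           (uint32_t)p[3] << 24;
}

/* Count (variant, offset) pairs that disagree with a byte-wise append. */
static int check_all_offsets(void)
{
    enum { NW = 3, NOFF = 8, NWORDS = NOFF / 4 + NW + 1 };
    unsigned char srcb[4 * NW], init[4 * NWORDS];
    unsigned char ref[4 * NWORDS], got[4 * NWORDS];
    uint32_t src[NW], dst[NWORDS];
    unsigned off, i, v;
    int bad = 0;

    for (i = 0; i < 4 * NW; i++)
        srcb[i] = (unsigned char)(0xA0 + i);
    for (i = 0; i < NW; i++)
        src[i] = le32(srcb + 4 * i);

    for (off = 0; off < NOFF; off++) {
        /* existing prefix bytes 1, 2, ...; 0xEE marks stale bytes */
        for (i = 0; i < 4 * NWORDS; i++)
            init[i] = (unsigned char)(i < off ? i + 1 : 0xEE);
        memcpy(ref, init, sizeof(ref));
        memcpy(ref + off, srcb, sizeof(srcb));
        for (v = 0; v < 2; v++) {
            for (i = 0; i < NWORDS; i++)
                dst[i] = le32(init + 4 * i);
            if (v == 0)
                put_var(dst, src, NW, off);
            else
                put_switch(dst, src, NW, off);
            for (i = 0; i < 4 * NWORDS; i++)
                got[i] = (unsigned char)(dst[i / 4] >> (8 * (i % 4)));
            if (memcmp(got, ref, off + sizeof(srcb)) != 0)
                bad++;
        }
    }
    return bad;
}
```

For example, appending the word 0x44332211 at byte offset 1 after an existing byte 0xAA yields 0x332211AA followed by 0x00000044. Note that the switch duplicates the loop body for each of the three unaligned cases. This is one plausible source of the code size growth suspected on bull.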
The switch turned out to cause a minor performance regression on bull
(maybe due to code size growth?), so I've disabled it for XOP, which
keeps performance almost the same as above:
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw: 231680 c/s real, 28923 c/s virtual
But it helps a lot on well and super. On well, with the changes from
earlier today but without the switch yet:
Benchmarking: md5crypt, crypt(3) $1$ [MD5 256/256 AVX2 8x3]... (8xOMP) DONE
Raw: 397824 c/s real, 49790 c/s virtual
with switch:
Benchmarking: md5crypt, crypt(3) $1$ [MD5 256/256 AVX2 8x3]... (8xOMP) DONE
Raw: 425472 c/s real, 53184 c/s virtual
super, default (old) gcc, code version from a few days ago:
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw: 605184 c/s real, 18912 c/s virtual
with my changes from earlier today:
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw: 619008 c/s real, 19307 c/s virtual
with switch:
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw: 638976 c/s real, 19943 c/s virtual
super's latest gcc (4.9.1 after "scl enable devtoolset-3 bash") with the
latest code (with switch):
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw: 731136 c/s real, 22798 c/s virtual
IIRC, previously it was below 700k.
The switch can probably be made beneficial for XOP as well if we reduce
code size elsewhere, but I've had no luck with that so far (e.g., simply
not inlining the function causes a bigger performance regression).
Alexander
Attachment: "john-md5crypt-bitalign2.diff" (text/plain, 4541 bytes)