Message-ID: <20150905060912.GA25299@openwall.com>
Date: Sat, 5 Sep 2015 09:09:12 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: MD4 G()
magnum, Sayantan -
MD4 G() is the same as SHA-2 Maj(), yet we've been using an unoptimized
expression for it so far.
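For reference, here is a minimal sketch of the equivalence in plain C. It is not the contents of the attached patch; the function names are made up for illustration, and bitselect32() is only a stand-in for OpenCL's bitselect(a, b, c), which takes bits from b where c is set and from a elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for OpenCL bitselect(a, b, c): bits of b where c is 1,
 * bits of a where c is 0. (Hypothetical helper name.) */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Textbook MD4 G(), identical to SHA-2 Maj(): 5 bitwise operations. */
static uint32_t md4_g_naive(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Common 4-operation form of the same majority function. */
static uint32_t md4_g_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* One XOR plus one select: where x == z the majority is x,
 * where x != z the majority is y. */
static uint32_t md4_g_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z ^ x);
}

/* These three constants put every one of the 8 possible (x, y, z) bit
 * combinations into some bit position, so one comparison per form is an
 * exhaustive check of the bitwise identity. */
static void md4_g_self_check(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;
    const uint32_t ref = md4_g_naive(x, y, z);

    assert(md4_g_4op(x, y, z) == ref);
    assert(md4_g_select(x, y, z) == ref);
}
```

Whether the select form is actually a win depends on the device having a native select instruction; on a target without one, the 4-operation form may be the better choice.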
The attached patch improves the speed for pbkdf2-hmac-md4-opencl on
Tahiti from:
Local worksize (LWS) 64, global worksize (GWS) 524288
DONE
Speed for cost 1 (iterations) of 1000
Raw: 3994K c/s real, 104857K c/s virtual
to:
Local worksize (LWS) 64, global worksize (GWS) 524288
DONE
Speed for cost 1 (iterations) of 1000
Raw: 4537K c/s real, 94371K c/s virtual
or if I let it auto-tune to higher GWS (which it previously would not):
Local worksize (LWS) 64, global worksize (GWS) 2097152
DONE
Speed for cost 1 (iterations) of 1000
Raw: 4592K c/s real, 125829K c/s virtual
On one core of an FX-8120, I got an improvement (with the previously posted
patch) from:
Benchmarking: Raw-MD4 [MD4 128/128 XOP 4x2]... DONE
Raw: 36863K c/s real, 36863K c/s virtual
to:
Benchmarking: Raw-MD4 [MD4 128/128 XOP 4x2]... DONE
Raw: 39233K c/s real, 39233K c/s virtual
although part of that speedup, namely the step to:
Benchmarking: Raw-MD4 [MD4 128/128 XOP 4x2]... DONE
Raw: 37509K c/s real, 37509K c/s virtual
came from enabling use of H2, which was previously disabled for 2x
interleaving. The new speed of 39233K is finally better than raw-md5's,
which is at most (over several benchmark invocations):
Benchmarking: Raw-MD5 [MD5 128/128 XOP 4x2]... DONE
Raw: 37918K c/s real, 37918K c/s virtual
Yet the difference is surprisingly small, suggesting that there's
still room for speeding up our MD4 on CPU.
It may be worth experimenting with different orderings of x, y, z to
G(). Maybe some of the 6 permutations will result in a lower optimal GWS
and/or better performance than others. (The same applies to SHA-1 and SHA-2.)
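To make the 6 orderings concrete, here is a rough sketch in plain C. It assumes the select-based form of G(); sel32(), G_SEL() and the other names are hypothetical and not taken from our tree. Since majority is symmetric, all six permutations give the same value, so any difference would be purely in the code the compiler generates (register pressure, scheduling).

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for OpenCL bitselect(a, b, c). (Hypothetical name.) */
static uint32_t sel32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Select-based majority; the argument order is what we want to vary. */
#define G_SEL(a, b, c) sel32((a), (b), (c) ^ (a))

/* Evaluate all 6 argument orderings and count how many match the
 * textbook expression. */
static int count_matching_orderings(uint32_t x, uint32_t y, uint32_t z)
{
    const uint32_t ref = (x & y) | (x & z) | (y & z);
    const uint32_t r[6] = {
        G_SEL(x, y, z), G_SEL(x, z, y),
        G_SEL(y, x, z), G_SEL(y, z, x),
        G_SEL(z, x, y), G_SEL(z, y, x)
    };
    int i, n = 0;

    for (i = 0; i < 6; i++)
        n += (r[i] == ref);
    return n;
}

/* Inputs chosen so that every (x, y, z) bit combination occurs. */
static void orderings_self_check(void)
{
    assert(count_matching_orderings(0xF0F0F0F0u, 0xCCCCCCCCu,
                                    0xAAAAAAAAu) == 6);
}
```

In a kernel, the experiment would amount to swapping the argument order in the G() macro and re-running the benchmark with auto-tuning for each variant.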
nt_kernel.cl and mscash_kernel.cl (any others?) will need separate
patches. mscash_kernel.cl doesn't even use bitselect() for F(), and
doesn't use rotate(). They should be made to use opencl_md4.h macros.
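As an illustration of what using bitselect() for F() and rotate() buys, here is a plain-C sketch. It does not reproduce opencl_md4.h or either kernel; bitselect32(), rotl32() and md4_round1_step() are hypothetical stand-ins written only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for OpenCL bitselect(a, b, c). */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Plain-C stand-in for OpenCL rotate(x, n) on 32-bit values: rotate left,
 * with the count reduced modulo 32 so that n == 0 is well defined. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Textbook MD4 F(): "if x then y else z", 3 operations plus a NOT. */
static uint32_t md4_f_naive(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* The same function as a single select, with x as the mask. */
static uint32_t md4_f_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(z, y, x);
}

/* One MD4 round 1 step: a = (a + F(b, c, d) + w) <<< s. */
static uint32_t md4_round1_step(uint32_t a, uint32_t b, uint32_t c,
                                uint32_t d, uint32_t w, unsigned int s)
{
    return rotl32(a + md4_f_select(b, c, d) + w, s);
}

/* Exhaustive per-bit check of F() plus a simple rotate check. */
static void md4_f_self_check(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

    assert(md4_f_select(x, y, z) == md4_f_naive(x, y, z));
    assert(rotl32(0x80000001u, 1) == 0x00000003u);
}
```

With the built-ins, a device that has native select and rotate instructions can do each of F() and the rotate in one operation instead of several.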
Alexander
View attachment "john-opencl-md4g.diff" of type "text/plain" (817 bytes)