john-dev - MD4 G()



Message-ID: <20150905060912.GA25299@openwall.com>
Date: Sat, 5 Sep 2015 09:09:12 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: MD4 G()

magnum, Sayantan -

MD4 G() is the same as SHA-2 Maj(), yet we've been using an unoptimized
expression for it so far.
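
For reference, here is a minimal C sketch of the equivalence. The attached
patch is not reproduced here, so the specific forms below are assumptions:
they are well-known ways to write a three-input majority function, not
necessarily the one the patch uses. bitselect32() is a hypothetical portable
stand-in for OpenCL's bitselect(), and all the md4_g_* names are made up for
illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's bitselect(a, b, c):
 * each result bit is taken from b where c is 1, otherwise from a. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Textbook MD4 G() as given in RFC 1320: bitwise majority, 5 operations. */
static uint32_t md4_g_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Same function in 4 operations. */
static uint32_t md4_g_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Same function as one XOR plus one bit-select: where x and z agree,
 * the majority is x; where they differ, y breaks the tie. */
static uint32_t md4_g_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z ^ x);
}

/* Returns 1 if all three forms agree. The three constants cover all
 * 8 combinations of input bits, so this is an exhaustive check of a
 * purely bitwise function. */
static int md4_g_forms_agree(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;
    const uint32_t ref = md4_g_ref(x, y, z);

    return md4_g_4op(x, y, z) == ref && md4_g_bitselect(x, y, z) == ref;
}
```

On hardware with a native bit-select instruction, the last form would
typically be two operations, which is presumably where a speedup of this kind
comes from.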

The attached patch improves the speed for pbkdf2-hmac-md4-opencl on
Tahiti from:

Local worksize (LWS) 64, global worksize (GWS) 524288
DONE
Speed for cost 1 (iterations) of 1000
Raw:    3994K c/s real, 104857K c/s virtual

to:

Local worksize (LWS) 64, global worksize (GWS) 524288
DONE
Speed for cost 1 (iterations) of 1000
Raw:    4537K c/s real, 94371K c/s virtual

or if I let it auto-tune to higher GWS (which it previously would not):

Local worksize (LWS) 64, global worksize (GWS) 2097152
DONE
Speed for cost 1 (iterations) of 1000
Raw:    4592K c/s real, 125829K c/s virtual

On one core of an FX-8120, I got an improvement (with the previously posted
patch) from:

Benchmarking: Raw-MD4 [MD4 128/128 XOP 4x2]... DONE
Raw:    36863K c/s real, 36863K c/s virtual

to:

Benchmarking: Raw-MD4 [MD4 128/128 XOP 4x2]... DONE
Raw:    39233K c/s real, 39233K c/s virtual

although some of that speedup, namely the portion up to:

Benchmarking: Raw-MD4 [MD4 128/128 XOP 4x2]... DONE
Raw:    37509K c/s real, 37509K c/s virtual

came from enabling use of H2, which was previously disabled for 2x
interleaving.  The new speed of 39233K is finally better than raw-md5's,
which is at most (over several benchmark invocations):

Benchmarking: Raw-MD5 [MD5 128/128 XOP 4x2]... DONE
Raw:    37918K c/s real, 37918K c/s virtual

Yet the difference is surprisingly small, suggesting that there's
still room for speeding up our MD4 on CPU.

It may be worth experimenting with different orderings of the x, y, z
arguments to G().  Some of the 6 permutations may result in a lower optimal
GWS and/or better performance than others.  (The same applies to SHA-1 and
SHA-2.)
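
Since the majority function is symmetric in its inputs, any of the 6 argument
orderings gives the same result, so the choice is purely a matter of code
generation. A small C sketch to confirm this, assuming a bit-select based
form of G() (the helper names are hypothetical, and bitselect32() merely
emulates OpenCL's bitselect()):

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's bitselect(a, b, c):
 * each result bit is taken from b where c is 1, otherwise from a. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Majority via bit-select; the argument order affects only which
 * operand is XORed and which is selected, not the result. */
static uint32_t g_sel(uint32_t a, uint32_t b, uint32_t c)
{
    return bitselect32(a, b, c ^ a);
}

/* Counts how many of the 6 orderings of (x, y, z) match the
 * textbook majority expression. */
static int count_matching_orderings(uint32_t x, uint32_t y, uint32_t z)
{
    static const int perm[6][3] = {
        {0, 1, 2}, {0, 2, 1}, {1, 0, 2},
        {1, 2, 0}, {2, 0, 1}, {2, 1, 0}
    };
    const uint32_t ref = (x & y) | (x & z) | (y & z);
    const uint32_t v[3] = { x, y, z };
    int i, ok = 0;

    for (i = 0; i < 6; i++)
        ok += g_sel(v[perm[i][0]], v[perm[i][1]], v[perm[i][2]]) == ref;
    return ok;
}
```

With inputs 0xF0F0F0F0, 0xCCCCCCCC, 0xAAAAAAAA (which cover all 8 input bit
combinations), all 6 orderings match. Which one is fastest on a given device
can only be found by benchmarking.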

nt_kernel.cl and mscash_kernel.cl (any others?) will need separate
patches.  mscash_kernel.cl doesn't even use bitselect() for F(), and
doesn't use rotate().  Both should be made to use the opencl_md4.h macros.

Alexander

View attachment "john-opencl-md4g.diff" of type "text/plain" (817 bytes)
