Message-ID: <20151007213744.GA17218@openwall.com>
Date: Thu, 8 Oct 2015 00:37:44 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: Roman Rusakov <rusakovster@...il.com>, deeplearningjohndoe@...il.com
Subject: Re: nVidia Maxwell support (especially descrypt)?

DeepLearningJohnDoe - thank you for your work in this area, and we'd
appreciate any comments you might have on the below.

On Wed, Oct 07, 2015 at 06:54:20PM +0200, magnum wrote:
> >On Wed, Oct 7, 2015 at 8:44 AM, Solar Designer <solar@...nwall.com>  wrote:
> >>And of course we'll also need to include some LOP3.LUT S-boxes.
> >>If Roman's are still unreleased (except for S4), then Janet's.
[...]
> I implemented this in 9c82bcc, using DeepLearningJohnDoe's (a.k.a.
> Janet's) S-boxes except for s4.

Are you getting better speeds with Roman's S4?

> Boost appears to be in the order of 10% for LM, 20% for DES.

Confirmed, on Titan X against the same 10 descrypt hashes (10 different
salts) as yesterday:

0g 0:00:03:10 2.04% (ETA: 02:51:06) 0g/s 22303Kp/s 226641Kc/s 226641KC/s GPU:67C util:100% fan:26% aacxytna..aacxytna

This is now roughly the same speed as Tahiti.  Titan X has got to be
better than that.  Maybe that split of the S-box "lookups" across 4
work-items is key to better performance (more work done per register
consumed).  Sayantan, please look into that.

I'd run on many more salts to reduce the key setup overhead, but then
the kernel build time becomes large and distorts the reported c/s
figures too much for quick runs like this.  Maybe we need to reset the
timer to zero once the kernels are built, and/or maybe we need to add
computation and reporting of instantaneous speeds (not just the all-time
averages).
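
A rough sketch of both ideas, in Python rather than JtR's C for brevity.
SpeedMeter and its methods are hypothetical names, not existing JtR code,
and timestamps are passed in explicitly (in seconds) only to keep the
example deterministic:

```python
class SpeedMeter:
    """Counts work done and reports all-time and instantaneous rates.

    Hypothetical helper for illustration; not part of John the Ripper.
    """

    def __init__(self, now):
        self.start = now          # start of the all-time window
        self.total = 0            # work units counted since start
        self.last_time = now      # time of the previous instantaneous report
        self.last_total = 0       # total at the previous instantaneous report

    def reset(self, now):
        """Restart timing, e.g. right after the kernels finish building."""
        self.__init__(now)

    def add(self, count):
        """Record that `count` more candidates (or crypts) were processed."""
        self.total += count

    def average(self, now):
        """All-time average rate since start (or since the last reset)."""
        elapsed = now - self.start
        return self.total / elapsed if elapsed > 0 else 0.0

    def instantaneous(self, now):
        """Rate since the previous call to this method."""
        elapsed = now - self.last_time
        done = self.total - self.last_total
        self.last_time, self.last_total = now, self.total
        return done / elapsed if elapsed > 0 else 0.0
```

With a 30 second kernel build followed by 10 seconds of real work, the
all-time average is a quarter of the true speed, whereas either a reset
at the 30 second mark or an instantaneous reading over the last 10
seconds reports the true speed.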

> Is there any special place to look for more of Roman's work?

No.  We need to ask Roman.  I sort of just did, by CC'ing him.

BTW, our current opencl_sboxes.h defaults to using nonstd.c derived
expressions when !HAVE_LUT3.  Maybe it should also have an option for
using sboxes-s.c derived expressions, which are supposed to be faster on
AMD GPUs.

> BTW we now also use LOP3.LUT for many MD4, MD5 and SHA-2 OpenCL formats.
> Some driver bug prevented me from using it in SHA-1 with nvidia 352.39
> (the code is there, just disabled) and md5crypt disables it because of a
> performance regression (still to be investigated). Some formats show a
> fine boost but none as much as DEScrypt.

... with our guess as to why the boost is lower being that LOP3.LUT was
often used anyway, introduced in the PTX to ISA translation.
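
For anyone following along who hasn't used the instruction: LOP3.LUT
computes any 3-input bitwise function in one instruction, selected by an
8-bit truth table.  Below is a minimal Python emulation of that behavior.
lop3 and lut_for are hypothetical helper names, not taken from JtR or
from anyone's S-box files; the 0xF0/0xCC/0xAA trick for deriving the
table byte is the one described in NVIDIA's PTX documentation for
lop3.b32:

```python
MASK32 = 0xFFFFFFFF


def lop3(a, b, c, lut):
    """Emulate a 32-bit LOP3.LUT: apply truth table `lut` bitwise to a, b, c.

    Bit number (a_bit << 2 | b_bit << 1 | c_bit) of `lut` gives the output
    bit for that combination of input bits.
    """
    result = 0
    for idx in range(8):
        if (lut >> idx) & 1:
            # Select the bit positions where (a, b, c) match this table row.
            ta = a if idx & 4 else ~a
            tb = b if idx & 2 else ~b
            tc = c if idx & 1 else ~c
            result |= ta & tb & tc
    return result & MASK32


def lut_for(f):
    """Derive the 8-bit table for a 3-input bitwise function `f`."""
    return f(0xF0, 0xCC, 0xAA) & 0xFF
```

For instance, lut_for(lambda a, b, c: (a & b) | (~a & c)) gives 0xCA, the
bit-select used as MD5's F, and the SHA-2 majority function gives 0xE8.
An S-box expression written in terms of such 3-input functions needs one
instruction per function instead of one per AND/OR/XOR/NOT gate.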

Thank you all for working on this!

Alexander
