john-dev - Re: Proposed optimizations to pwsafe

Message-ID: <d2e05d00b054d1c1f38f58564e9213f6@smtp.hushmail.com>
Date: Mon, 28 Jan 2013 19:55:25 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe

Brian,

After your OpenCL patch I get these warnings from pwsafe-opencl:

Build log: <program source>:282:36: warning: signed shift result (0x200000000) requires 35 bits to represent, but 'int' only has 32 bits
                w[14] = sigma1( w[12] ) + w[7] + sigma0( 256 );
                                                 ^~~~~~~~~~~~~
<program source>:21:21: note: expanded from macro 'sigma0'
#define sigma0(x) ((ror(x,7))  ^ (ror(x,18)) ^ (x>>3))
                    ^
<program source>:16:33: note: expanded from macro 'ror'
#define ror(x,n) ((x >> n) | (x << (32-n)))
                              ~ ^  ~
<program source>:615:35: warning: signed shift result (0x200000000) requires 35 bits to represent, but 'int' only has 32 bits
        w[14] = sigma1( w[12] ) + w[7] + sigma0( 256 );
                                         ^~~~~~~~~~~~~
<program source>:21:21: note: expanded from macro 'sigma0'
#define sigma0(x) ((ror(x,7))  ^ (ror(x,18)) ^ (x>>3))
                    ^
<program source>:16:33: note: expanded from macro 'ror'
#define ror(x,n) ((x >> n) | (x << (32-n)))
                              ~ ^  ~


It passes self-test though. Even the Test Suite passes IIRC. So maybe this is harmless? But we should still get rid of the warnings.
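
My guess at the cause: the literal 256 is a signed int, so ror(256,7) expands to 256 << 25, which is 0x200000000 and does not fit in 32 bits. A rotate throws those high bits away anyway, which would explain why the results are still right. Below is a minimal host-side C sketch of one way to silence it, assuming we are free to change the macros. ROR32, SIGMA0 and sigma0_of_256 are names I made up for illustration, not what the kernel uses. Writing the constant as 256U in the kernel would probably work too.

```c
#include <stdint.h>

/* Arguments are parenthesized and cast to 32-bit unsigned, so a signed
 * literal such as 256 is never left-shifted as an 'int'.
 * Valid for rotate counts 1..31 only. */
#define ROR32(x, n) ((((uint32_t)(x)) >> (n)) | (((uint32_t)(x)) << (32 - (n))))

/* SHA-256 small sigma0, same shape as the kernel's sigma0 macro. */
#define SIGMA0(x) (ROR32((x), 7) ^ ROR32((x), 18) ^ (((uint32_t)(x)) >> 3))

/* Hypothetical helper, only for checking the constant on the host. */
static inline uint32_t sigma0_of_256(void)
{
    return SIGMA0(256);
}
```

With the casts in place, sigma0 of 256 works out to 0x00400022 (0x2 ^ 0x400000 ^ 0x20), so the term could also be written as that constant directly.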

Note that in the bleeding branch, compiler warnings are always shown. In unstable, you need to build with -DREPORT_OPENCL_WARNINGS or -DDEBUG for them to show up when the build produces only warnings and no errors.

magnum



On 28 Jan, 2013, at 2:09 , Brian Wallace <nightstrike9809@...il.com> wrote:

> When I applied the opencl optimization, I only saw minor improvements compared to the CUDA improvements.  I found that was kind of weird, because it was basically the same changes to the code.
> 
> On Sun, Jan 27, 2013 at 7:58 PM, magnum <john.magnum@...hmail.com> wrote:
> On 28 Jan, 2013, at 1:41 , Solar Designer <solar@...nwall.com> wrote:
> > On Sun, Jan 27, 2013 at 07:22:19PM -0500, Brian Wallace wrote:
> >> Ok, I'll do those changes.  I haven't done much cuda/ocl coding in the
> >> past, so it might take me a short while to get up to speed on what works
> >> best, although I have a good background in C and hash cracking
> >> optimization.  What kind of benchmarks are we getting on pwsafe-opencl vs
> >> hashcat?
> >
> > Apparently, hashcat's speed is ~500k on HD 7970.  hashkill is at ~480k:
> >
> > http://twitter.com/gat3way/status/294968226209726464/photo/1
> >
> > We're getting 355k:
> >
> 
> > (The match of OpenCL and CUDA speed is curious.  I did not tune THREADS
> > and BLOCKS in cuda_pwsafe.h, and was compiling for the default of sm_10.
> > Perhaps better speed is possible with some tuning.)
> 
> The OpenCL format currently only auto-tunes the local work-size (THREADS), so it too runs under suboptimal conditions. The global work-size defaults to the same figure the CUDA format uses. It does support LWS/GWS environment variables though:
> 
> $ GWS=$((256*1024)) ../run/john -t -fo:pwsafe-opencl -plat=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Device 0: Tahiti (AMD Radeon HD 7900 Series)
> Local worksize (LWS) 64, Global worksize (GWS) 262144
> Benchmarking: Password Safe SHA-256 [OpenCL]... DONE
> Raw:    362411 c/s real, 78643K c/s virtual
> 
> No huge difference though.
> 
> In bleeding, Claudio has added a shared function for tuning GWS. I haven't had time to try it out yet.
> 
> magnum
> 

