Message-ID: <CAJpaVcT3OfRkK_Og8C-1fiQpx_Hkt6Xe7n3LzaqJ28Nd0feZ5w@mail.gmail.com>
Date: Mon, 28 Jan 2013 16:26:04 -0500
From: Brian Wallace <nightstrike9809@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe
I'm going to try replacing ror with rotate calls, but that seems to require
some type conversions. I'm reading up on OpenCL development to fix any
issues and hopefully get more c/s.
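
For reference, here is a minimal sketch in plain C of one way the warning quoted below could be avoided. It is not the actual pwsafe kernel code. The names ror32 and the function form of sigma0 are hypothetical stand-ins for the macros in the build log. The idea is that sigma0( 256 ) passes a signed int literal, so 256 << 25 overflows int; forcing the operand to an unsigned 32-bit type makes the shift well defined. In OpenCL C the built-in rotate() rotates left and expects both arguments to have the same type, so a right rotation by n would presumably be written as rotate(x, (uint)(32 - n)) on a uint operand, which may be where the type conversions come from.

```c
#include <stdint.h>

/* Hypothetical helper: rotate right on an explicitly unsigned 32-bit value.
 * n must be in the range 1..31, as it is for the SHA-256 sigma functions. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Function form of the sigma0 macro from the build log. Because the
 * parameter is uint32_t, a plain constant such as 256 is converted to
 * unsigned before any shift takes place. */
static inline uint32_t sigma0(uint32_t x)
{
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3);
}
```

With these definitions sigma0(256) evaluates to 0x00400022. Keeping the macros and writing the constant as 256U would be a smaller change with the same effect.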
On Mon, Jan 28, 2013 at 1:55 PM, magnum <john.magnum@...hmail.com> wrote:
> Brian,
>
> After your OpenCL patch I get these warnings from pwsafe-opencl:
>
> Build log: <program source>:282:36: warning: signed shift result
> (0x200000000) requires 35 bits to represent, but 'int' only has 32 bits
> w[14] = sigma1( w[12] ) + w[7] + sigma0( 256 );
> ^~~~~~~~~~~~~
> <program source>:21:21: note: expanded from macro 'sigma0'
> #define sigma0(x) ((ror(x,7)) ^ (ror(x,18)) ^ (x>>3))
> ^
> <program source>:16:33: note: expanded from macro 'ror'
> #define ror(x,n) ((x >> n) | (x << (32-n)))
> ~ ^ ~
> <program source>:615:35: warning: signed shift result (0x200000000)
> requires 35 bits to represent, but 'int' only has 32 bits
> w[14] = sigma1( w[12] ) + w[7] + sigma0( 256 );
> ^~~~~~~~~~~~~
> <program source>:21:21: note: expanded from macro 'sigma0'
> #define sigma0(x) ((ror(x,7)) ^ (ror(x,18)) ^ (x>>3))
> ^
> <program source>:16:33: note: expanded from macro 'ror'
> #define ror(x,n) ((x >> n) | (x << (32-n)))
> ~ ^ ~
>
>
> It passes self-test though. Even the Test Suite passes IIRC. So maybe this
> is harmless? But we should still get rid of the warnings.
>
> Note that in the bleeding branch, compiler warnings are always shown. In
> unstable, you need to -DREPORT_OPENCL_WARNINGS or -DDEBUG for them to show
> up (as long as there are only warnings).
>
> magnum
>
>
>
> On 28 Jan, 2013, at 2:09 , Brian Wallace <nightstrike9809@...il.com>
> wrote:
>
> When I applied the opencl optimization, I only saw minor improvements
> compared to the CUDA improvements. I found that was kind of weird, because
> it was basically the same changes to the code.
>
> On Sun, Jan 27, 2013 at 7:58 PM, magnum <john.magnum@...hmail.com> wrote:
>
>> On 28 Jan, 2013, at 1:41 , Solar Designer <solar@...nwall.com> wrote:
>> > On Sun, Jan 27, 2013 at 07:22:19PM -0500, Brian Wallace wrote:
>> >> Ok, I'll do those changes. I haven't done much cuda/ocl coding in the
>> >> past, so it might take me a short while to get up to speed on what works
>> >> best, although I have a good background in C and hash cracking
>> >> optimization. What kind of benchmarks are we getting on pwsafe-opencl vs
>> >> hashcat?
>> >
>> > Apparently, hashcat's speed is ~500k on HD 7970. hashkill is at ~480k:
>> >
>> > http://twitter.com/gat3way/status/294968226209726464/photo/1
>> >
>> > We're getting 355k:
>> >
>>
>> > (The match of OpenCL and CUDA speed is curious. I did not tune THREADS
>> > and BLOCKS in cuda_pwsafe.h, and was compiling for the default of sm_10.
>> > Perhaps better speed is possible with some tuning.)
>>
>> The OpenCL format currently only auto-tunes local work-size (THREADS) so
>> it too runs at suboptimal conditions. The global work-size defaults to the
>> same figure the CUDA format uses. It does support LWS/GWS environment
>> variables though:
>>
>> $ GWS=$((256*1024)) ../run/john -t -fo:pwsafe-opencl -plat=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Device 0: Tahiti (AMD Radeon HD 7900 Series)
>> Local worksize (LWS) 64, Global worksize (GWS) 262144
>> Benchmarking: Password Safe SHA-256 [OpenCL]... DONE
>> Raw: 362411 c/s real, 78643K c/s virtual
>>
>> No huge difference though.
>>
>> In bleeding, Claudio has added a shared function for tuning GWS. I
>> haven't had time to try it out yet.
>>
>> magnum
>>
>
>
>