Message-ID: <CAKGDhHUG8dzXfSxLwFGBo61qhzoawz=uixKEwn_bfiDoY_G7jQ@mail.gmail.com>
Date: Sun, 26 Apr 2015 15:35:02 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

My code is on the "vectors" branch.

2015-04-26 15:28 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-04-25 22:28 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
>> 2015-04-25 21:20 GMT+02:00 Solar Designer <solar@...nwall.com>:
>>> I think you should use a vector data type instead of the separate
>>> work-items, much like the AVX2 code on CPU uses __m256i and doesn't
>>> rely on the compiler's auto-vectorization.  Granted, auto-vectorization
>>> is much more common with OpenCL than with C, but you can help the
>>> compiler by doing a part of it explicitly anyway (and then it'd be the
>>> compiler's job to combine these narrow SIMD portions into possibly wider
>>> SIMD that the hardware might need).  I think you need ulong4 there:
>>>
>>> https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/vectorDataTypes.html
>>
>> Sorry, I hadn't tested this. I have now noticed that in function H,
>> index_global and index_local are always divisible by 4, so I can use
>> this with coalescing.
>
> Results with the vector version:
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
> development use only)]... Device 1: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 4096
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw:    87487 c/s real, 9011K c/s virtual
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
> development use only)]... Device 5: GeForce GTX TITAN
> Local worksize (LWS) 64, global worksize (GWS) 4096
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw:    59650 c/s real, 59650 c/s virtual
>
> Results with the previous version:
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
> development use only)]... Device 5: GeForce GTX TITAN
> Local worksize (LWS) 64, global worksize (GWS) 8192
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw:    82671 c/s real, 82671 c/s virtual
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
> development use only)]... Device 1: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 2048
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw:    77053 c/s real, 3891K c/s virtual
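
To illustrate the ulong4 suggestion quoted above, here is a minimal sketch in plain C, not the actual code from the "vectors" branch. It assumes a made-up mixing step (mix_scalar is hypothetical and is not POMELO's F, G or H) and emulates OpenCL's ulong4 with a small struct (ulong4_t, load4 and store4 are illustrative names). It only shows why indices that are divisible by 4 let four consecutive 64-bit words be handled as one vector value.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OpenCL's ulong4: four 64-bit lanes treated as one value. */
typedef struct { uint64_t s[4]; } ulong4_t;

/* Hypothetical mixing step, for illustration only (not POMELO's). */
static uint64_t mix_scalar(uint64_t a, uint64_t b)
{
    a += b;
    return (a << 17) | (a >> 47);   /* rotate left by 17 bits */
}

/* The same step applied to each of the four lanes. */
static ulong4_t mix_vec(ulong4_t a, ulong4_t b)
{
    ulong4_t r;
    for (int k = 0; k < 4; k++)
        r.s[k] = mix_scalar(a.s[k], b.s[k]);
    return r;
}

/* Load four consecutive words. The index is assumed to be a multiple
   of 4, so the access corresponds to one aligned vector load. */
static ulong4_t load4(const uint64_t *buf, size_t index)
{
    ulong4_t v;
    for (int k = 0; k < 4; k++)
        v.s[k] = buf[index + k];
    return v;
}

/* Store four consecutive words at an index that is a multiple of 4. */
static void store4(uint64_t *buf, size_t index, ulong4_t v)
{
    for (int k = 0; k < 4; k++)
        buf[index + k] = v.s[k];
}

/* One word per step: the "separate work-items" style of indexing. */
void update_scalar(uint64_t *state, size_t n)
{
    for (size_t i = 4; i < n; i++)
        state[i] = mix_scalar(state[i], state[i - 4]);
}

/* Four words per step: the explicit-vector style. n must be a multiple of 4. */
void update_vector(uint64_t *state, size_t n)
{
    for (size_t i = 4; i < n; i += 4)
        store4(state, i, mix_vec(load4(state, i), load4(state, i - 4)));
}

/* Returns 1 if both variants produce identical state, 0 otherwise. */
int vector_matches_scalar(void)
{
    uint64_t a[16], b[16];
    for (size_t i = 0; i < 16; i++)
        a[i] = b[i] = 0x0123456789abcdefULL * (i + 1);
    update_scalar(a, 16);
    update_vector(b, 16);
    for (size_t i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling vector_matches_scalar() checks that the 4-wide variant computes the same state as the word-by-word one. In a real OpenCL kernel the struct would presumably be replaced by the built-in ulong4 type, and whether that helps or hurts speed depends on the device, as the benchmarks above suggest.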
