Message-ID: <d92b484d64190bde04059c9e66c2b2fb@smtp.hushmail.com>
Date: Wed, 02 May 2012 08:30:59 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel

On 04/29/2012 02:11 AM, magnum wrote:
> I'm currently trying to vectorize the rar format because I reckon it
> might be a good thing even on scalar platforms because I do 4x the work
> for 1x the potential branches and stuff. Does this make sense?

Vectorization now seems to work fine on AMD. On CPU, the Intel SDK fails
to turn it into SSE2 code, while AMD's succeeds:

OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 1: AMD FX(tm)-8120 Eight-Core Processor
Compilation log: /tmp/OCLat3GEL.cl(253): warning: unknown attribute
"vec_type_hint"
  __kernel void __attribute__((vec_type_hint(uint4))) SetCryptKeys(

This is odd. Both AMD and nvidia emit this warning - I can't see what I'm
doing wrong. It works fine with the Intel SDK (though Intel stupidly fails
to vectorize, which is also odd).
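
One thing that may be worth checking (an assumption, not something confirmed
in this thread): the examples in the OpenCL specification put the attribute
directly after __kernel and before the return type, whereas the line quoted
in the compilation log has it after void. A minimal sketch of the spec-style
placement follows; the kernel name, arguments and body are hypothetical and
are not the RAR kernel:

```c
/* Hypothetical kernel, for illustration only (not SetCryptKeys).
 * The attribute sits between __kernel and the return type, which is
 * the placement used in the OpenCL specification's examples. */
__kernel __attribute__((vec_type_hint(uint4)))
void vec_hint_sketch(__global const uint4 *in, __global uint4 *out)
{
    size_t gid = get_global_id(0);

    /* Trivial uint4 work, so the hinted width matches the computation.
     * The scalar constant is broadcast to all four components. */
    out[gid] = in[gid] ^ (uint4)(0x36363636U);
}
```

Whether this placement actually silences the warning on the AMD and nvidia
compilers is untested here. The hint is optional in any case, and a compiler
is free to ignore it.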

Note: OpenCL device is CPU. A non-OpenCL build may be faster.
Local worksize (LWS) 8, Global worksize 1024, KPC 4096
Benchmarking: RAR3 (4 characters) [OpenCL]... DONE
Raw:    947 c/s real, 118 c/s virtual

I might have to remove that remark about CPU build being faster... this
is on Bull's 8 cores. I suppose this is faster than OMP.

However it fails to even build for nvidia, with weird warnings that
might suggest there are bugs in the nvidia API.

Speed on the HD 7970 indicated > 17000 c/s, but an ASIC hang stopped me
from testing further.

magnum
