Message-ID: <CABob6ir1CBo7uiczbEDQBaVGQiN7PY7oquuwfBgRBOCbpm1-jg@mail.gmail.com>
Date: Sun, 1 Apr 2012 21:07:27 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: fast hashes on GPU

I've never used pragma unroll in OpenCL. According to this:
http://gpgpu.org/2010/03/20/cuda-3-0-toolkit-released
it should be supported for NV.

For AMD, what I found by googling is that people are having trouble with it, so possibly something is broken in the AMD compiler.
http://devgurus.amd.com/thread/158877

I would suggest using our own unrolling macros, like:

#define U1() something we want to unroll
#define U2() U1() U1()
#define U4() \
U2() U2()

and so on.

For powers of 2 it is straightforward; for other numbers you have to combine several defines to get the unroll count you want. The problem is that we cannot parametrize a #define with another #define, but we can redefine U1 when needed (am I right?)

This is not a nice solution, but it works, and it is more readable than hundreds of unrolled lines.

Lukas
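
A minimal sketch of the macro scheme described in the message, written as plain C so it can be checked on a host compiler. The function names sum8 and product6 and the loop bodies are made up for illustration; only the U1/U2/U4 naming comes from the message. It relies on the fact that the preprocessor expands macros where they are used, not where they are defined, so U2, U4 and so on pick up whatever U1 is current, provided the old U1 is removed with #undef first.

```c
/* Repetition helpers: each level pastes the previous level twice. */
#define U2() U1() U1()
#define U4() U2() U2()
#define U8() U4() U4()
/* Non-power-of-two count (hypothetical example): combine smaller helpers. */
#define U6() U4() U2()

/* First body to unroll: add one element, then advance the index. */
#define U1() acc += v[i]; i++;

/* Sums v[0..7] with eight pasted copies of the body and no loop. */
static int sum8(const int *v)
{
    int acc = 0, i = 0;
    U8()
    return acc;
}

/* Redefine the body; U2..U8 expand the new U1 from this point on. */
#undef U1
#define U1() acc *= v[i]; i++;

/* Multiplies v[0..5] with six pasted copies of the new body. */
static int product6(const int *v)
{
    int acc = 1, i = 0;
    U6()
    return acc;
}
```

Whether an OpenCL compiler then treats the pasted statements as well as a native unroll is not something this sketch shows; it only demonstrates that the redefinition trick is valid preprocessor usage.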