Message-ID: <CABob6ipuPc2WuAiW-Ed9j=WjNMuAioU37CxZcpUd6cboyNQ+gg@mail.gmail.com>
Date: Tue, 2 Jun 2015 00:18:37 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

2015-05-31 13:39 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> none@...e ~/Desktop/parallel/run $ ./john --test --format=parallel-opencl
> Device 0: GeForce GTX 960M
> Many salts: 37236 c/s real, 37236 c/s virtual
>
> [a@...er run]$ ./john --test --format=parallel-opencl --dev=5
> Device 5: GeForce GTX TITAN
> Many salts: 40206 c/s real, 40454 c/s virtual
>
> GCN without "add 0" optimization
>
> [a@...er run]$ ./john --test --format=parallel-opencl --dev=1
> Many salts: 45093 c/s real, 4915K c/s virtual
> GCN with unrolling one loop
>
> [a@...er run]$ ./john --test --format=parallel-opencl --dev=1
> Many salts: 27536 c/s real, 3276K c/s virtual

On one hand, you have great results on the mobile GPU, while the
"proper" ones give similar results. On GCN it looks to me like you have
hit a local maximum, and without a major code reorganization "against
the current" it will be hard to get out of this hole.

Code size might be a major limiting factor here, so we can try to
simplify the current code or split the computation into two separate
kernels. Another approach would be moving some of the initialization
code to the host side, limiting the kernel code size that way.

Optimizing ALU operations might not give results because of some other
limitation, such as memory bandwidth or code cache size; here we may
have both. As I said on IRC, I would try to simplify it even if you
don't see results at first sight.

One more question: what was the code size of the very first general
implementation?
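
A minimal sketch of the "move initialization to the host" idea mentioned above, written in plain C rather than OpenCL so it can be compiled anywhere. Everything here is hypothetical: init_state(), kernel_with_init() and kernel_preinit() are illustrative names, and the arithmetic is a placeholder, not the Parallel algorithm or the actual john kernel. The only point is that work depending solely on the salt can be computed once on the host and passed in as an argument, so the per-work-item code gets smaller while producing the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for setup work that depends only on the salt. */
static uint64_t init_state(uint64_t salt)
{
    uint64_t s = salt ^ 0x9E3779B97F4A7C15ULL;
    for (int i = 0; i < 16; i++)
        s = ((s << 7) | (s >> 57)) + 0x0123456789ABCDEFULL;
    return s;
}

/* Variant A: every work-item repeats the setup inside the "kernel". */
static uint64_t kernel_with_init(uint64_t salt, uint64_t key)
{
    uint64_t s = init_state(salt);
    return s ^ (key * 0xFF51AFD7ED558CCDULL);
}

/* Variant B: the setup result arrives as an argument, so the
 * "kernel" body no longer contains the initialization code. */
static uint64_t kernel_preinit(uint64_t state, uint64_t key)
{
    return state ^ (key * 0xFF51AFD7ED558CCDULL);
}

/* Host-side check that both variants agree for a batch of keys.
 * Returns 1 if every result matches, 0 otherwise. */
static int variants_agree(uint64_t salt, const uint64_t *keys, size_t n)
{
    uint64_t state = init_state(salt); /* computed once per salt on the host */
    for (size_t i = 0; i < n; i++)
        if (kernel_with_init(salt, keys[i]) != kernel_preinit(state, keys[i]))
            return 0;
    return 1;
}
```

In a real OpenCL format the precomputed state would presumably be uploaded in a buffer or passed as a kernel argument; whether that actually helps on GCN depends on how much of the kernel's code size the initialization accounts for, which would have to be measured.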