Message-ID: <CABob6ip8S=wZt9JebC6pOPYvayoOY=47HXWSB9sZzbB8g211yg@mail.gmail.com>
Date: Thu, 4 Jun 2015 00:45:08 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

2015-06-04 0:30 GMT+02:00 Solar Designer <solar@...nwall.com>:
> I am somewhat out of context on your discussion with Agnieszka, so I am
> puzzled by the comments (initially by her, and now also by you) of code
> size increases somehow being associated with use of split kernels.

Sorry, but I was also confused by some of our discussion :)

Agnieszka tried to implement an optimization that exploits the presence
of 0 bytes in the sha512 input, which happens in the "parallel loop".
We can't make such assumptions for all sha512 calls used in function
parallel, so implementing a slightly different SHA512 with this
optimization (while still having to keep the normal version) increased
code size. We think this reduced performance because the code size
exceeded the L1 code cache on GCN: actual performance after this change
dropped from 45k to 28k c/s.
She also implemented a split kernel, and that by itself also degraded
performance (from 28k to 27k c/s).

Unfortunately I forgot that SHA2 is somewhat resistant to such
optimizations, because it just removes a handful of additions, and now I
think that we might want to omit this optimization, especially while we
are having problems with code size.
There is still some low-hanging fruit in the code which should increase
performance more than what we were trying to do.

I hope that clears things up,
Lukas
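
To illustrate why the zero-byte optimization only "removes a handful of additions", here is a minimal sketch in plain C of the SHA-512 message schedule with and without the assumption that some input words are zero. It is not the code from this thread. The layout is an assumption: a 64-byte message in a single padded block, so that W[8] is the padding bit, W[9..14] are zero and W[15] holds the bit length. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
/* SHA-512 small sigma functions (FIPS 180-4) */
static uint64_t s0(uint64_t x) { return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7); }
static uint64_t s1(uint64_t x) { return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6); }

/* Generic schedule expansion, valid for any input block. */
static void schedule_generic(uint64_t W[80])
{
    for (int t = 16; t < 80; t++)
        W[t] = s1(W[t - 2]) + W[t - 7] + s0(W[t - 15]) + W[t - 16];
}

/* Specialized expansion. ASSUMPTION: W[9..14] == 0 on entry.
 * Terms that read those words vanish because s0(0) == s1(0) == 0. */
static void schedule_zero_tail(uint64_t W[80])
{
    W[16] = s0(W[1]) + W[0];                       /* W[14], W[9] dropped */
    W[17] = s1(W[15]) + s0(W[2]) + W[1];           /* W[10] dropped */
    W[18] = s1(W[16]) + s0(W[3]) + W[2];           /* W[11] dropped */
    W[19] = s1(W[17]) + s0(W[4]) + W[3];           /* W[12] dropped */
    W[20] = s1(W[18]) + s0(W[5]) + W[4];           /* W[13] dropped */
    W[21] = s1(W[19]) + s0(W[6]) + W[5];           /* W[14] dropped */
    W[22] = s1(W[20]) + W[15] + s0(W[7]) + W[6];   /* nothing to drop */
    W[23] = s1(W[21]) + W[16] + s0(W[8]) + W[7];   /* nothing to drop */
    W[24] = s1(W[22]) + W[17] + W[8];              /* s0(W[9]) dropped */
    for (int t = 25; t < 30; t++)                  /* W[t-15], W[t-16] are zero */
        W[t] = s1(W[t - 2]) + W[t - 7];
    W[30] = s1(W[28]) + W[23] + s0(W[15]);         /* W[14] dropped */
    for (int t = 31; t < 80; t++)                  /* no zero inputs remain */
        W[t] = s1(W[t - 2]) + W[t - 7] + s0(W[t - 15]) + W[t - 16];
}

/* Returns 1 if both expansions agree on a padded 64-byte test block. */
int schedules_agree(void)
{
    uint64_t a[80] = {0}, b[80];
    for (int i = 0; i < 8; i++)                    /* arbitrary test data */
        a[i] = 0x0123456789abcdefULL * (uint64_t)(i + 1);
    a[8] = 0x8000000000000000ULL;                  /* padding bit */
    a[15] = 512;                                   /* message length in bits */
    memcpy(b, a, sizeof a);
    schedule_generic(a);
    schedule_zero_tail(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Under this assumed layout the specialized version saves 19 terms out of the 192 additions in the generic schedule, and in a kernel it has to exist next to the generic version, which is the code-size cost described above.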