Message-ID: <20140720011210.GB25512@openwall.com>
Date: Sun, 20 Jul 2014 05:12:10 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Katja,

On Sat, Jul 19, 2014 at 07:10:09PM +0200, Katja Malvoni wrote:
> On 17 July 2014 00:03, Solar Designer <solar@...nwall.com> wrote:
> >
> > > I'll implement 56 instances with 4 BRAMs per core and see if these will
> > > perform as expected.
> >
> > Yes, please.
>
> Implemented. 4571 c/s for cost 5, 64.51 c/s for cost 12.

Cool!  What clock rate?  As I noted in
http://www.openwall.com/lists/john-dev/2014/04/21/9 "computation might
be slightly slower: it's add, xor, add done sequentially on the same
cycle" - but I guess this didn't affect your clock rate yet since the
clock rate was limited by longest path used during initialization
anyway?

> While testing with
> cost 12, the zed system rebooted. I guess it's overheating since you modded
> the board so it shouldn't be voltage drop problem.

Yes, as I mentioned to you via private e-mail, in the current plastic
box the board overheats when Zynq PL is in full use for more than a
couple of minutes.  I'll look into adding a fan (perhaps 40mm, like old
graphics cards had).

> Next step is to make this 8 BRAMs per core and to avoid initial S-box
> transfers :-)

Please describe the exact layout you intend for 8 BRAMs.

Right now, you use 5 BRAMs per core, with 2 bcrypt instances per core,
correct?  Out of these, 4 BRAMs are half-empty, and 1 BRAM is mostly
empty but not empty enough to put the entire initial S-box values in
there.

If you combine two such cores together (10 BRAMs, 4 instances), you'll
have two mostly empty BRAMs per core, and you'll be able to fit the
initial S-box values in there - and still be able to proceed further to
double the number of instances per core to hide BRAM latencies.
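The BRAM occupancy figures in this message follow from simple arithmetic. Below is a minimal Python sketch of it, assuming each BRAM is a 36 Kbit block used as 1024 32-bit words and each bcrypt instance holds the four Blowfish S-boxes of 256 32-bit words each. Neither size is stated in the message, and the function names are made up for illustration.

```python
# Assumed sizes (not given in the message):
BRAM_WORDS = 1024   # one 36 Kbit BRAM viewed as 1024 x 32-bit words
SBOX_WORDS = 256    # one Blowfish S-box: 256 x 32-bit words
SBOXES = 4          # S-boxes per bcrypt instance


def sbox_fill(instances, sbox_brams, with_initial_values=False):
    """Fraction of the S-box BRAMs' total capacity that is in use."""
    words = instances * SBOXES * SBOX_WORDS
    if with_initial_values:
        # One shared copy of the initial S-box values stored alongside.
        words += SBOXES * SBOX_WORDS
    return words / (sbox_brams * BRAM_WORDS)


def initial_values_in_brams():
    """Space taken by one copy of the initial S-box values, in BRAMs."""
    return SBOXES * SBOX_WORDS / BRAM_WORDS


# Current core: 2 instances, 4 S-box BRAMs (the 5th BRAM is not modelled).
print("current core fill:      ", sbox_fill(2, 4))
# Two cores combined: 4 instances, 8 S-box BRAMs.
print("combined core fill:     ", sbox_fill(4, 8))
# Combined core after doubling the instance count.
print("combined, doubled fill: ", sbox_fill(8, 8))
# Initial values placed in the unused halves of the 4 S-box BRAMs.
print("with initial values:    ", sbox_fill(2, 4, with_initial_values=True))
print("initial values in BRAMs:", initial_values_in_brams())
```

Under these assumptions one copy of the initial S-box values takes exactly one full BRAM, which is why a single mostly empty BRAM cannot hold it while two of them can.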

Alternatively, you may fit the initial S-box values in the currently
unused halves of the 4 S-box BRAMs (then they'll be 3/4 full), while
staying at 5 BRAMs and 2 bcrypt instances per core.  (Or you may spread
the initial values across the 5 BRAMs differently, to also use the 5th
BRAM's ports for quicker initialization or/and to have fewer MUXes for
the S-box BRAMs if that turns out to be the case for such design.)
Then you won't be able to proceed to double the number of instances per
core without re-designing the initialization, but on the other hand
with smaller cores like this routing delays might be smaller.

The performance difference between these may come from initialization
time (for low bcrypt cost) and clock rate (for any bcrypt cost).

Neither of these is 8 BRAMs/core.  So what is your plan?

Thanks,

Alexander
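
The remark that initialization time matters mostly at low bcrypt cost can be checked against the two speeds quoted at the top of the message. The sketch below fits a two-point model, time per hash = fixed + per_iter * 2^cost. That model is an assumption made here for illustration, not something stated in the thread: it lumps everything that does not scale with cost (S-box initialization and transfers, the initial key setup, host communication) into the fixed term, and it assumes both measurements used the same design and clock rate. The function name is hypothetical.

```python
def fit_fixed_overhead(rate_low, cost_low, rate_high, cost_high):
    """Fit t(cost) = fixed + per_iter * 2**cost to two (cost, c/s) points.

    Rates are in candidates per second; the returned times are in
    seconds per candidate for the whole design.
    """
    t_low = 1.0 / rate_low
    t_high = 1.0 / rate_high
    per_iter = (t_high - t_low) / (2**cost_high - 2**cost_low)
    fixed = t_low - per_iter * 2**cost_low
    return fixed, per_iter


# Figures quoted in the message: 4571 c/s at cost 5, 64.51 c/s at cost 12.
fixed, per_iter = fit_fixed_overhead(4571, 5, 64.51, 12)

share_cost5 = fixed / (1.0 / 4571)
share_cost12 = fixed / (1.0 / 64.51)

print(f"cost-independent share of time at cost 5:  {share_cost5:.1%}")
print(f"cost-independent share of time at cost 12: {share_cost12:.1%}")
```

With these two data points the cost-independent part comes out at a bit under half of the time per hash at cost 5 and under 1% at cost 12, which fits the expectation that faster initialization helps mainly at low cost.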