Message-ID: <20140414123606.GA27566@openwall.com>
Date: Mon, 14 Apr 2014 16:36:06 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

On Mon, Apr 14, 2014 at 03:53:50PM +0400, Solar Designer wrote:
> I think it might make sense to interleave multiple instances of bcrypt
> per core until you're making full use of all BRAM ports for computation.
>
> With 4 bcrypt instances per core, you need 20 reads per round.  With 2
> cycles/round, that's 10 reads per cycle, needing 5 BRAMs.  Maybe you can
> have:
>
> Cycle 0:
> 	initiate S0, S1 lookups for instances 0, 1 (total: 4 lookups)
> 	initiate S2, S3 lookups for instances 2, 3 (total: 4 lookups)
> 	initiate P lookups for instances 0, 1 (total: 2 lookups)
> 	(total: 10 lookups)
> Cycle 1:
> 	initiate S2, S3 lookups for instances 0, 1 (total: 4 lookups)
> 	initiate S0, S1 lookups for instances 2, 3 (total: 4 lookups)
> 	initiate P lookups for instances 2, 3 (total: 2 lookups)
> 	(total: 10 lookups)
>
> with the computation also spread across the two cycles as appropriate
> (and maybe you can reuse the same 32-bit adders across bcrypt instances,
> although the cost of extra MUXes is likely to kill the advantage).
>
> Expanding this to 3 cycles/round and 6 instances/core also makes sense,
> to allow for higher clock rate: not requiring the data to be available
> on the next clock cycle, but only 1 cycle later.  I recall reading that
> Xilinx BRAMs support output registers for that.

It appears that with this extra cycle of latency, you can do either
3 instances/core with 4 BRAMs per core (so 4 BRAMs per 3 instances),
leaving one port free for initialization use (on the BRAM holding P and
misc. data only), or 6 instances/core with 7 BRAMs per core (so 7 BRAMs
per 6 instances), with no free ports.  Either way, it is unclear whether
the extra cycle of latency will allow for a clock rate increase of more
than 50% (as needed to compensate for, and even benefit from, this extra
latency).

> It'd be fine to proceed with these additional optimizations after moving
> to ztex.  (Perhaps the optimizations can then be backported to the Zynq
> on ZedBoard platform, just to have "final" speed figures for it.)

Alexander
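
As a quick cross-check of the BRAM counts above, here is a minimal
back-of-the-envelope sketch (not from the original thread).  It assumes
dual-ported BRAMs, 5 reads per instance per Blowfish round (4 S-box
lookups plus 1 P-array read), and a storage layout of one BRAM per
instance for its S-boxes plus one shared BRAM for P and misc. data;
the layout assumption is my reading of the figures quoted above, not
something stated in the message.

import math

# Rough BRAM budgeting for interleaved bcrypt instances per core.
# Assumptions (mine, not from the message): dual-ported BRAMs, 5 reads
# per instance per Blowfish round (4 S-box + 1 P), and one BRAM per
# instance for its S-boxes plus one shared BRAM for P/misc. data.
def brams_needed(instances, cycles_per_round):
    reads_per_cycle = instances * 5 / cycles_per_round
    port_driven = math.ceil(reads_per_cycle / 2)   # 2 read ports per BRAM
    capacity_driven = instances + 1                # assumed storage layout
    return reads_per_cycle, max(port_driven, capacity_driven)

print(brams_needed(4, 2))  # (10.0, 5) - the case worked out in the quote
print(brams_needed(3, 3))  # (5.0, 4)  - 3 instances/core variant above
print(brams_needed(6, 3))  # (10.0, 7) - 6 instances/core variant above

Under these assumptions the sketch reproduces the 5, 4, and 7 BRAM
figures discussed above; the 3 cycles/round cases are capacity-limited
rather than port-limited.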