Message-ID: <20131030180731.GA29621@openwall.com>
Date: Wed, 30 Oct 2013 22:07:31 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

On Wed, Oct 30, 2013 at 06:55:10PM +0100, Katja Malvoni wrote:
> On Wed, Oct 30, 2013 at 6:42 PM, Solar Designer <solar@...nwall.com> wrote:
>
> > > So I store each S-box in two BRAM blocks in order to have all 4 values
> > > after 2 cycles of delay.
> >
> > It is unclear what you mean by "each S-box" above.  Blowfish uses four
> > S-boxes.  Do you want to say that you're storing each of the four twice,
> > for a total of 8 S-boxes stored (two dupes of each)?  If so, I see no
> > reason for you to be doing that.  In order to have all 4 values after
> > just one lookup's latency, you simply need to store two S-boxes in one
> > BRAM block and two more in another, with no data duplication.  Is this
> > what you're actually doing?  If so, this makes sense to me, but your
> > wording above (and in a previous message) is inconsistent with it.
>
> I was using the wrong wording - I was calling S[4][0x100] one S-box.  So
> in correct wording: I am storing 4 S-boxes in one BRAM and then again the
> same 4 S-boxes in another BRAM, which is a total of 8 S-boxes.  I'll
> change this to 2 S-boxes in one BRAM and two in another one.

OK.

Regarding data transfers:

> > OK, this is clear.  We could improve upon this approach, but maybe we
> > don't need to, if we have BRAM blocks to waste anyway.  A concern,
> > though, is how many slices we're spending on the logic initializing the
> > per-core BRAMs, and whether that can be optimized.  We may look into
> > that a bit later.
>
> With only one core, utilization is:
> Register: 5%
> LUT: 15%
> Slice: 25%
> RAMB36E1: 6%
> RAMB18E1: 1%
> BUFG: 3%

Thanks for this utilization data.  Note that there's probably quite some
per-core overhead, including in Slice utilization, for the initialization
of the per-core BRAMs from the BRAM that you use for data transfer from
host.
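As an aside, the reason each bcrypt core wants all four S-box values at once is the standard Blowfish F function, which does one lookup into each of the four S-boxes per round.  A minimal software sketch (the S-box contents below are dummy values chosen for illustration; real Blowfish initializes them from the hexadecimal digits of pi):

```python
# Standard Blowfish F function: each round splits the 32-bit half-block into
# four bytes and looks one up in each of the four 256-entry S-boxes.
# Dummy S-box contents for illustration only.
S = [[(i * 2654435761 + b) & 0xFFFFFFFF for i in range(256)] for b in range(4)]

def F(x):
    a = (x >> 24) & 0xFF   # byte 0 -> S-box 0
    b = (x >> 16) & 0xFF   # byte 1 -> S-box 1
    c = (x >> 8) & 0xFF    # byte 2 -> S-box 2
    d = x & 0xFF           # byte 3 -> S-box 3
    # F(x) = ((S0[a] + S1[b]) XOR S2[c]) + S3[d], mod 2^32
    return ((((S[0][a] + S[1][b]) & 0xFFFFFFFF) ^ S[2][c]) + S[3][d]) & 0xFFFFFFFF
```

Since all four lookups feed one round, a core that can read only two S-box values per cycle pays an extra cycle per round, which is what motivates the BRAM arrangement discussed above.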
You probably have mux'es in each core, since you're already using all of
the per-core BRAMs' ports for computation. ... or are there write ports
separate from the read ports that you use for computation?

> Two AXI buses and DMA take away some space (I think around 20% of Slice
> utilization).  I'll try to think about other possible ways of host-FPGA
> communication.

OK, but: I am not concerned about the 25% Slice utilization above as much
as I am about how much of the remaining 75% is possibly consumed by the
overhead needed to initialize the per-core BRAMs.  I wouldn't be surprised
if e.g. one third of the remaining 75% is consumed by such overhead.

Alexander
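For readers following along, the no-duplication layout suggested above can be modeled in software.  This is only a sketch of one possible address packing (lower half of each dual-port BRAM holds one S-box, upper half another; the class and function names are hypothetical), not the actual HDL:

```python
# Software model of packing two 256-entry S-boxes into one dual-port BRAM:
# port A reads one S-box, port B the other, so a single lookup latency
# yields both values.  Two such BRAMs cover all four Blowfish S-boxes
# with no data duplication.
class DualPortBram:
    def __init__(self, lo_sbox, hi_sbox):
        # Addresses 0..255 hold the first S-box, 256..511 the second.
        self.mem = list(lo_sbox) + list(hi_sbox)  # 512 x 32-bit words

    def read(self, addr_a, addr_b):
        # Both ports are read in the same clock cycle.
        return self.mem[addr_a], self.mem[addr_b]

def round_lookups(bram01, bram23, a, b, c, d):
    # One lookup latency: four S-box values from two dual-port reads.
    s0a, s1b = bram01.read(a, 256 + b)
    s2c, s3d = bram23.read(c, 256 + d)
    return s0a, s1b, s2c, s3d
```

In hardware, of course, using both ports for computation like this is exactly what forces the initialization path to be mux'ed in, as noted above.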