Message-ID: <CA+EaD-Yqh4pvpd2djKGJuCCtX0VTbLdZPpF_EG5BocycYu+0LA@mail.gmail.com>
Date: Sun, 6 Jul 2014 13:20:20 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt
On 6 July 2014 12:55, Solar Designer <solar@...nwall.com> wrote:
> On Sun, Jul 06, 2014 at 11:22:48AM +0200, Katja Malvoni wrote:
> > On 6 July 2014 10:15, Solar Designer <solar@...nwall.com> wrote:
> >
> > > I guess you're computing 64 bits per hash only, correct? This is
> > > sufficiently unlikely to cause false positives that we can go with it.
> >
> > That's correct. But I transfer 64 32-bit values per hash from the FPGA
> > after computation is done, because the array being transferred contains
> > a structure aligned in such a way that the higher address bits select the
> > bcrypt core, so everything is done with one call to memcpy(). I tried
> > avoiding unnecessary transfers, but performance is a bit lower, at
> > 3744 c/s, I assume because of the overhead of multiple calls to memcpy().
>
> Maybe you can use pairs of 32-bit integer or individual 64-bit integer
> reads in place of multiple memcpy()'s.
>
I'm not sure I understand. I'm using an mmap()ed memory space to access the
bcrypt logic, so if I'm not mistaken, the only way I can read data from that
space is by copying it using memcpy(). Or is there another way to perform
those reads?
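
The suggestion quoted above (pairs of 32-bit reads in place of multiple
memcpy() calls) could look roughly like the sketch below. It assumes the
mapped region can be dereferenced through a volatile pointer like ordinary
memory, that each core owns a block of 64 32-bit words as described, and that
the two hash words sit at offsets 0 and 1 of each block. The offsets and the
name read_hashes are hypothetical, not taken from the actual design.

```c
#include <stddef.h>
#include <stdint.h>

/* Each core owns 64 32-bit words, as described in the message. */
#define WORDS_PER_CORE 64

/*
 * Copy the first two 32-bit words of every core's block into out[]
 * using plain volatile loads instead of memcpy().
 * Assumption (hypothetical layout): the 64-bit hash occupies word
 * offsets 0 and 1 of each block.
 */
void read_hashes(const volatile uint32_t *base, size_t cores,
                 uint32_t out[][2])
{
    for (size_t i = 0; i < cores; i++) {
        /* Higher address bits select the core: step by one block. */
        const volatile uint32_t *core = base + i * WORDS_PER_CORE;
        out[i][0] = core[0];
        out[i][1] = core[1];
    }
}
```

Here base would be the pointer returned by mmap(), cast to
const volatile uint32_t *. Whether this beats a single large memcpy() would
have to be measured on the board.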

> In other words, the drop from 960 mV to around 890 mV corresponds to
> unreliable cracking, and you don't know what the voltage is when the
> cracking is reliable (which it is on my ZedBoard only), right?
>
On the Parallella board, it is 960 mV (tried with a lower core count, which
is reliable).

> Perhaps you can achieve a higher clock rate by introducing an extra
> cycle of latency per data transfer during initialization and maybe
> during transfer of hashes to host as well? Anyway, maybe it's better
> to consider that after you've implemented initialization within each
> core as I suggested. It depends on which of these things you feel
> you'll have ready before your trip to the US.
>
I'm not sure that would help. Routing delay is 90.4% of the longest path
delay, and I can't use an arbitrary frequency, only ones that can be derived
from the PS. So the next one after 71.4 MHz is 76.9 MHz. With 90% of the
delay being routing, I don't think it is possible to improve the logic
enough to achieve 76.9 MHz. All these wires are connected to the same AXI
bus and distributed along the entire FPGA, since the AXI bus must access the
BRAMs and every bcrypt instance must access the same BRAMs. In this case,
those extra registers would need to be on the BRAM inputs and outputs, which
directly impacts bcrypt computation, namely delays when loading data from
the S-boxes.
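
A rough sketch of the arithmetic behind these numbers. It assumes the clock
is a source frequency divided by an integer, with a 1000 MHz source chosen
only because divisors 14 and 13 reproduce the 71.4 and 76.9 MHz figures. It
also assumes the longest path fills the whole clock period, which is an upper
bound. The function names are hypothetical.

```c
/* Clock obtained by integer division of a source clock (assumed model). */
double clock_mhz(double source_mhz, unsigned divisor)
{
    return source_mhz / divisor;
}

/* Clock period in nanoseconds for a frequency given in MHz. */
double period_ns(double mhz)
{
    return 1000.0 / mhz;
}

/*
 * Fraction of the logic delay that would have to be removed to go from
 * cur_ns to target_ns if the routing delay stays unchanged.
 * Assumption: the longest path is as long as the full period cur_ns.
 */
double logic_cut_needed(double cur_ns, double target_ns,
                        double routing_fraction)
{
    double logic_ns = cur_ns * (1.0 - routing_fraction);
    return (cur_ns - target_ns) / logic_ns;
}
```

Under these assumptions the period has to shrink from 14 ns to 13 ns, while
logic accounts for only about 1.34 ns of the path (9.6% of 14 ns), so roughly
three quarters of the logic delay would have to go.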
Katja