Message-ID: <CAK=p4C7FwUBmZyaN27+_1beGteA82A=sGo9mwaLA4dgec=or9g@mail.gmail.com>
Date: Mon, 22 Aug 2011 12:09:31 -0700
From: David Hulton <0x31337@...il.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Yuri's Status Report - #14 of 15

You should see much lower latency on the M501 for writes and reads, but
you'll definitely want to transfer your data in larger chunks
(instead of calling WriteDevice multiple times, call it once with a
large length). This takes advantage of bus mastering and bursting
on the bus, and if you use a FIFO on the other end it should reduce
your latency to almost zero, provided you code it so that the core is
never waiting for the software to fill the FIFO.

Also, looking at your Manager.v code, you have multiple modules
outputting to the same PicoDataOut signal. You should create a
PicoDataOut wire for each core and OR them all together; the shared
signal might be causing issues with your build. I've attached a patched
Manager.v that tries to instantiate 6 cores.

It also seems like a lot of the logic is probably used up by larger
resources that could be shared (since they are used at different states
in the state machine). I would recommend breaking the design up into
modules that perform the 32-bit ADDs and other more resource-intensive
operations (a DSP48 block could handle the 32-bit ADD, for example), and
then having the parts of the state machine that need a 32-bit ADD use
that module instead of writing c <= a + b, because the tools usually
aren't very good at realizing that all of the ADDs can be performed
using a shared resource.

I would also look into the possibility of a fully or partially unrolled
pipeline design for the LX240, since there's a lot more logic that you
can make use of.

-David
On Thu, Aug 18, 2011 at 10:18 PM, Yuri Gonzaga <yuriggc@...il.com> wrote:
>> Does this apply to E-101 only or also to M-501?
>
> I don't know yet. This answer will have to wait for the bitstream
> generation.
>
>>
>> I notice that in your
>> changes to the JtR tree, you call drv->WriteDeviceAbsolute() with sizes
>> larger than 4 bytes. I guess this is untested yet, but you're hoping
>> that it'll work. Correct?
>
> Right. It is untested. My intention is to transfer bigger blocks of data at
> a time.
>
>>
>> And indeed for decent performance you'll need sizes not merely larger
>> than 4 bytes, but rather you need to send/receive the entire blob of
>> around 4.5 KB in size in one call.
>
> I will have to change the Verilog loop construct to receive and send
> everything together.
> Regards,
> Yuri
>
Download attachment "Manager.v" of type "application/octet-stream" (2738 bytes)