Message-ID: <20110818082554.GB31881@openwall.com>
Date: Thu, 18 Aug 2011 12:25:54 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Yuri's Status Report - #14 of 15

Yuri, David -

On Thu, Aug 18, 2011 at 02:46:10AM -0300, Yuri Gonzaga wrote:
> I gave some tries of calling WriteDevice() passing block of data. It isn't
> working. It is causing the return of wrong result.
> Maybe I don't know how to call that function properly or there is any
> problem with byte ordering.
> In fact, it works passing 4 bytes a time and greater blocks apparently not.

Does this apply to E-101 only or also to M-501?

I notice that in your changes to the JtR tree, you call
drv->WriteDeviceAbsolute() with sizes larger than 4 bytes.  I guess this
is untested yet, but you're hoping that it'll work.  Correct?

And indeed for decent performance you'll need sizes not merely larger
than 4 bytes, but rather you need to send/receive the entire blob of
around 4.5 KB in size in one call.

> > > With cost = 18, and 4 cores vs. 4 sequential invocations, I got:
> > >
> > >    - Sequential total time: ~ 33 minutes
> > >    - Parallel total time: ~ 9 minutes
> > These numbers looked reasonable to me at first, but then I did some math
> > and they don't agree with the 0.06 seconds figure for cost=5 that you
> > gave above.  Specifically:
> >
> > 33 * 60 / (2 ^ (18 - 5)) = 0.24
> >
> > I expected to see something close to 0.06.  Why is it 4 times slower
> > here?  The difference between sequential and parallel times suggests
> > that the reads/writes overhead is indeed pretty low at cost=18, so this
> > overhead does not explain the 0.06 vs. 0.24 discrepancy.
> >
> > Do you have an explanation?
>
> Could you please explain better your math?

Oh, I missed an important detail that explains it all: "4 sequential
invocations".  Somehow I lost this "4", treating 33 minutes as time for
one invocation at cost=18.  This explains the 0.24 vs. 0.06 difference.
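To spell the arithmetic out, here is a minimal Python sketch of the same
calculation.  It is not from the original thread and the variable names
are illustrative only.  It assumes, as the formula above does, that each
increment of the cost setting doubles the work, and it uses the
approximate 33-minute figure quoted above.

```python
# Figures quoted in this thread; names are hypothetical, for illustration.
sequential_minutes = 33   # approx. total for 4 sequential invocations, cost=18
invocations = 4
cost_high, cost_low = 18, 5

# Each increment of cost doubles the work, so cost=18 is 2^13 times cost=5.
scale = 2 ** (cost_high - cost_low)

# Scale the cost=18 total (in seconds) down to cost=5.
total_at_cost5 = sequential_minutes * 60 / scale

# The 33 minutes covered 4 invocations, so divide to get one invocation.
per_invocation_at_cost5 = total_at_cost5 / invocations

print(f"{total_at_cost5:.2f} s for all {invocations} invocations at cost={cost_low}")
print(f"{per_invocation_at_cost5:.2f} s per invocation at cost={cost_low}")
```

The first figure comes out at about 0.24 and the second at about 0.06,
which matches the per-invocation time reported for cost=5.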
> > What specific error message does it give when you try to fit 5 cores?
>
> The first error is:
>
> "ERROR:Place:543 - This design does not fit into the number of slices
> available in this device due to the complexity of the design and/or
> constraints."

David - any comments on this?  Per the synthesis report for 4 cores, it
appears to me that we'd have sufficient resources for 6 cores.  I am
referring to 4-eksblowfish-loop-cores-pico-e101.zip available here:

http://openwall.info/wiki/crypt-dev/files

and specifically to 4-eksblowfish-loop-cores-pico-e101.pdf inside that
.zip archive.  It says we're using 65% of the slices with 4 cores.
Yet adding a fifth core fails.  Is this because of some routing
constraints?  Looks like another thing to consider while optimizing each
core, then.

Thanks,

Alexander