crypt-dev - Re: Yuri's Status Report

Message-ID: <20110818082554.GB31881@openwall.com>
Date: Thu, 18 Aug 2011 12:25:54 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Yuri's Status Report - #14 of 15

Yuri, David -

On Thu, Aug 18, 2011 at 02:46:10AM -0300, Yuri Gonzaga wrote:
> I tried several times to call WriteDevice() with a block of data.  It
> isn't working: it causes a wrong result to be returned.
> Maybe I don't know how to call that function properly, or there is some
> problem with byte ordering.
> In fact, it works when passing 4 bytes at a time, but apparently not
> with larger blocks.

Does this apply to E-101 only or also to M-501?  I notice that in your
changes to the JtR tree, you call drv->WriteDeviceAbsolute() with sizes
larger than 4 bytes.  I guess this is not tested yet, but you're hoping
that it'll work.  Correct?

And indeed for decent performance you'll need not merely sizes larger
than 4 bytes; rather, you need to send/receive the entire blob of
around 4.5 KB in one call.
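To put the 4-byte limitation in perspective, here is a minimal sketch that only counts calls.  `FakeDevice`, `send_in_words()`, and `send_whole()` are hypothetical names used for illustration, not the Pico driver API, and the blob size of 4608 bytes assumes "around 4.5 KB" means 4.5 * 1024 bytes:

```python
# Hypothetical model of a device write call: it only counts calls and bytes.
# This is NOT the Pico drv->WriteDeviceAbsolute() interface, just a stand-in.
class FakeDevice:
    def __init__(self):
        self.calls = 0
        self.bytes = 0

    def write(self, data: bytes) -> None:
        self.calls += 1
        self.bytes += len(data)


def send_in_words(dev: FakeDevice, blob: bytes, word: int = 4) -> None:
    """Send the blob `word` bytes per call (the approach that works today)."""
    for off in range(0, len(blob), word):
        dev.write(blob[off:off + word])


def send_whole(dev: FakeDevice, blob: bytes) -> None:
    """Send the entire blob in one call (what decent performance needs)."""
    dev.write(blob)


BLOB_SIZE = 4608  # assumption: "around 4.5 KB" taken as 4.5 * 1024 bytes
blob = bytes(BLOB_SIZE)

chunked, whole = FakeDevice(), FakeDevice()
send_in_words(chunked, blob)
send_whole(whole, blob)
print(f"4-byte writes: {chunked.calls} calls, {chunked.bytes} bytes")
print(f"single write:  {whole.calls} call, {whole.bytes} bytes")
```

With 4-byte writes, whatever fixed per-call overhead the real driver has is paid 1152 times per blob instead of once.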

> > > With cost = 18, and 4 cores vs. 4 sequential invocations, I got:
> > >
> > >    - Sequential total time: ~ 33 minutes
> > >    - Parallel total time: ~ 9 minutes
> > These numbers looked reasonable to me at first, but then I did some math
> > and they don't agree with the 0.06 seconds figure for cost=5 that you
> > gave above.  Specifically:
> > 33 * 60 / (2 ^ (18 - 5)) = 0.24
> > I expected to see something close to 0.06.  Why is it 4 times slower
> > here?  The difference between sequential and parallel times suggests
> > that the reads/writes overhead is indeed pretty low at cost=18, so this
> > overhead does not explain the 0.06 vs. 0.24 discrepancy.
> > Do you have an explanation?
> 
> Could you please explain your math in more detail?

Oh, I missed an important detail that explains it all: "4 sequential
invocations".  Somehow I lost this "4", treating 33 minutes as the time
for one invocation at cost=18.  Per invocation, it is
33 * 60 / 4 / (2 ^ (18 - 5)) = 0.06, which explains the 0.24 vs. 0.06
difference.
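The arithmetic can be checked with a small sketch.  `per_hash_seconds()` is a hypothetical helper; like the formula quoted above, it assumes each increment of the cost parameter doubles the work, and it normalizes a measured run to cost=5:

```python
def per_hash_seconds(total_minutes: float, invocations: int,
                     cost: int, base_cost: int = 5) -> float:
    """Scale a measured run at `cost` down to an equivalent time at `base_cost`.

    Each increment of the cost parameter doubles the work, so one run at
    cost=18 does 2 ** (18 - 5) times the work of one run at cost=5.
    """
    seconds_per_invocation = total_minutes * 60 / invocations
    return seconds_per_invocation / 2 ** (cost - base_cost)


# Misreading: 33 minutes treated as the time for a single invocation.
wrong = per_hash_seconds(33, invocations=1, cost=18)
# Correct reading: 33 minutes covered 4 sequential invocations.
right = per_hash_seconds(33, invocations=4, cost=18)
print(f"{wrong:.2f} vs. {right:.2f}")  # prints "0.24 vs. 0.06"
```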

> > What specific error message does it give when you try to fit 5 cores?
> 
> The first error is:
> 
> "ERROR:Place:543 - This design does not fit into the number of slices
> available in this device due to the complexity of the design and/or
> constraints."

David - any comments on this?  Per the synthesis report for 4 cores, it
appears to me that we'd have sufficient resources for 6 cores.  I am
referring to 4-eksblowfish-loop-cores-pico-e101.zip available here:

http://openwall.info/wiki/crypt-dev/files

and specifically to 4-eksblowfish-loop-cores-pico-e101.pdf inside that
.zip archive.  It says we're using 65% of the slices with 4 cores.  Yet
adding a fifth core fails.  Is this because of some routing constraints?
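The "sufficient resources for 6 cores" estimate follows from naive linear scaling: 65% over 4 cores is about 16% per core, so 6 cores would need about 98% of the slices.  A minimal sketch of that extrapolation, using a hypothetical `projected_utilization()` helper that assumes every core costs the same number of slices and that placement/routing add no overhead:

```python
def projected_utilization(measured_pct: float, measured_cores: int,
                          target_cores: int, fixed_pct: float = 0.0) -> float:
    """Linearly extrapolate slice usage to a different number of cores.

    Assumption: usage = fixed_pct + cores * per_core_pct, i.e. every core
    costs the same number of slices and nothing else grows with core count.
    """
    per_core = (measured_pct - fixed_pct) / measured_cores
    return fixed_pct + target_cores * per_core


for cores in (4, 5, 6):
    print(cores, "cores:", projected_utilization(65.0, 4, cores), "% of slices")
```

Any fixed share of slices used by logic common to all cores (`fixed_pct` above) only lowers the per-core estimate, so by this model slice count alone does not explain why a fifth core fails to place.  The error's mention of "complexity of the design and/or constraints" is consistent with the routing question.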

Looks like another thing to consider while optimizing each core, then.

Thanks,

Alexander
