Message-ID: <20110818082554.GB31881@openwall.com>
Date: Thu, 18 Aug 2011 12:25:54 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Yuri's Status Report - #14 of 15

Yuri, David -

On Thu, Aug 18, 2011 at 02:46:10AM -0300, Yuri Gonzaga wrote:
> I tried several times to call WriteDevice() with a block of data. It isn't
> working: it returns a wrong result.
> Maybe I don't know how to call that function properly, or there is a
> problem with byte ordering.
> In fact, it works when passing 4 bytes at a time, but apparently not with
> larger blocks.

Does this apply to E-101 only or also to M-501?  I notice that in your
changes to the JtR tree, you call drv->WriteDeviceAbsolute() with sizes
larger than 4 bytes.  I guess this is not tested yet, but you're hoping
that it'll work.  Correct?

And indeed, for decent performance it is not enough for the sizes to be
merely larger than 4 bytes: you need to send/receive the entire blob of
around 4.5 KB in one call.
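
To put a number on this, here is a rough sketch that counts how many
transfer calls each approach needs.  It does not touch the Pico API at
all.  It assumes "around 4.5 KB" means exactly 4608 bytes, and the names
BLOB_BYTES and calls_needed are made up for illustration.

```python
import math

# Assumption: "around 4.5 KB" is taken as 4.5 * 1024 = 4608 bytes.
BLOB_BYTES = int(4.5 * 1024)


def calls_needed(total_bytes, chunk_bytes):
    """Number of transfer calls needed to move total_bytes in chunks."""
    return math.ceil(total_bytes / chunk_bytes)


four_byte_calls = calls_needed(BLOB_BYTES, 4)
single_blob_calls = calls_needed(BLOB_BYTES, BLOB_BYTES)

print(four_byte_calls, single_blob_calls)  # prints: 1152 1
```

Under that assumption, 4-byte transfers mean on the order of a thousand
calls per direction, each with its own per-call overhead, versus one
call for the whole blob.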

> > > With cost = 18, and 4 cores vs. 4 sequential invocations, I got:
> > >
> > >    - Sequential total time: ~ 33 minutes
> > >    - Parallel total time: ~ 9 minutes
> > These numbers looked reasonable to me at first, but then I did some math
> > and they don't agree with the 0.06 seconds figure for cost=5 that you
> > gave above.  Specifically:
> > 33 * 60 / (2 ^ (18 - 5)) = 0.24
> > I expected to see something close to 0.06.  Why is it 4 times slower
> > here?  The difference between sequential and parallel times suggests
> > that the reads/writes overhead is indeed pretty low at cost=18, so this
> > overhead does not explain the 0.06 vs. 0.24 discrepancy.
> > Do you have an explanation?
> 
> Could you please explain better your math?

Oh, I missed an important detail that explains it all: "4 sequential
invocations".  Somehow I lost this "4", treating 33 minutes as the time
for one invocation at cost=18.  Dividing by 4 gives 0.24 / 4 = 0.06,
which matches the cost=5 figure.
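
The arithmetic can be rechecked with a few lines.  This is only a sketch
using the figures quoted in this thread (33 minutes, 4 invocations,
cost=18 vs. cost=5) and the same 2^(18 - 5) scaling as in my earlier
calculation.  The variable and function names are made up.

```python
# Figures quoted in the thread.
sequential_total_min = 33   # total time for 4 sequential invocations, cost=18
invocations = 4
cost_high, cost_low = 18, 5


def scale_to_low_cost(seconds_at_high_cost):
    """Scale a cost=18 time down to cost=5 (work doubles per cost step)."""
    return seconds_at_high_cost / 2 ** (cost_high - cost_low)


# Mistaken reading: 33 minutes treated as the time for one invocation.
mistaken = scale_to_low_cost(sequential_total_min * 60)

# Corrected reading: 33 minutes covers 4 invocations.
corrected = scale_to_low_cost(sequential_total_min * 60 / invocations)

print(round(mistaken, 2), round(corrected, 2))  # prints: 0.24 0.06
```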

> > What specific error message does it give when you try to fit 5 cores?
> 
> The first error is:
>
> "ERROR:Place:543 - This design does not fit into the number of slices
> available in this device due to the complexity of the design and/or
> constraints."

David - any comments on this?  Per the synthesis report for 4 cores, it
appears to me that we'd have sufficient resources for 6 cores.  I am
referring to 4-eksblowfish-loop-cores-pico-e101.zip available here:

http://openwall.info/wiki/crypt-dev/files

and specifically to 4-eksblowfish-loop-cores-pico-e101.pdf inside that
.zip archive.  It says we're using 65% of the slices with 4 cores.  Yet
adding a fifth core fails.  Is this because of some routing constraints?
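
For reference, the naive estimate behind "sufficient resources for 6
cores" can be sketched as below.  It assumes slice usage scales linearly
with the number of cores and that there is no fixed overhead outside the
cores.  Both are simplifications that may not hold after place and
route, and the variable names are made up.

```python
# From the 4-core synthesis report: 65% of slices used.
slices_used_pct = 65.0
cores_in_report = 4

# Assumption: every core costs the same and nothing else uses slices.
per_core_pct = slices_used_pct / cores_in_report   # 16.25

# Naive projected slice usage (percent) for 5, 6, and 7 cores.
estimate = {n: per_core_pct * n for n in (5, 6, 7)}

for n, pct in estimate.items():
    print(n, pct, "fits" if pct <= 100 else "does not fit")
```

By this estimate 5 cores would need about 81% of the slices and 6 cores
about 97.5%, so a failure already at 5 cores is not explained by the raw
slice count alone.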

Looks like another thing to consider while optimizing each core, then.

Thanks,

Alexander
