Message-ID: <20130726231028.GB24959@openwall.com>
Date: Sat, 27 Jul 2013 03:10:28 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Yaniv, Rafael -

On Fri, Jul 26, 2013 at 05:47:36PM -0400, Yaniv Sapir wrote:
> As a starting point, consider using (a) neighbor core(s) local memory to
> store parts of your big array. It is much faster to retrieve data from an
> on-chip memory than the external memory. Obviously, it means that you are
> either not going to use the adjacent core for processing in parallel, or
> you need to write the program in a way that core groups share the same data
> buffer(s) (I am not familiar with the algorithm so I don't know if "V"
> needs to be modified during the calculation).

What you describe is possible, but per my analysis we're going to obtain
better performance by increasing the time-memory tradeoff factor enough
that all cores are in use and each core accesses its own local memory only.

For example, considering memory usage by V only (ignoring code, stack,
B, XY for simplicity), on a 16-core chip without TMTO we could use at
most 4 cores for computation (4*128 KB uses up our 512 KB).  However,
with a TMTO factor of 4, we'd use all 16 cores, paying for this
privilege with less than a 2x increase in computation - so we'd achieve
more than twice the cumulative throughput.  Factoring in the other
memory needs skews this somewhat (it might be 3 cores with no TMTO, and
a higher TMTO factor for the 16-core alternative), but overall the
comparison holds.
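To make the arithmetic above concrete, here is a rough back-of-envelope
model (a sketch, not code for any actual implementation).  It assumes
32 KB of local memory per core (512 KB / 16), Litecoin scrypt's 128 KB
V array (N=1024, r=1), and the standard scrypt TMTO cost model: with a
factor f we store every f-th V entry, so each second-loop lookup needs
(f-1)/2 recomputations on average, growing total BlockMix work from 2N
to N*(2 + (f-1)/2).  All names here are illustrative.

```python
CORES = 16
LOCAL_MEM_KB = 32   # assumed per-core local memory (512 KB total / 16 cores)
V_KB = 128          # scrypt V array for Litecoin's N=1024, r=1

def throughput(tmto):
    """Relative cumulative throughput for a given TMTO factor.

    With factor f, V shrinks to V_KB/f, letting more cores fit their
    own copy in local memory; per-hash BlockMix work grows from the
    baseline 2N to N*(2 + (f-1)/2).
    """
    cores_usable = min(CORES, int((CORES * LOCAL_MEM_KB) // (V_KB / tmto)))
    work = 2 + (tmto - 1) / 2          # relative per-hash cost (baseline 2)
    return cores_usable * (2 / work)   # cores * per-core speed vs. baseline

print(throughput(1))   # no TMTO: 4 cores at baseline speed -> 4.0
print(throughput(4))   # factor 4: 16 cores at 1.75x cost -> ~9.14
```

The model reproduces the claim: the factor-4 case yields roughly
9.14/4 = 2.3x the cumulative throughput of the no-TMTO case, i.e. "more
than twice".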

Alexander
