john-dev - Re: Parallella: Litecoin mining

Message-ID: <20130726231028.GB24959@openwall.com>
Date: Sat, 27 Jul 2013 03:10:28 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Yaniv, Rafael -

On Fri, Jul 26, 2013 at 05:47:36PM -0400, Yaniv Sapir wrote:
> As a starting point, consider using (a) neighbor core(s) local memory to
> store parts of your big array. It is much faster to retrieve data from an
> on-chip memory than the external memory. Obviously, it means that you are
> either not going to use the adjacent core for processing in parallel, or
> you need to write the program in a way that core groups share the same data
> buffer(s) (I am not familiar with the algorithm so I don't know if "V"
> needs to be modified during the calculation).

What you describe is possible, but per my analysis we're going to obtain
better performance by increasing the time-memory tradeoff (TMTO) factor
enough that all cores are in use and each core accesses its own local
memory only.

For example, considering memory usage by V only for simplicity (not by
code, stack, B, XY), on a 16-core chip without TMTO we could use at most
4 cores for computation (4*128 KB uses up our 512 KB).  However, with a
TMTO factor of 4, we'd use all 16 cores, while paying for this privilege
with a less than 2x increase in computation - so we'd achieve more than
double the cumulative throughput.  Factoring in other memory needs
skews this somewhat (it could be 3 cores for no TMTO, and a higher TMTO
factor for the 16-core alternative), but overall this comparison holds.
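
The arithmetic above can be sketched as follows.  This is only a rough
illustration: the function names are made up for this example, and the
cost model (storing every k-th element of V and recomputing the rest on
demand, which costs about (k-1)/2 extra block mixes per lookup in the
second loop) is an assumed textbook estimate, not a measurement.  It
also ignores the slower access to a neighbor core's memory in the
no-TMTO case, as well as memory needed for code, stack, B, and XY.

```python
V_KB = 128        # size of V per scrypt instance, as in the message above
CORES = 16
TOTAL_KB = 512    # total on-chip local memory, as in the message above


def usable_cores(tmto):
    """Cores that can compute when each instance stores V_KB / tmto."""
    per_instance_kb = V_KB / tmto
    return min(CORES, int(TOTAL_KB // per_instance_kb))


def relative_work(tmto):
    """Assumed cost model: computation relative to no TMTO (tmto == 1).

    The first loop costs N block mixes regardless of tmto.  The second
    loop costs N * (1 + (tmto - 1) / 2) on average, since a missing
    element of V is recomputed from the nearest stored one.  Dividing
    the sum by the 2N baseline gives 1 + (tmto - 1) / 4.
    """
    return 1 + (tmto - 1) / 4


def relative_throughput(tmto):
    """Cumulative throughput in units of one core running without TMTO."""
    return usable_cores(tmto) / relative_work(tmto)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, usable_cores(k), relative_work(k), relative_throughput(k))
```

Under these assumptions, a TMTO factor of 4 costs 1.75x the computation
per hash but lets all 16 cores work instead of 4, which is where the
"more than double" figure comes from.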

Alexander
