Message-ID: <20130726231028.GB24959@openwall.com>
Date: Sat, 27 Jul 2013 03:10:28 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Yaniv, Rafael -

On Fri, Jul 26, 2013 at 05:47:36PM -0400, Yaniv Sapir wrote:
> As a starting point, consider using (a) neighbor core(s) local memory to
> store parts of your big array. It is much faster to retrieve data from an
> on-chip memory than the external memory. Obviously, it means that you are
> either not going to use the adjacent core for processing in parallel, or
> you need to write the program in a way that core groups share the same data
> buffer(s) (I am not familiar with the algorithm so I don't know if "V"
> needs to be modified during the calculation).

What you describe is possible, but per my analysis we're going to obtain
better performance by increasing the time-memory tradeoff (TMTO) factor
enough that all cores are in use and each core accesses only its own local
memory.

For example, considering memory usage only by V for simplicity (not by
code, stack, B, XY), on a 16-core chip without TMTO we could use up to 4
cores for computation (4*128 KB uses up our 512 KB). However, with a TMTO
factor of 4, we'd use all 16 cores while paying for this privilege with a
less than 2x increase in computation, so we'd achieve more than twice the
cumulative throughput. Factoring in other memory needs skews this somewhat
(it could be 3 cores for no TMTO, and a higher TMTO factor for the 16-core
alternative), but overall this comparison holds.

Alexander
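The TMTO arithmetic above can be checked with a small Python sketch. It is a toy, not scrypt and not John the Ripper code: mix() is a hypothetical SHA-256 stand-in for scrypt's BlockMix, and romix(), cost_ratio() and throughput() are illustrative names. It assumes Litecoin's scrypt parameters N = 1024, r = 1 (so V is 128 KB) and 32 KB of local memory per core on the 16-core chip, and like the message it counts only V. With a TMTO factor k, only every k-th V element is kept, and a lookup of V[j] recomputes (j mod k) steps from the nearest stored entry, about (k - 1)/2 extra steps on average.

```python
import hashlib
from fractions import Fraction

# Assumptions, not from the message: Litecoin's scrypt uses N = 1024, r = 1,
# so V is 128 * r * N bytes = 128 KB; each of the 16 cores has 32 KB of local
# memory (512 KB in total). Only V is counted, not code, stack, B or XY.
N, R = 1024, 1
CORES, LOCAL_MEM = 16, 32 * 1024


def mix(x: bytes) -> bytes:
    """Hypothetical stand-in for scrypt's BlockMix (SHA-256, not Salsa20/8)."""
    return hashlib.sha256(x).digest()


def romix(x: bytes, n: int, k: int = 1) -> tuple[bytes, int]:
    """ROMix-shaped toy loop storing only every k-th V element (TMTO factor k).

    Returns the final X and the number of mix() calls performed.
    """
    v, calls = {}, 0
    for i in range(n):                 # loop 1: fill V, keep every k-th entry
        if i % k == 0:
            v[i] = x
        x, calls = mix(x), calls + 1
    for _ in range(n):                 # loop 2: data-dependent lookups
        j = int.from_bytes(x[:4], "little") % n
        vj = v[j - j % k]              # nearest stored entry at or below j
        for _ in range(j % k):         # recompute the missing steps
            vj, calls = mix(vj), calls + 1
        x, calls = mix(bytes(a ^ b for a, b in zip(x, vj))), calls + 1
    return x, calls


def cost_ratio(k: int) -> Fraction:
    """Expected mix() calls relative to k = 1, assuming j mod k is uniform.

    Loop 1 costs 1 unit; loop 2 costs 1 + (k - 1) / 2 units; k = 1 costs 2.
    """
    return (1 + (1 + Fraction(k - 1, 2))) / 2


def throughput(k: int) -> Fraction:
    """Relative chip-wide hash rate: cores that can hold a V, divided by cost."""
    v_bytes = 128 * R * N // k
    if v_bytes <= LOCAL_MEM:
        cores = CORES                          # each core uses only its own memory
    else:
        cores = CORES * LOCAL_MEM // v_bytes   # V spread over neighbors' memory
    return cores / cost_ratio(k)
```

In this toy, romix(x, N, 4) returns the same X as romix(x, N, 1) for a 32-byte x, with roughly 1.75 times as many mix() calls, matching cost_ratio(4) = 7/4. throughput(4) / throughput(1) comes out to 16/7, a bit under 2.3, which is the "more than twice" figure from the message under these simplified assumptions.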