Message-ID: <20130904050424.GB23413@openwall.com>
Date: Wed, 4 Sep 2013 09:04:24 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Rafael,

On Wed, Sep 04, 2013 at 03:27:05AM +0100, Rafael Waldo Delgado Doblas wrote:
> Well, it was running without any crashes for almost 4 hours, and if I
> force it to find a share by ignoring the scrypt hash and making it 0,
> all looks fine (of course the share is rejected). However, I cannot
> find any share. Can it be because the work is discarded before
> Epiphany can find a valid share?

I don't know what you mean by "the work is discarded" here. What
component do you think discards the work, and why?

You should get your modified cgminer to run to a point where the pool
does accept a share.

> In addition, as you asked, this is the work that I performed today:

Thanks!

> I implemented a couple of salsa20_8 asm versions:
> The first one has the loops "Bor[i] = Bout[i] = (B[i] ^ Bx[i]);" and
> "Bout[i] += Bor[i];" rolled and uses the imadd instruction. It saves
> about 250B, but the performance drops by almost 0.5 khash/s.
> The second keeps the loops unrolled and uses the imadd instruction.
> It saves only 50B, but the performance also drops by almost
> 0.5 khash/s.

This is fine.

> At this point it looks like the imadd instruction is too slow to be
> used, but rolling the loops could be nice.

Wrong conclusions. Please re-read:

http://www.openwall.com/lists/john-dev/2013/08/29/4

"Slow" is non-informative. IMADD has high latency (you'd call this
"slow"), but it also has high throughput (we may call this "fast").
This means that with proper instruction scheduling it can be fast.

When you use inline asm for just one rotate operation, you make the
IMADD and its latency opaque to gcc. As a result, gcc is not able to
produce good instruction scheduling.
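
To illustrate the scheduling point, here is a minimal sketch in portable C
(not Epiphany assembly) of a rotate expressed as a multiply-add, used in
four independent operations so that the compiler can see the whole data
flow. The names rotl_or, rotl_madd and column_step7 are hypothetical and
do not come from the code under discussion. Whether gcc for Epiphany
actually selects IMADD for this expression, and how well it interleaves
the four operations, is an assumption to verify in the generated assembly.

```c
#include <stdint.h>

/* Conventional rotate-left: two shifts and an OR. Valid for n in 1..31. */
static inline uint32_t rotl_or(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/*
 * The same rotate written as a multiply-add: x * 2^n + (x >> (32 - n)).
 * The two terms have no set bits in common, so "+" gives the same result
 * as "|". Valid for n in 1..31.
 */
static inline uint32_t rotl_madd(uint32_t x, unsigned n)
{
    return x * ((uint32_t)1 << n) + (x >> (32 - n));
}

/*
 * First step (rotate by 7) of the four Salsa20 column quarter-rounds.
 * The four operations read and write disjoint words, and each has its
 * own temporary, so nothing in the source forces them to run one after
 * another.
 */
static inline void column_step7(uint32_t x[16])
{
    uint32_t t0 = x[0] + x[12];
    uint32_t t1 = x[5] + x[1];
    uint32_t t2 = x[10] + x[6];
    uint32_t t3 = x[15] + x[11];

    x[4]  ^= rotl_madd(t0, 7);
    x[9]  ^= rotl_madd(t1, 7);
    x[14] ^= rotl_madd(t2, 7);
    x[3]  ^= rotl_madd(t3, 7);
}
```

Because everything here is plain C, the multiply-add and its operands stay
visible to the compiler's scheduler, unlike a rotate hidden inside an
inline asm string.
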
Additionally, your use of just one register for the temporary value
does not allow multiple rotate operations to overlap (mixing their
instructions). Clearly, though, with this inline asm approach gcc would
not perform this optimization anyway because, once again, the piece of
inline asm is opaque to it (as far as gcc is aware, it's just a string,
not a piece of code that gcc could possibly intermix with another piece
of code).

Alexander