Message-ID: <20130627141234.GA21449@openwall.com>
Date: Thu, 27 Jun 2013 18:12:34 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: scrypt

Rafael,

On Tue, Jun 25, 2013 at 01:00:00PM +0100, Rafael Waldo Delgado Doblas wrote:
> I just finished the implementation of Scrypt format. Now I have some
> questions:
>
> The SSE implementation have a really poor performance on my i5-3570K only
> 30 c/s

What are you comparing it against to state that this performance is
"poor"?  If I understand what test vector you're using correctly, this
one had been tuned (by scrypt author) for 100 ms on one core in a Core 2
Duo.  This means 10 c/s per core.  However, since scrypt also uses RAM
(at these settings), its performance does not scale linearly on
multi-core systems.  On the other hand, interleaving instructions from
multiple independent Salsa20 core computations may improve performance
(an attack-specific optimization, which we don't have implemented yet).

I took a look at lordrafa_scrypt_fmt.c.  Although you based it on
c3_fmt.c, which included OpenMP support, you dropped that in your
revision of the code (yet you forgot to drop the FMT_OMP flag, setting
which without actually having OpenMP support is a minor bug).  So your
30 c/s is probably for one core, which is 3x better than the performance
achieved by scrypt author's SSE2 code on one core in his Core 2 Duo from
a few years ago.  That's fine performance - but indeed we should try to
improve it further.

Another detail I noticed - somehow you require HAVE_SCRYPT, but it's not
needed because this very source tree provides scrypt code.  (We did need
HAVE_CRYPT, because crypt(3) is provided by the system and may not be
available on some systems.)

For now, to test your code speed on FX-8120, which we have in our dev
box (bull), I added -DHAVE_SCRYPT to the linux-x86-64-xop target in my
copy of your tree.
I also lowered MIN_KEYS_PER_CRYPT and MAX_KEYS_PER_CRYPT from 96
(inherited from c3_fmt.c) to 1, for faster benchmarking, better
interactivity when cracking, and less work lost when
interrupting/restoring.  (Higher values are needed for faster hashes and
for OpenMP or similar.)  After these changes, I got 28 c/s, which is
fine.

> Also I need to know the if this are the targets for my gsoc project:

Not quite.  We'll need to work on a project plan for you.

> For the first half of the summer:
> First, implement scrypt in host mode.

OK.  You did it as a hack, just like I had suggested.  Now it feels like
I need to reimplement it cleanly. %-)

> Second, experiment with scrypt time-memory tradeoff.

Yes.

> Third, move it to epiphany.

I had suggested this, but on second thought I am not sure.  For
practically relevant scrypt settings, except as used in crypto
currencies, performance on Epiphany will be extremely poor.  So maybe we
should skip this and jump to Litecoin mining only.

Perhaps you can have Litecoin mining working in two ways (and in two
source trees) by GSoC midterm:

1. Integrated into jumbo tree, on host CPU (usually x86) and/or GPU
and/or Xeon Phi.

2. Running on Epiphany as proof-of-concept - this can be implemented
perhaps as a patch to cgminer code base.  We might not even need to have
JtR involved in this, although having it implemented on top of JtR core
tree is an option too (especially if/since JtR will also have bcrypt on
Epiphany due to Katja's work).

> For the second half of the summer(early August or maybe before?) :
> Fourth, implement epiphany based Litecoin mining.

I suggest that you approach this in the first half of summer.

> Fifth, implement descrypt on Parallella.

Maybe, or/and (if time permits) you could work on implementing scrypt
and Litecoin mining on Xeon Phi (for both JtR and upstream cgminer
project - we'd submit a patch then).  Of course, this depends on us
actually getting Xeon Phi to work - but this is planned.

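To make the "time-memory tradeoff" item in the plan above more concrete,
here is a minimal sketch of the idea, not code from the jumbo tree.
scrypt's ROMix fills an array V of N elements and then reads it at
data-dependent indices; a TMTO keeps only every k-th element and
recomputes the missing ones on demand.  Everything named here
(toy_mix, toy_romix_tmto, the 64-bit state) is a hypothetical stand-in:
toy_mix is NOT Salsa20/8 or BlockMix, it merely plays their role so the
tradeoff itself is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for scrypt's BlockMix (assumption: NOT Salsa20/8).
 * Any fixed mixing function is enough to demonstrate the tradeoff. */
static uint64_t toy_mix(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* ROMix-like loop that stores only every k-th element of V.
 * n must be a power of two and k must divide n.
 * k == 1 is the plain algorithm (all n elements stored). */
static uint64_t toy_romix_tmto(uint64_t x, uint32_t n, uint32_t k)
{
    assert(n != 0 && (n & (n - 1)) == 0 && k != 0 && n % k == 0);

    uint64_t *v = malloc((size_t)(n / k) * sizeof(*v));
    if (!v)
        abort();

    /* First loop: V[i] = x; x = mix(x).  Keep V[i] only when k divides i. */
    for (uint32_t i = 0; i < n; i++) {
        if (i % k == 0)
            v[i / k] = x;
        x = toy_mix(x);
    }

    /* Second loop: data-dependent reads of V[j]. */
    for (uint32_t i = 0; i < n; i++) {
        uint32_t j = (uint32_t)(x & (n - 1));
        /* Start from the nearest stored element at or below j ... */
        uint64_t vj = v[j / k];
        /* ... and recompute the j % k skipped steps. */
        for (uint32_t r = 0; r < j % k; r++)
            vj = toy_mix(vj);
        x = toy_mix(x ^ vj);
    }

    free(v);
    return x;
}
```

Calling toy_romix_tmto(x, n, 1) and toy_romix_tmto(x, n, k) with the
same x and n gives the same result; the second uses n/k elements of
memory and pays for it with about (k-1)/2 extra toy_mix calls per lookup
on average.
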
> Furthermore I need to know the next task, yesterday you told me about
> interleaving and opencl, may you tell me more about those tasks?

I've just CC'ed you on a message related to JtR/OpenCL on Epiphany.
This is also something you can work on during the summer, although I
think it should be among the stretch goals or tasks that you work on
when you would otherwise be idle (such as when waiting for guidance on
your other tasks) - not part of your GSoC project description.

As to interleaving, we'll definitely need to try it on x86 CPUs, but
probably not on Epiphany (we'd need to use even higher TMTO factors
then, which would neutralize any advantage from hiding instruction
latencies via interleaving, and besides Epiphany's latencies are
probably low enough as-is).

> is it time-memory tradeoff related?

Not directly.

Alexander
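
As an illustration of the interleaving mentioned in this message, below
is a minimal scalar sketch, not the implementation in any JtR tree.  It
runs the Salsa20/8 core on two independent blocks inside one loop, so
that each step for one block is followed by the same step for the other
and the CPU has independent work to overlap.  The names salsa20_8_x2,
QR2 and interleaved_matches_sequential are hypothetical; a real build
would likely use SIMD, and whether this gains anything depends on the
compiler and the CPU.

```c
#include <stdint.h>
#include <string.h>

#define ROTL(v, c) (((v) << (c)) | ((v) >> (32 - (c))))

/* One Salsa20 quarter-round on block x. */
#define QR(x, a, b, c, d) \
    (x[b] ^= ROTL(x[a] + x[d], 7),  x[c] ^= ROTL(x[b] + x[a], 9), \
     x[d] ^= ROTL(x[c] + x[b], 13), x[a] ^= ROTL(x[d] + x[c], 18))

/* The same quarter-round on two independent blocks, step by step. */
#define QR2(x, y, a, b, c, d) \
    (x[b] ^= ROTL(x[a] + x[d], 7),  y[b] ^= ROTL(y[a] + y[d], 7), \
     x[c] ^= ROTL(x[b] + x[a], 9),  y[c] ^= ROTL(y[b] + y[a], 9), \
     x[d] ^= ROTL(x[c] + x[b], 13), y[d] ^= ROTL(y[c] + y[b], 13), \
     x[a] ^= ROTL(x[d] + x[c], 18), y[a] ^= ROTL(y[d] + y[c], 18))

/* Reference: Salsa20/8 core on a single 16-word block, in place. */
static void salsa20_8(uint32_t b[16])
{
    uint32_t x[16];
    memcpy(x, b, sizeof(x));
    for (int i = 0; i < 8; i += 2) {
        /* column round */
        QR(x, 0, 4, 8, 12);  QR(x, 5, 9, 13, 1);
        QR(x, 10, 14, 2, 6); QR(x, 15, 3, 7, 11);
        /* row round */
        QR(x, 0, 1, 2, 3);   QR(x, 5, 6, 7, 4);
        QR(x, 10, 11, 8, 9); QR(x, 15, 12, 13, 14);
    }
    for (int i = 0; i < 16; i++)
        b[i] += x[i];
}

/* Interleaved: two independent blocks processed in the same loop. */
static void salsa20_8_x2(uint32_t b0[16], uint32_t b1[16])
{
    uint32_t x[16], y[16];
    memcpy(x, b0, sizeof(x));
    memcpy(y, b1, sizeof(y));
    for (int i = 0; i < 8; i += 2) {
        QR2(x, y, 0, 4, 8, 12);  QR2(x, y, 5, 9, 13, 1);
        QR2(x, y, 10, 14, 2, 6); QR2(x, y, 15, 3, 7, 11);
        QR2(x, y, 0, 1, 2, 3);   QR2(x, y, 5, 6, 7, 4);
        QR2(x, y, 10, 11, 8, 9); QR2(x, y, 15, 12, 13, 14);
    }
    for (int i = 0; i < 16; i++) {
        b0[i] += x[i];
        b1[i] += y[i];
    }
}

/* Self-check: the interleaved version must match two sequential calls. */
static int interleaved_matches_sequential(void)
{
    uint32_t a[16], b[16], c[16], d[16];
    for (uint32_t i = 0; i < 16; i++) {
        a[i] = c[i] = i * 0x01010101u;
        b[i] = d[i] = ~i * 0x9e3779b9u;
    }
    salsa20_8(a);
    salsa20_8(b);
    salsa20_8_x2(c, d);
    return memcmp(a, c, sizeof(a)) == 0 && memcmp(b, d, sizeof(b)) == 0;
}
```

This only works as an attack-side optimization because the two blocks
belong to different candidate passwords: within a single scrypt
computation each Salsa20/8 call depends on the previous one, so there
is nothing independent to interleave.
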