|
Message-ID: <20130907004256.GC8393@openwall.com> Date: Sat, 7 Sep 2013 04:42:57 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: Parallella: Litecoin mining Rafael, On Fri, Sep 06, 2013 at 05:47:06AM +0100, Rafael Waldo Delgado Doblas wrote: > Today I have implemented a new TMTO. Now the first "NO_TMTO_GROUPS" > elements from V that the (position%TMTO_RATIO) is equal to (TMTO_RATIO - 1) > will be stored, in addition of any element that (position%TMTO_RATIO) is > equal to 0. With this TMTO implementation we can handle tmto ratios between > 5 and 4. I have chosen (TMTO_RATIO - 1) because this elements should be > harder to recompute than (TMTO_RATIO - 2),(TMTO_RATIO - 3)...(2),(1). > > The maximum value for NO_TMTO_GROUPS at the moment is 11. It's very nice that you're experimenting with this and that you finally have a better understanding of how flexible scrypt's TMTO is. Your approach above might be suboptimal, though, and your rationale for choosing (TMTO_RATIO - 1) is flawed. Yes, those elements are harder to recompute, but on the other hand e.g. (TMTO_RATIO - 2) would be beneficial to have twice more often (when you need these exact elements as well as when you need the next elements). So, if you stick with this general approach (which might as well be OK), the exact choice of which additional elements to store (as room permits) is not as trivial, and you may need to do some math (for the expected number of recomputations with the different ones of the possible choices) and/or testing. When I say that the approach "might be suboptimal", I mean that I'd expect alternating 4's and 5's to be more optimal (in terms of the number of recomputations) than all 5's with some of those 5's split in two (by storing a mid-point, like you do). As I had pointed out before, the first one of SMix's two loops, which accesses the elements sequentially, does not even need to use the modulo division - it could instead maintain an extra counter variable. This works great for whole numbers like 5, and it's also easy to implement for specific easy numbers like 4.5 (simply alternate between 4 and 5 for the wraparound point for that variable). When you need to support an arbitrary (compile-time configurable) number between 4 and 5, it becomes trickier - but not much so. What you'd have is analogous to drawing an arbitrary diagonal line in 2D computer graphics - and there are well-known efficient algorithms for doing that, which don't involve integer division. With SMix's second loop, things might be trickier. I am not suggesting that you search for and implement the most optimal solution to this now, though - you've already over-spent time on this little project, and GSoC is coming to an end soon. I merely point these things out for your better understanding. There's not that much difference between ~4.5 and 5 in terms of overall performance, even if the ~4.5 is implemented optimally. In fact, even if we could do TMTO_RATIO = 4, that would be a 1.75x slowdown compared to no TMTO, whereas TMTO_RATIO = 5 is a 2x slowdown. Thus, TMTO_RATIO = 4 is only 12.5% faster, and 4.5 is somewhere inbetween (~6% faster than 5). Also, beware of running onto the stack. You don't get any warning when that happens - you simply get incorrect computation result (maybe not every time, even!) or a segfault (if you're lucky!) Since stack usage may vary from build to build (e.g., with different e-gcc versions and/or options), we should not distribute a version that uses the absolute maximum amount of memory it can, unless the stack problem is somehow dealt with (e.g., a sufficiently exhaustive runtime self-test on every program invocation, making sure that the very last element of V was made use of in SMix's second loop, which does not always happen). A self-test at program start is a good idea anyway. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.