Message-ID: <20130824233606.GA10265@openwall.com>
Date: Sun, 25 Aug 2013 03:36:06 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Rafael,

On Fri, Aug 23, 2013 at 02:31:46AM +0100, Rafael Waldo Delgado Doblas wrote:
> // This function approximation works fine up to a = 32771
> #define DIVTMTO(a) ((10923 * (a))>>16) // If TMTO_RATIO changes you need
> redefine this macro

Good.  Note that you don't have to perform the division, not even in
this optimized fashion, in the first one of SMix's two loops, because it
accesses the elements sequentially.  You may instead introduce an extra
loop counter variable, which you'd reset back to zero when it hits
TMTO_RATIO.  That said, on Epiphany the multiplication might be free,
because IMUL is an FPU instruction, and the FPU is idle most of the
time.  So it is unclear which approach to handling this in the first
loop is faster.

> #define DIV2(a) ((a)>>1)
> #define MOD2(a) ((a) - (DIV2(a) << 1)) // This can be optimised in ASM
> using carry
>
> #define DIV8(a) ((a)>>3)
> #define MOD8(a) ((a) - (DIV8(a) << 3)) // This can be optimised in ASM
> using carry

These are ridiculous.  MOD2 is simply "& 1", and MOD8 is "& 7".  I'm
sure the compiler already performed those optimizations for "% 8",
although I don't mind being explicit with "& 7".

> The performance still the same but now I drop almost 1K.

Good.

> I'm going to check the segfault.

Any luck?  How are you debugging it?

On Sat, Aug 24, 2013 at 01:33:12AM +0100, Rafael Waldo Delgado Doblas wrote:
> It means the core memory is used only up to the address 0000167F. That
> means that I have around 27KB free. I guess that I can run TMTO 5 now or at
> least I'm close.

I took a look at your committed code - it tries to use TMTO 5, but it
just gets stuck somewhere.  So I've just spent an hour playing around
with it, optimizing its memory usage.  Please see the attached patch.

With this patch, the code + read-only data size is reduced by about 1700
bytes, and it pretends to work, but when I enable the debugging output
in driver-epiphany.c, the hashes computed on ARM and Epiphany don't
match.  Moreover, they don't match even if I reduce TMTO to 6 (and
adjust DIVTMTO accordingly).  My guess is that you had introduced some
bug, so I am leaving it up to you to debug it. ;-)  It is, of course,
also possible that the bug is in my patch.

Please note that when there's little memory free, the stack might be
overwriting other data.  This is why I tried TMTO 6 (but it didn't
help).  I suggest that you debug this at TMTO 6 (or higher) initially,
and only when you get that working, proceed to set TMTO 5 (now that the
reduced code size permits for that).

BTW, it'd be nice if you introduce a way to easily enable/disable the
debugging output in driver-epiphany.c, e.g. via #define DEBUG_EPI.

Alexander
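
For reference, the arithmetic points above can be checked on the host
with a small standalone sketch.  It is not taken from the attached patch
or from the committed code.  It assumes TMTO_RATIO is 6 (10923 is 65536/6
rounded up), adds an unsigned cast to the quoted DIVTMTO, and uses
made-up helper names (divtmto_mismatches, counter_mismatches,
mask_mismatches, self_check, DEBUG_PRINT).  Only DIVTMTO, MOD2, MOD8,
TMTO_RATIO and DEBUG_EPI come from the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TMTO_RATIO 6	/* assumption: 10923 is 65536 / 6 rounded up */

/* Quoted macro, with an unsigned cast added for this sketch */
#define DIVTMTO(a) ((10923u * (uint32_t)(a)) >> 16)

/* Mask forms from the reply; valid for unsigned operands */
#define MOD2(a) ((a) & 1u)
#define MOD8(a) ((a) & 7u)

/* Hypothetical switch: build with -DDEBUG_EPI to get the output */
#ifdef DEBUG_EPI
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Count a in [0, limit) where DIVTMTO(a) differs from a / TMTO_RATIO */
static uint32_t divtmto_mismatches(uint32_t limit)
{
	uint32_t a, bad = 0;

	for (a = 0; a < limit; a++) {
		if (DIVTMTO(a) != a / TMTO_RATIO) {
			DEBUG_PRINT("DIVTMTO mismatch at %u\n", (unsigned)a);
			bad++;
		}
	}
	return bad;
}

/*
 * Sequential access, as in the first SMix loop: track i / TMTO_RATIO
 * and i % TMTO_RATIO with two counters instead of dividing.
 */
static uint32_t counter_mismatches(uint32_t n)
{
	uint32_t i, slot = 0, phase = 0, bad = 0;

	for (i = 0; i < n; i++) {
		if (slot != i / TMTO_RATIO || phase != i % TMTO_RATIO)
			bad++;
		if (++phase == TMTO_RATIO) {	/* reset on hitting the ratio */
			phase = 0;
			slot++;
		}
	}
	return bad;
}

/* Count a in [0, limit) where the masks differ from % 2 and % 8 */
static uint32_t mask_mismatches(uint32_t limit)
{
	uint32_t a, bad = 0;

	for (a = 0; a < limit; a++)
		if (MOD2(a) != a % 2 || MOD8(a) != a % 8)
			bad++;
	return bad;
}

/* Run all checks; aborts via assert() on the first failure */
static void self_check(void)
{
	assert(divtmto_mismatches(32771) == 0);	/* exact for 0..32770 */
	assert(DIVTMTO(32771) != 32771 / TMTO_RATIO);	/* first miss */
	assert(counter_mismatches(1024) == 0);
	assert(mask_mismatches(1024) == 0);
}
```

Calling self_check() from a host-side main() is enough to run it.  With
these assumptions the approximation is exact for every a below 32771 and
first goes wrong at a = 32771 itself, so the quoted comment is best read
as an exclusive bound.  Which of the two approaches is faster on Epiphany
still has to be measured on the device.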