Message-ID: <20130907001937.GB8393@openwall.com>
Date: Sat, 7 Sep 2013 04:19:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Rafael,

On Fri, Sep 06, 2013 at 04:49:25PM +0100, Rafael Waldo Delgado Doblas wrote:
> 2013/9/6 Yaniv Sapir <yaniv@...pteva.com>
>
> > There is no way to do that, since the instruction op-codes are either
> > 32-bit or 16-bit wide, and you have to leave a few bits for the code
> > itself...
>
> Thank you for your answer, now this looks really clear.

To add to Yaniv's answer:

Although there's no way to fit a 32-bit immediate constant in an
instruction, you may quickly load a 32-bit value (including a constant
if you need) into a register with one instruction, or even two 32-bit
values (including constants if you need) into two registers with one
instruction, by using the memory load instructions (LDR, LDRD).

The prerequisites for this are that you need to have the value(s)
somewhere in local memory, the address needs to be properly aligned
(meaning 4- or 8-byte), and if you're loading two values at once, then
they need to be adjacent in memory and the target registers need to be
adjacent and "aligned" too.  Moreover, you need to have a close enough
local memory address already loaded into a register (so that the
displacement relative to that address will fit in the instruction).

BTW, it's the same with other typical RISC archs.  32-bit instruction
size is very common, so a 32-bit constant can be loaded with either two
instructions (each encoding a portion of the constant) or a memory load
instruction (hopefully, accessing an L1 cache if present), and often the
latter is faster (e.g., this is why JtR's arch.h sets MD5_IMM to 0 on
RISC archs - to use arrays with the constants instead of using immediate
values in the code).

> Maybe you can tell
> me why there is no way to use the registers sequentially in a loop.

What do you mean by that?
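
As a rough illustration of the point above about loading constants from
memory instead of encoding them as immediates, here is a minimal C
sketch.  The names mix_imm, mix_table, load_pair and k_table are made up
for this example and are not taken from JtR; the four values are the
first four MD5 round constants.  Whether a compiler really emits a single
load per constant, or one doubleword load for the adjacent pair, depends
on the target and the optimization level, so treat this as a sketch of
the idea rather than a statement about generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant table, 8-byte aligned so that two adjacent
 * words satisfy the alignment prerequisite of a doubleword load. */
static _Alignas(8) const uint32_t k_table[4] = {
    0xd76aa478u, 0xe8c7b756u, 0x242070dbu, 0xc1bdceeeu
};

/* Variant 1: constants written as immediates.  On a RISC target with
 * 32-bit instructions, each of these typically costs two instructions
 * to materialize in a register. */
uint32_t mix_imm(uint32_t x)
{
    x += 0xd76aa478u;
    x ^= 0xe8c7b756u;
    x += 0x242070dbu;
    x ^= 0xc1bdceeeu;
    return x;
}

/* Variant 2: constants read from the table through a base pointer.
 * Each one can be fetched with a single load at a small displacement
 * from the base address already held in a register. */
uint32_t mix_table(uint32_t x)
{
    const uint32_t *k = k_table;
    x += k[0];
    x ^= k[1];
    x += k[2];
    x ^= k[3];
    return x;
}

/* Read two adjacent words from an 8-byte aligned address, which is the
 * memory layout a doubleword load needs. */
void load_pair(const uint32_t *p, uint32_t *lo, uint32_t *hi)
{
    assert(((uintptr_t)p & 7u) == 0); /* alignment prerequisite */
    *lo = p[0];
    *hi = p[1];
}
```

Both variants compute the same result; only the way the constants reach
the registers differs, and that is the kind of choice a switch like
MD5_IMM makes.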

> I checked a couple of disassembled codes, and every time there is
> sequential access to an array using a loop, the generated code has a lot
> of load and store instructions, but the unrolled version only uses
> registers.  I am just curious about this.

Obviously, using the registers is faster, because they can be directly
encoded in instructions that perform computation.  Are you trying to ask
why the non-unrolled loops are unable to index registers as if they were
arrays?  Unfortunately, the Epiphany architecture - like most others -
lacks indexed access to registers.  The arch reference almost implies
that such a capability is present, but as Yaniv pointed out on this list
before, this is not actually the case:

http://www.openwall.com/lists/john-dev/2013/07/25/27

So we live with the usual limitations that most other archs have.

> BTW I have run a test and it finally finds a share, but unfortunately
> there is something wrong because it was rejected; at least there wasn't
> a segfault.

> Rejected 00000000 Diff 0/63 EPI 0 (target-miss)

"Diff 0/63" means that the share had a difficulty of 0, whereas the
required minimum is currently 63.  I think this is why it was a
"target-miss".  The first number in "Diff 0/63" should be at least 63.
You need to find out why it's zero in your case, and correct the root
cause.

Alexander
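
To illustrate the register indexing point in the message above, here is
a minimal C sketch.  The function names mix_rolled and mix_unrolled and
the mixing step itself are hypothetical, chosen only to show the two
code shapes.  In the rolled form the element is selected by a run-time
index, so unless the compiler unrolls the loop itself the array has to
stay in addressable memory; in the unrolled form every element has a
fixed name that can be assigned to a register.

```c
#include <stdint.h>

/* Rotate left by 7 bits (fixed amount, so no undefined shift). */
static uint32_t rotl7(uint32_t v)
{
    return (v << 7) | (v >> 25);
}

/* Rolled: x[i] is chosen by a loop variable.  Without indexed access
 * to registers, this form implies loads and stores on x[] unless the
 * compiler unrolls the loop on its own. */
uint32_t mix_rolled(const uint32_t in[4])
{
    uint32_t x[4];
    for (int i = 0; i < 4; i++)
        x[i] = in[i];
    for (int i = 0; i < 4; i++)
        x[i] += rotl7(x[(i + 3) & 3]);
    return x[0] ^ x[1] ^ x[2] ^ x[3];
}

/* Unrolled by hand: each element is a separate local variable with a
 * fixed identity, so register numbers can be encoded directly in the
 * instructions that do the arithmetic. */
uint32_t mix_unrolled(const uint32_t in[4])
{
    uint32_t x0 = in[0], x1 = in[1], x2 = in[2], x3 = in[3];
    x0 += rotl7(x3);
    x1 += rotl7(x0);
    x2 += rotl7(x1);
    x3 += rotl7(x2);
    return x0 ^ x1 ^ x2 ^ x3;
}
```

The two functions return the same value for any input; comparing their
disassembly at a low optimization level is one way to see the loads and
stores described above.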