Message-ID: <CA+EaD-YP=bjjtxfnEbdF18MwAD7zrixpm1bhr1q_9nqVYvSXGw@mail.gmail.com>
Date: Mon, 8 Jul 2013 20:28:33 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Yaniv,

On Mon, Jul 8, 2013 at 5:38 AM, Yaniv Sapir <yaniv@...pteva.com> wrote:

> Katja,
>
> It is a little bit hard to follow your question - I hope I get it right:
>
>> I changed one thing in the code I sent in previous emails and now it
>> works. I did something you recommended not to do - I used e_write and after
>> it I used e_load_group(). Now both minimal and full code work.
>
> That's great.

It is and it's not... With this approach I have to load the image in every iteration of the loop, so I can't implement your suggestion to have a server on the cores.

>> But I put it after e_writes to local memory.
>
> What is the "it" you refer to? What did you put after the e_write()?

"It" is e_load_group(), so I have e_load_group() after the two e_write() calls used to transfer the key and hash to the core's local memory.

>> If I put e_load after writing to shared dram then it doesn't work.
>
> Probably b/c you overrun whatever is in the DRAM with the
> initialization values of the external objects that are written by the
> e_load().

Hm... Does that mean that e_load() puts zeroes in shared dram if a variable is declared as static? If so, then my whole shared buffer is filled with zeroes. How can I then read garbage from that buffer?

>> On the other hand, if it's before e_writes to core's local memory then data
>> in local memory isn't correct for some of the cores.
>
> If you launch the program (E_TRUE input to e_load()) before writing the
> data, there surely is a situation where you are actually processing garbage
> (GIGO)....
>
>> I got it working in one more way. If I start cores using e_start() after
>> e_write() (attached code) then it also works.
>
> .... which is why this method works. You load, then write data, then start
> program!
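The ordering discussed in the exchange above (load without starting, write the inputs, then start) can be illustrated with a small host-only model. This is a sketch under stated assumptions, not e-hal code: model_load, model_write, model_start and model_core_t are hypothetical stand-ins for e_load(), e_write(), e_start() and a core's local memory, and the 0xA5 fill pattern merely represents memory that the host has not written yet.

```c
#include <string.h>

/* Host-only model; none of these names come from e-hal. */
typedef struct {
    unsigned key;     /* input the host writes into "local memory" */
    unsigned result;  /* value the modelled core computes */
} model_core_t;

/* Stands in for the core program: it consumes whatever key is present
   at the moment it starts running. */
static void model_start(model_core_t *c) {
    c->result = c->key * 2u;
}

/* Stands in for loading the image. The region is filled with a fixed
   pattern to represent not-yet-written memory (an assumption of this
   model); start != 0 models passing E_TRUE to the loader. */
static void model_load(model_core_t *c, int start) {
    memset(c, 0xA5, sizeof *c);
    if (start)
        model_start(c);
}

/* Stands in for the host writing the input to the core. */
static void model_write(model_core_t *c, unsigned key) {
    c->key = key;
}

/* Load and start in one step, then write: the core has already run
   on stale data, so the result does not reflect the key. */
static unsigned start_before_write(unsigned key) {
    model_core_t c;
    model_load(&c, 1);
    model_write(&c, key);
    return c.result;
}

/* Load without starting, write the input, then start. */
static unsigned write_before_start(unsigned key) {
    model_core_t c;
    model_load(&c, 0);
    model_write(&c, key);
    model_start(&c);
    return c.result;
}
```

In this model write_before_start(21) yields 42, while start_before_write(21) yields a value derived from the fill pattern. On real hardware the stale contents are unpredictable, which would be consistent with wrong results that differ between runs.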
So I created one more really minimal example; it's attached. I load the program and then do only one write to shared memory, and I ran it many times. I have two scenarios.

In the first one I use "while(result.core_done[0] == 0)". In that case there is some garbage (different from zero) in result.core_done[0], so the data is read from shared memory immediately. What I read in that case is garbage for the result.core_done array, while the start array has correct values for all the cores (in this example that's 16).

In the second one I use sleep(1) (or any longer or shorter (usleep) interval). Then I read only zeroes because all cores are in an infinite loop - the whole start array is filled with zeroes.

The only explanation I have for this is that the whole start array gets overwritten. But only e_load can overwrite it, since I don't do any writes to the start array except when the host writes 16 to every array location. Cores do not write to that array. I checked all offsets: for the attached code, both cores and host return exactly the same number for every variable in the data structure.

On Sun, Jul 7, 2013 at 8:04 PM, Yaniv Sapir <yaniv@...pteva.com> wrote:

> Katja,
>
> [...]
>
> ... so I went on and commented out the "outbuf.start[corenum] = 0" line.
> This is what I get (when using only the first 2 rows - i.e., only 8 cores - b/c I use a faulty chip):
>
> eCore 0x808 (0, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x809 (0, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x80a (0, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x80b (0, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x848 (1, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x849 (1, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x84a (1, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> eCore 0x84b (1, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc, 74420600
> Execution time - Epiphany: 19.040000 ms
> done = 8
> core_done[ 0] = 1 test[ 0] = 16 ciphertext[0] = P
> core_done[ 1] = 1 test[ 1] = 16 ciphertext[1] = P
> core_done[ 2] = 1 test[ 2] = 16 ciphertext[2] = P
> core_done[ 3] = 1 test[ 3] = 16 ciphertext[3] = P
> core_done[ 4] = 1 test[ 4] = 16 ciphertext[4] = P
> core_done[ 5] = 1 test[ 5] = 16 ciphertext[5] = P
> core_done[ 6] = 1 test[ 6] = 16 ciphertext[6] = P
> core_done[ 7] = 1 test[ 7] = 16 ciphertext[7] = P
>
> does this make sense?

I tried this and for me it doesn't work; I don't get correct results. I took the code I sent you (http://www.openwall.com/lists/john-dev/2013/07/06/3) and did only that, commenting out "outbuf.start[corenum] = 0", and some cores return wrong results. Even worse, the cores that return wrong results are different on every run, and the wrong results are also different. The output that I get is attached. Could you please check that you changed only that?

And one more question - can a core halt itself, and if so, how? Since writing to the core's local memory and then starting the cores using e_start works, I can have an infinite loop in which the first instruction is halt and, after all writes are done, the host starts the cores with e_start. After one iteration the core halts itself and gets new data.
If this is a possible scenario, what would happen if a core is halted while data transfers haven't completed (by data transfer I mean the transfer of the result from local memory to the shared buffer)?

Thank you,
Katja

Content of type "text/html" skipped

View attachment "output.txt" of type "text/plain" (6153 bytes)

Download attachment "min_example.zip" of type "application/zip" (2930 bytes)
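The offset comparison mentioned in the message (checking that host and cores agree on where each member of the shared data structure lives) could be done along the following lines. This is only a sketch: shared_buf_t and its two members are a hypothetical layout modelled on the core_done and start arrays named above, not the structure from the attached code, and fixed-width types are used here on the assumption that host and core compilers might otherwise pick different sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NCORES 16

/* Hypothetical layout; the real structure is in the attached code. */
typedef struct {
    uint32_t core_done[NCORES];  /* set by each core when it finishes */
    uint32_t start[NCORES];      /* written by the host before the run */
} shared_buf_t;

/* Prints the offset of every member and the total size. Building the
   same function for the host and for the cores and comparing the
   numbers shows whether both sides agree on the layout. */
static void print_offsets(void) {
    printf("core_done: %zu\n", offsetof(shared_buf_t, core_done));
    printf("start:     %zu\n", offsetof(shared_buf_t, start));
    printf("sizeof:    %zu\n", sizeof(shared_buf_t));
}
```

On the core side, where printing may not be convenient, the same offsetof values could instead be stored into the shared buffer and read back by the host.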