Message-ID: <1394251181.33294.YahooMailNeo@web140304.mail.bf1.yahoo.com>
Date: Fri, 7 Mar 2014 19:59:41 -0800 (PST)
From: Anthony Tanoury <tanoury@...oo.com>
To: "john-users@...ts.openwall.com" <john-users@...ts.openwall.com>
Subject: Re: Help - mpi ocl restore session

Thanks for the response, magnum!

On 14-03-07 02:31 AM, magnum wrote:
> On 2014-03-07 07:26, Anthony Tanoury wrote:
>> I use John the Ripper, version 1.8.0.2-bleeding-jumbo_mpi
>> [linux-x86-64-opencl]
>
> Is this a very recent snapshot or an older one? Some timer oddities have
> changed (for the better, hopefully) very recently. Like yesterday...

My snapshot is about two weeks old.

>> I can run an mpirun opencl session just fine and all sessions
>> complete just fine. My only trouble is with session restore, and only
>> if it involves remote hosts. I can resume a session if there is no
>> remote host. However, if I terminate a session with one or more
>> remote hosts using "killall mpirun", "kill HUP" or Ctrl-c and try to
>> restore the session, only one core or one GPU will resume.
>
> When you say "one" I assume you mean only the root node resumes?

All 6 cores resume on the master node, but only one core on each of the remote computers.

> Does this happen even if the session had been running for 30 minutes or
> more? Did you set "Save = 60" in john.conf per the instructions? Before
> killing an MPI session, you should "kill -USR1" the mpirun that controls
> it. This should trigger a session save. Then wait at least 30 seconds
> before aborting them.

Yes, it happens even if the session has been running for more than 30 minutes. I had "Save=600", but changed it to 60 in john.conf on all computers. I did not notice any difference. To abort, I use "pkill -USR1 mpirun" to trigger a session save, wait 30 seconds, then run "killall mpirun". Is this the correct way to end a session?

> What happens with the other nodes? Do they silently just not resume, or
> are there any errors or other clues?
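The stop sequence magnum describes, and Tony follows (signal mpirun with USR1 so the nodes save, wait, then kill mpirun), can be sketched as a small shell function. This is only a sketch, not a tool shipped with John the Ripper: the function name `stop_john_mpi` and its wait-time argument are hypothetical, it assumes procps `pkill` and psmisc `killall` on the host running mpirun, and it assumes `Save = 60` is already set in the `[Options]` section of john.conf on every node.

```shell
#!/bin/sh
# Hypothetical helper encoding the stop sequence from this thread:
#   1. send SIGUSR1 to mpirun, which is expected to pass it on so that
#      every john node writes its session (.rec) file,
#   2. wait for the saves to finish (magnum suggests at least 30 seconds),
#   3. terminate mpirun, and with it the john processes.
stop_john_mpi() {
    wait_secs=${1:-30}   # seconds to wait between the save and the kill

    # pkill exits non-zero when no process matched (or pkill is missing),
    # so stop here instead of sleeping and killing nothing.
    if ! pkill -USR1 mpirun; then
        echo "stop_john_mpi: no mpirun process found" >&2
        return 1
    fi

    sleep "$wait_secs"
    killall mpirun
}
```

Typical use on the master node would be `stop_john_mpi 30`; a longer wait is a cheap safety margin on slow formats.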
The remote nodes resume, but with only one core each. When I start a new session I get this:

Device 2: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Device 1: Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz
Device 2: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Local worksize (LWS) 1, Global worksize (GWS) 8192
Local worksize (LWS) 1, Global worksize (GWS) 16384
Local worksize (LWS) 1, Global worksize (GWS) 1024
Loaded 4 password hashes with 4 different salts (wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x])
Node numbers 1-3 of 3 (MPI)
Note: minimum length forced to 8
Send SIGHUP to john process for status

However, when I try to resume a session I get this:

0 Session completed
0 Session completed
Device 1: Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz
Local worksize (LWS) 1, Global worksize (GWS) 16384
Loaded 4 password hashes with 4 different salts (wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL 4x])
Node numbers 1-3 of 3 (MPI)
Note: minimum length forced to 8
Send SIGHUP to john process for status

Notice that only Device 1 of the master node is listed above. All six cores on the master start; however, only core 2 on each of the remote computers starts.

If I do a "pkill -USR1 mpirun" after a session resume, I get:

--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 7490 on node ub1 exited on signal 10 (User defined signal 1).
--------------------------------------------------------------------------

and the session aborts and takes me back to the prompt. The message above does not always name the same node; it varies between all nodes, including the master.

> Can you see any clues in the log files?

The john log files look good, no errors. Are there any other logs I should check? Also, the john.rec files on each computer are updated each time I do a "pkill -USR1 mpirun", and they look good.

> If it's a very slow (unresponsive) format, try running with lower GWS
> (using e.g. "mpirun -x GWS=2048" or whatever number is a lot lower than
> what is otherwise used) when testing.

I lowered GWS to 512 and saw no difference. Any more ideas?

> magnum

Thanks again, magnum!

On Friday, March 7, 2014 1:26 AM, Anthony Tanoury <tanoury@...oo.com> wrote:

Greetings JTR wizards-

I humbly bow before your knowledge again, in quest of JtR enlightenment...

I use John the Ripper, version 1.8.0.2-bleeding-jumbo_mpi [linux-x86-64-opencl]

I can run an mpirun opencl session just fine and all sessions complete just fine. My only trouble is with session restore, and only if it involves remote hosts. I can resume a session if there is no remote host. However, if I terminate a session with one or more remote hosts using "killall mpirun", "kill HUP" or Ctrl-c and try to restore the session, only one core or one GPU will resume.

I use the following syntax to restore:

mpirun --host ub0 --host ub1 --host ub2 ./john --restore

I also have version 1.8.0.2-bleeding-jumbo_mpi [linux-x86-64-native], and I can restore multi-host sessions just fine using:

mpirun -n 32 --hostfile /etc/nodes ./john --restore

Any idea why I'm having so much trouble restoring MPI OpenCL multi-host sessions?

Thanks,
Tony
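Tony checks that the john.rec files on each computer are updated after every "pkill -USR1 mpirun". A quick way to confirm this on a node is to compare each session file's modification time with the current time. The sketch below is hypothetical (the name `check_rec_fresh` and its arguments are illustrative, not part of John the Ripper); it assumes GNU coreutils `date` and `stat`, and that you pass the directory where john writes its session files on that node. It matches every `*.rec` file there rather than assuming particular per-node file names.

```shell
#!/bin/sh
# check_rec_fresh DIR MAX_AGE
# Reports every *.rec file in DIR and whether it was modified within the
# last MAX_AGE seconds.
# Exit status: 0 = all fresh, 1 = at least one stale, 2 = no .rec files.
check_rec_fresh() {
    dir=$1
    max_age=$2
    now=$(date +%s)
    found=0
    stale=0
    for f in "$dir"/*.rec; do
        [ -e "$f" ] || continue          # the glob matched nothing
        found=1
        mtime=$(stat -c %Y "$f")         # GNU stat; BSD/macOS: stat -f %m
        age=$((now - mtime))
        if [ "$age" -le "$max_age" ]; then
            echo "fresh: $f (${age}s old)"
        else
            echo "STALE: $f (${age}s old)"
            stale=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no .rec files in $dir" >&2
        return 2
    fi
    return "$stale"
}
```

For example, running `check_rec_fresh . 60` in the john run directory on each host shortly after the USR1 signal should report every session file as fresh if the save reached that node.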