john-users - Re: Help - mpi ocl restore session

Message-ID: <1394251181.33294.YahooMailNeo@web140304.mail.bf1.yahoo.com>
Date: Fri, 7 Mar 2014 19:59:41 -0800 (PST)
From: Anthony Tanoury <tanoury@...oo.com>
To: "john-users@...ts.openwall.com" <john-users@...ts.openwall.com>
Subject: Re: Help - mpi ocl restore session

Thanks for the response magnum! 

On 14-03-07 02:31 AM, magnum wrote: 

On 2014-03-07 07:26, Anthony Tanoury wrote: 
>
>I use John the Ripper, version 1.8.0.2-bleeding-jumbo_mpi 
>>[linux-x86-64-opencl] 
>>
>Is this a very recent snapshot or an older one? Some timer oddities have 
>changed (for the better, hopefully) very recently. Like yesterday... 
>
My snapshot is about two weeks old. 


I can run an mpirun opencl session just fine and all sessions 
>>complete just fine. My only trouble is with session restore and only 
>>if it involves remote hosts. I can resume a session if there is not a 
>>remote host. However, if I terminate a session with one or more 
>>remote hosts using "killall mpirun", "kill HUP" or Ctrl-c and try to 
>>restore the session, only one core or one GPU will resume. 
>>
>When you say "one" I assume you mean only the root node resumes? 
>
All 6 cores resume on the master node, but only one core on each of the 
remote computers. 


Does this happen even if the session had been running for 30 minutes or 
>more? Did you set "Save = 60" in john.conf per the instructions? Before 
>killing an MPI session, you should "kill -USR1" the mpirun that controls 
>it. This should trigger a session save. Then wait at least 30 seconds 
>before aborting them. 
>
Yes, it happens even if the session has been running for more than 30 
minutes. 

I had "Save=600", but changed it to 60 in john.conf on all computers. I 
did not notice any difference. 

To abort, I use "pkill -USR1 mpirun" to trigger a session save, wait 30 
seconds, then I do "killall mpirun". Is this the correct way to end a 
session?? 
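
For reference, the save-then-stop sequence I describe above could be wrapped in a small script like the sketch below. It only strings together the same three commands (pkill -USR1, a pause, killall). The names WAIT_SECS, DRY_RUN, run and stop_session are my own, made up for illustration; nothing here comes from john itself. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: ask mpirun to trigger a session save, wait, then stop the session.
# WAIT_SECS and DRY_RUN are hypothetical knobs introduced for this example.
WAIT_SECS="${WAIT_SECS:-30}"   # pause between the save signal and the kill
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_session() {
    run pkill -USR1 mpirun     # trigger a session save
    run sleep "$WAIT_SECS"     # give the nodes time to write their state
    run killall mpirun         # then end the session
}
```

After sourcing the file, calling stop_session with DRY_RUN=0 would actually send the signals; with the default it is harmless.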


What happens with the other nodes? Do they silently just not resume or 
>are there any errors or other clues? 
>
The remote nodes resume but with only one core each. 

When I start a new session I get this: 

Device 2: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz 
Device 1: Intel(R) Core(TM) i7 CPU       X 980  @ 3.33GHz 
Device 2: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz 
Local worksize (LWS) 1, Global worksize (GWS) 8192 
Local worksize (LWS) 1, Global worksize (GWS) 16384 
Local worksize (LWS) 1, Global worksize (GWS) 1024 
Loaded 4 password hashes with 4 different salts (wpapsk-opencl, WPA/WPA2 
PSK [PBKDF2-SHA1 OpenCL 4x]) 
Node numbers 1-3 of 3 (MPI) 
Note: minimum length forced to 8 
Send SIGHUP to john process for status 

However, when I try to resume a session I get this: 

0 Session completed 
0 Session completed 
Device 1: Intel(R) Core(TM) i7 CPU       X 980  @ 3.33GHz 
Local worksize (LWS) 1, Global worksize (GWS) 16384 
Loaded 4 password hashes with 4 different salts (wpapsk-opencl, WPA/WPA2 
PSK [PBKDF2-SHA1 OpenCL 4x]) 
Node numbers 1-3 of 3 (MPI) 
Note: minimum length forced to 8 
Send SIGHUP to john process for status 

Notice that only Device 1 of the master node is listed above. All six 
cores on the master start; however, only core 2 on each of the remote 
computers starts. 

If I do a "pkill -USR1 mpirun" after a session resume I will get: 
-------------------------------------------------------------------------- 
mpirun noticed that process rank 1 with PID 7490 on node ub1 exited on 
signal 10 (User defined signal 1). 
-------------------------------------------------------------------------- 
and the session will abort and take me back to the prompt. 

The message above does not always indicate the same node; it varies 
between all nodes, including the master. 


Can you see any clues in the log files? 
>
The john log files look good, no errors. Are there any other logs I 
should check? 

Also, the john.rec files on each computer are updated each time I do a 
"pkill -USR1 mpirun" and look good. 


If it's a very slow (unresponsive) format, try running with lower GWS 
>(using eg. "mpirun -x GWS=2048" or whatever number is a lot lower than 
>what is otherwise used) when testing. 
>
I lowered GWS down to 512 and no difference. Any more ideas? 


magnum 
>
>
>
Thanks again magnum!







On Friday, March 7, 2014 1:26 AM, Anthony Tanoury <tanoury@...oo.com> wrote:
 
Greetings JTR wizards- 
 
I humbly bow before your knowledge again, in quest of jtr enlightenment.... 

I use John the Ripper, version 1.8.0.2-bleeding-jumbo_mpi [linux-x86-64-opencl] 

I can run an mpirun opencl session just fine and all sessions complete just fine. My only trouble is with session restore and only if it involves remote hosts. I can resume a session if there is not a remote host. However, if I terminate a session with one or more remote hosts using "killall mpirun", "kill HUP" or Ctrl-c and try to restore the session, only one core or one GPU will resume.

I use the following syntax to restore:

mpirun --host ub0 --host ub1 --host ub2 ./john --restore

I also have version 1.8.0.2-bleeding-jumbo_mpi [linux-x86-64-native] and I can restore multi host sessions just fine using:

mpirun -n 32 --hostfile /etc/nodes ./john --restore

Any idea why I'm having so much trouble restoring mpi opencl multi host sessions??

Thanks,
Tony