Date: Fri, 18 Mar 2016 03:23:10 -0600 (MDT)
From: "eidolon@...w.ca" <eidolon@...w.ca>
To: john-users@...ts.openwall.com
Subject: Re: jtr compilation and running issues

I'm using the latest Jumbo from the Openwall site.

I didn't say it was a bug or a problem, but it's a bit frustrating: one of the joys of dealing with shared clusters.

I just did a fresh compile with CUDA, MPI, and OpenCL enabled.

When I run it on the head node:
[jabo@...d run]$ ./john | more 
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_db_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_grpcomm_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
John the Ripper password cracker, version 1.8.0-jumbo-1_mpi+omp [linux-gnu 64-bit AVX-autoconf] 
[.. rest of jtr output .. ] 
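
The four mca warnings above all report the same thing: libslurm.so.28 could not be opened when Open MPI tried to load its PMI components on the head node. A minimal sketch for confirming which libraries the loader cannot resolve for a given file follows. It assumes ldd is available; `missing_libs` is a hypothetical helper name, and `$component` is a placeholder for the path of the component file named in the warning.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin. missing_libs is a hypothetical helper name.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage sketch: $component is assumed to hold the path of the component
# file named in the warning (it is deliberately not filled in here).
# ldd "$component" | missing_libs
```

If libslurm.so.28 shows up in that list on the head node but not on a compute node, the warnings are probably an environment difference between the two rather than anything in the john build.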

When trying to run --list=opencl-devices:

[jabo@...d run]$ ./john --list=opencl-devices 
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_db_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_grpcomm_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
-------------------------------------------------------------------------- 
An MPI process has executed an operation involving a call to the 
"fork()" system call to create a child process. Open MPI is currently 
operating in a condition that could result in memory corruption or 
other system errors; your MPI job may hang, crash, or produce silent 
data corruption. The use of fork() (or system() or other calls that 
create child processes) is strongly discouraged. 

The process that invoked fork was: 

Local host: head (PID 10918) 
MPI_COMM_WORLD rank: 0 

If you are *absolutely sure* that your application will successfully 
and correctly survive a call to fork(), you may disable this warning 
by setting the mpi_warn_on_fork MCA parameter to 0. 
-------------------------------------------------------------------------- 
Error: --device must be numerical, or one of "all", "cpu", "gpu" and 
"accelerator". 

When trying to run --list=cuda-devices: 

[jabo@...d run]$ ./john --list=cuda-devices 
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_db_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_grpcomm_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored) 
-------------------------------------------------------------------------- 
An MPI process has executed an operation involving a call to the 
"fork()" system call to create a child process. Open MPI is currently 
operating in a condition that could result in memory corruption or 
other system errors; your MPI job may hang, crash, or produce silent 
data corruption. The use of fork() (or system() or other calls that 
create child processes) is strongly discouraged. 

The process that invoked fork was: 

Local host: head (PID 10931) 
MPI_COMM_WORLD rank: 0 

If you are *absolutely sure* that your application will successfully 
and correctly survive a call to fork(), you may disable this warning 
by setting the mpi_warn_on_fork MCA parameter to 0. 
-------------------------------------------------------------------------- 
Error: No CUDA-capable devices were detected by the installed CUDA driver. 
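
The fork() warning in the two transcripts above says it can be disabled by setting the mpi_warn_on_fork MCA parameter to 0. Open MPI reads MCA parameters from environment variables with an `OMPI_MCA_` prefix, so one way to do that for a single command is sketched below. `run_without_fork_warning` is a hypothetical wrapper name, not something shipped with john or Open MPI.

```shell
# Run one command with the Open MPI fork() warning switched off.
# run_without_fork_warning is a hypothetical wrapper name; the variable
# is set only in the environment of the command being run.
run_without_fork_warning() {
    env OMPI_MCA_mpi_warn_on_fork=0 "$@"
}

# Usage sketch, using a command from the transcripts above:
# run_without_fork_warning ./john --list=opencl-devices
```

This only hides the warning. It does nothing about the "--device must be numerical" or "No CUDA-capable devices" errors, which are separate messages.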


When running in an interactive session on a compute node:
[jabo@...e6 run]$ ./john 
-------------------------------------------------------------------------- 
It looks like orte_init failed for some reason; your parallel process is 
likely to abort. There are many reasons that a parallel process can 
fail during orte_init; some of which are due to configuration or 
environment problems. This failure appears to be an internal failure; 
here's some additional information (which may only be relevant to an 
Open MPI developer): 

PMI2_Job_GetId failed failed 
--> Returned value (null) (14) instead of ORTE_SUCCESS 
-------------------------------------------------------------------------- 
-------------------------------------------------------------------------- 
It looks like orte_init failed for some reason; your parallel process is 
likely to abort. There are many reasons that a parallel process can 
fail during orte_init; some of which are due to configuration or 
environment problems. This failure appears to be an internal failure; 
here's some additional information (which may only be relevant to an 
Open MPI developer): 

orte_ess_init failed 
--> Returned value (null) (14) instead of ORTE_SUCCESS 
-------------------------------------------------------------------------- 
-------------------------------------------------------------------------- 
It looks like MPI_INIT failed for some reason; your parallel process is 
likely to abort. There are many reasons that a parallel process can 
fail during MPI_INIT; some of which are due to configuration or environment 
problems. This failure appears to be an internal failure; here's some 
additional information (which may only be relevant to an Open MPI 
developer): 

ompi_mpi_init: ompi_rte_init failed 
--> Returned "(null)" (14) instead of "Success" (0) 
-------------------------------------------------------------------------- 
*** An error occurred in MPI_Init 
*** on a NULL communicator 
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
*** and potentially your MPI job) 
[node6.benchmark.neo.microway.com:24529] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! 

Trying to list devices on the compute node gives the same response.

[jabo@...d src]$ ldd ../run/john 
linux-vdso.so.1 => (0x00007fffd97e1000) 
libssl.so.10 => /usr/lib64/libssl.so.10 (0x0000003c80e00000) 
libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x0000003c80a00000) 
libgmp.so.3 => /usr/lib64/libgmp.so.3 (0x0000003a25c00000) 
libcudart.so.7.0 => /usr/local/cuda-7.0/lib64/libcudart.so.7.0 (0x00007f0cd9542000) 
libOpenCL.so.1 => /usr/lib64/nvidia/libOpenCL.so.1 (0x00007f0cd933b000) 
libm.so.6 => /lib64/libm.so.6 (0x0000003a23c00000) 
libz.so.1 => /lib64/libz.so.1 (0x0000003a24800000) 
libdl.so.2 => /lib64/libdl.so.2 (0x0000003a24400000) 
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x0000003a2a400000) 
libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003a34c00000) 
libmpi.so.1 => /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/libmpi.so.1 (0x00007f0cd905d000) 
libgomp.so.1 => /mcms/x86_64/core/gcc/4.8.3/lib64/libgomp.so.1 (0x00007f0cd8e4f000) 
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a24000000) 
libc.so.6 => /lib64/libc.so.6 (0x0000003a23800000) 
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x0000003a32000000) 
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x0000003a31000000) 
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x0000003a30800000) 
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x0000003a31c00000) 
librt.so.1 => /lib64/librt.so.1 (0x0000003a24c00000) 
/lib64/ld-linux-x86-64.so.2 (0x0000003a23400000) 
libfreebl3.so => /usr/lib64/libfreebl3.so (0x0000003a2a000000) 
libopen-rte.so.7 => /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/libopen-rte.so.7 (0x00007f0cd8bd1000) 
libopen-pal.so.6 => /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/libopen-pal.so.6 (0x00007f0cd88ea000) 
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003a28000000) 
libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x0000003a29400000) 
libutil.so.1 => /lib64/libutil.so.1 (0x0000003a29c00000) 
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x0000003a31400000) 
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x0000003a31800000) 
libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003a25800000) 
libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003a25400000) 


^ Looking at that, it seems to be loading libraries from two different places, which might be the issue (libcudart.so from the cuda-7.0 directory and libOpenCL.so from /usr/lib64/nvidia)?
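
To make that split easier to see, here is a rough sketch that counts how many libraries resolve from each directory in ldd output. It assumes awk and sort are available; `ldd_dirs` is a hypothetical helper name. It only summarizes where libraries come from and does not say whether the mix is actually a problem.

```shell
# Summarize which directories a binary's shared libraries resolve from.
# Reads ldd output on stdin and prints "count directory" lines,
# most-used directory first. ldd_dirs is a hypothetical helper name.
ldd_dirs() {
    awk '$2 == "=>" && substr($3, 1, 1) == "/" {
             path = $3
             sub("/[^/]*$", "", path)   # drop the file name, keep the directory
             count[path]++
         }
         END { for (d in count) print count[d], d }' | sort -k1,1nr -k2
}

# Usage sketch, with the same path as the ldd run above:
# ldd ../run/john | ldd_dirs
```

Lines without a resolved path (linux-vdso, the ld-linux loader line) are skipped, so the counts cover only libraries that ldd mapped to a file.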

After recompiling without Open MPI:

[jabo@...d run]$ ./john hash 
Warning: detected hash type "rar", but the string is also recognized as "rar-opencl" 
Use the "--format=rar-opencl" option to force loading these as that type instead 
Loaded 1 password hash (rar, RAR3 [SHA1 AES 32/64]) 
Will run 8 OpenMP threads 
^CSession aborted 
[jabo@...d run]$ ./john --list=opencl-devices 
Error: --device must be numerical, or one of "all", "cpu", "gpu" and 
"accelerator". 
[jabo@...d run]$ ./john --list=cuda-devices 
Error: No CUDA-capable devices were detected by the installed CUDA driver. 

And on a compute node:
[jabo@...e6 run]$ ./john hash 
Warning: detected hash type "rar", but the string is also recognized as "rar-opencl" 
Use the "--format=rar-opencl" option to force loading these as that type instead 
Loaded 1 password hash (rar, RAR3 [SHA1 AES 32/64]) 
Will run 24 OpenMP threads 
Press 'q' or Ctrl-C to abort, almost any other key for status 
Session aborted 
[jabo@...e6 run]$ ./john --list=opencl-devices 
Error: --device must be numerical, or one of "all", "cpu", "gpu" and 
"accelerator". 
[jabo@...e6 run]$ ./john --list=cuda-devices 
Error: No CUDA-capable devices were detected by the installed CUDA driver. 
[jabo@...e6 run]$ ldd ./john 
linux-vdso.so.1 => (0x00007ffd6df45000) 
libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007f7224b7b000) 
libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007f7224797000) 
libgmp.so.3 => /usr/lib64/libgmp.so.3 (0x00007f722453c000) 
libcudart.so.7.0 => /usr/local/cuda-7.0/lib64/libcudart.so.7.0 (0x00007f72242df000) 
libOpenCL.so.1 => /usr/lib64/nvidia/libOpenCL.so.1 (0x00007f72240d8000) 
libm.so.6 => /lib64/libm.so.6 (0x00007f7223e54000) 
libz.so.1 => /lib64/libz.so.1 (0x00007f7223c3e000) 
libdl.so.2 => /lib64/libdl.so.2 (0x00007f7223a39000) 
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f7223802000) 
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f72235f1000) 
libgomp.so.1 => /mcms/x86_64/core/gcc/4.8.3/lib64/libgomp.so.1 (0x00007f72233e2000) 
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f72231c5000) 
libc.so.6 => /lib64/libc.so.6 (0x00007f7222e31000) 
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f7222bec000) 
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f7222905000) 
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f7222701000) 
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f72224d4000) 
librt.so.1 => /lib64/librt.so.1 (0x00007f72222cc000) 
/lib64/ld-linux-x86-64.so.2 (0x00007f7224df9000) 
libfreebl3.so => /usr/lib64/libfreebl3.so (0x00007f72220c8000) 
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f7221ebd000) 
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f7221cba000) 
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f7221a9f000) 
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f7221880000) 


Thanks for helping, cheers. 

----- Original Message -----

From: "magnum" <john.magnum@...hmail.com> 
To: john-users@...ts.openwall.com 
Sent: Friday, March 18, 2016 1:58:01 AM 
Subject: Re: [john-users] jtr compilation and running issues 

On 2016-03-18 00:32, eidolon@...w.ca wrote: 
> I have been given access to a test cluster, managed by slurm, mpi capable with nodes containing dual 12-core xeons and a variety of nvidia (eg k80, k40, m40) cards for testing. 
> 
> The system manages modules with lmod, so I load openmpi/1.8.4 (latest avail), gcc/4.8.3 (which openmpi/1.8.4 is compiled with) and cuda/7.0. I've also tried 7.5 and am trying 6.0 now. 
> 
> I run configure, and it still doesn't find anything. So I pack the paths into my PATH and LDFLAGS, CPPFLAGS, CFLAGS, pointing to the appropriate headers and libs. 
> 
> Then './configure --enable-cuda --enable-mpi --enable-opencl' runs fine and a make -j24 pops me a john exec. 

A need for specifying CFLAGS et al isn't necessarily a bug or problem by 
itself. We try to autodetect some common situations, that's all. 

> I've had various errors in different states. Enabling mpi seems to fail, so I tried disabling that. 

I take it you get a build but it fails at runtime? In what way does it fail? 

> Compiling with just --enable-opencl and --enable-cuda gives me a john that doesn't error out but doesn't see any opencl devices (not even the CPUs!) 

Does the output of `ldd ../run/john` show a valid selection of opencl 
and cuda libs from expected locations? 

Does `../run/john --list=opencl-devices` or `../run/john 
--list=cuda-devices` give any information? 

> I've tried compiling on the head node and one of the compute nodes with no better results. 
> 
> Am I missing something, or is there something misconfigured on this cluster? 
> 
> It's running Scientific Linux 6.6 which is based off RHEL. 

I see nothing wrong in what you do. Are you using latest Jumbo from 
GitHub? If not, you should definitely try that next. 

magnum 


