Message-ID: <646800207.7749612.1458292990643.JavaMail.root@shaw.ca>
Date: Fri, 18 Mar 2016 03:23:10 -0600 (MDT)
From: "eidolon@...w.ca" <eidolon@...w.ca>
To: john-users@...ts.openwall.com
Subject: Re: jtr compilation and running issues

I'm using the latest jumbo from the openwall site. I didn't say it was a
bug or problem, but it's a bit frustrating - one of the joys of dealing
with shared clusters.

I just did a fresh compile with CUDA, MPI and OpenCL.

When I run on the head node:

[jabo@...d run]$ ./john | more
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_db_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_grpcomm_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10649] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
John the Ripper password cracker, version 1.8.0-jumbo-1_mpi+omp [linux-gnu 64-bit AVX-autoconf]
[.. rest of jtr output ..]

When trying to run --list=opencl-devices:

[jabo@...d run]$ ./john --list=opencl-devices
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_db_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_grpcomm_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10918] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          head (PID 10918)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Error: --device must be numerical, or one of "all", "cpu", "gpu" and "accelerator".
When trying to run --list=cuda-devices:

[jabo@...d run]$ ./john --list=cuda-devices
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_db_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_grpcomm_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[head.benchmark.neo.microway.com:10931] mca: base: component_find: unable to open /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          head (PID 10931)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Error: No CUDA-capable devices were detected by the installed CUDA driver.
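As an aside, both Open MPI notices above can be silenced via MCA parameters in their standard OMPI_MCA_<param> environment-variable form. A sketch (the exact spelling of the load-error parameter varies across Open MPI releases, so treat the second name as an assumption to verify with `ompi_info --param`; note this only hides the messages and does not fix the missing libslurm.so.28):

```shell
# The fork() warning itself names the knob: set mpi_warn_on_fork to 0.
export OMPI_MCA_mpi_warn_on_fork=0

# The "component_find: unable to open ... libslurm.so.28" lines are MCA
# plugins failing to load; newer releases spell this parameter
# mca_base_component_show_load_errors, older ones
# mca_component_show_load_errors.
export OMPI_MCA_mca_base_component_show_load_errors=0
```

Whether suppressing the libslurm load errors is wise on this cluster is a separate question; the broken PMI components may themselves be part of the problem.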
When running in an interactive session on a compute node:

[jabo@...e6 run]$ ./john
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PMI2_Job_GetId failed failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (14) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[node6.benchmark.neo.microway.com:24529] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Trying to list devices on the compute node gives the same response.

[jabo@...d src]$ ldd ../run/john
        linux-vdso.so.1 =>  (0x00007fffd97e1000)
        libssl.so.10 => /usr/lib64/libssl.so.10 (0x0000003c80e00000)
        libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x0000003c80a00000)
        libgmp.so.3 => /usr/lib64/libgmp.so.3 (0x0000003a25c00000)
        libcudart.so.7.0 => /usr/local/cuda-7.0/lib64/libcudart.so.7.0 (0x00007f0cd9542000)
        libOpenCL.so.1 => /usr/lib64/nvidia/libOpenCL.so.1 (0x00007f0cd933b000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003a23c00000)
        libz.so.1 => /lib64/libz.so.1 (0x0000003a24800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a24400000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x0000003a2a400000)
        libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003a34c00000)
        libmpi.so.1 => /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/libmpi.so.1 (0x00007f0cd905d000)
        libgomp.so.1 => /mcms/x86_64/core/gcc/4.8.3/lib64/libgomp.so.1 (0x00007f0cd8e4f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a24000000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a23800000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x0000003a32000000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x0000003a31000000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x0000003a30800000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x0000003a31c00000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003a24c00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a23400000)
        libfreebl3.so => /usr/lib64/libfreebl3.so (0x0000003a2a000000)
        libopen-rte.so.7 => /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/libopen-rte.so.7 (0x00007f0cd8bd1000)
        libopen-pal.so.6 => /mcms/x86_64/libs/openmpi/1.8.4/gcc/4.8.3/non-accelerated/lib/libopen-pal.so.6 (0x00007f0cd88ea000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003a28000000)
        libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x0000003a29400000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003a29c00000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x0000003a31400000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x0000003a31800000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003a25800000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003a25400000)

Looking at that, it seems like it's loading libraries from two different
places, which might be an/the issue? (libcudart.so from the cuda-7.0
directory and libOpenCL.so from /usr/lib64/nvidia?)

Recompiling without OpenMPI gives:

[jabo@...d run]$ ./john hash
Warning: detected hash type "rar", but the string is also recognized as "rar-opencl"
Use the "--format=rar-opencl" option to force loading these as that type instead
Loaded 1 password hash (rar, RAR3 [SHA1 AES 32/64])
Will run 8 OpenMP threads
^CSession aborted
[jabo@...d run]$ ./john --list=opencl-devices
Error: --device must be numerical, or one of "all", "cpu", "gpu" and "accelerator".
[jabo@...d run]$ ./john --list=cuda-devices
Error: No CUDA-capable devices were detected by the installed CUDA driver.
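To eyeball whether a john binary is mixing libraries from several install trees (as the cuda-7.0 vs. /usr/lib64/nvidia split noted above suggests), one can group the ldd output by directory. A sketch, assuming ldd's usual `name => path (addr)` layout:

```shell
# Count how many resolved libraries come from each directory.
# $3 is the resolved path; strip the final component to get its dir.
ldd ../run/john \
  | awk '/=> \//{ n = split($3, p, "/"); d = ""; for (i = 2; i < n; i++) d = d "/" p[i]; print d }' \
  | sort | uniq -c | sort -rn
```

Seeing CUDA and OpenCL libraries under unrelated prefixes is not an error in itself, but on a cluster with multiple driver installs it is a quick way to spot a runtime/driver mismatch.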
And on a compute node:

[jabo@...e6 run]$ ./john hash
Warning: detected hash type "rar", but the string is also recognized as "rar-opencl"
Use the "--format=rar-opencl" option to force loading these as that type instead
Loaded 1 password hash (rar, RAR3 [SHA1 AES 32/64])
Will run 24 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
Session aborted
[jabo@...e6 run]$ ./john --list=opencl-devices
Error: --device must be numerical, or one of "all", "cpu", "gpu" and "accelerator".
[jabo@...e6 run]$ ./john --list=cuda-devices
Error: No CUDA-capable devices were detected by the installed CUDA driver.
[jabo@...e6 run]$ ldd ./john
        linux-vdso.so.1 =>  (0x00007ffd6df45000)
        libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007f7224b7b000)
        libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x00007f7224797000)
        libgmp.so.3 => /usr/lib64/libgmp.so.3 (0x00007f722453c000)
        libcudart.so.7.0 => /usr/local/cuda-7.0/lib64/libcudart.so.7.0 (0x00007f72242df000)
        libOpenCL.so.1 => /usr/lib64/nvidia/libOpenCL.so.1 (0x00007f72240d8000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f7223e54000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f7223c3e000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f7223a39000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f7223802000)
        libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f72235f1000)
        libgomp.so.1 => /mcms/x86_64/core/gcc/4.8.3/lib64/libgomp.so.1 (0x00007f72233e2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f72231c5000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f7222e31000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f7222bec000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f7222905000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f7222701000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f72224d4000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f72222cc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7224df9000)
        libfreebl3.so => /usr/lib64/libfreebl3.so (0x00007f72220c8000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f7221ebd000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f7221cba000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f7221a9f000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f7221880000)

Thanks for helping, cheers.

----- Original Message -----
From: "magnum" <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Sent: Friday, March 18, 2016 1:58:01 AM
Subject: Re: [john-users] jtr compilation and running issues

On 2016-03-18 00:32, eidolon@...w.ca wrote:
> I have been given access to a test cluster, managed by slurm, MPI
> capable, with nodes containing dual 12-core Xeons and a variety of
> NVIDIA cards (e.g. K80, K40, M40) for testing.
>
> The system manages modules with lmod, so I load openmpi/1.8.4 (latest
> available), gcc/4.8.3 (which openmpi/1.8.4 is compiled with) and
> cuda/7.0. I've also tried 7.5 and am trying 6.0 now.
>
> I run configure, and it still doesn't find anything. So I pack the
> paths into my PATH and LDFLAGS, CPPFLAGS, CFLAGS, pointing to the
> appropriate headers and libs.
>
> Then './configure --enable-cuda --enable-mpi --enable-opencl' runs
> fine and a make -j24 pops me a john exec.

A need for specifying CFLAGS et al isn't necessarily a bug or problem
by itself. We try to autodetect some common situations, that's all.

> I've had various errors in different states. Enabling mpi seems to
> fail, so I tried disabling that.

I take it you get a build but it fails at runtime? In what way does it
fail?

> Compiling with just --enable-opencl and --enable-cuda gives me a john
> that doesn't error out but doesn't see any opencl devices (not even
> the CPUs!)

Does the output of `ldd ../run/john` show a valid selection of OpenCL
and CUDA libs from expected locations? Does
`../run/john --list=opencl-devices` or `../run/john --list=cuda-devices`
give any information?

> I've tried compiling on the head node and one of the compute nodes
> with no better results.
>
> Am I missing something, or is there something misconfigured on this
> cluster?
>
> It's running Scientific Linux 6.6, which is based off RHEL.

I see nothing wrong in what you do. Are you using the latest Jumbo from
GitHub? If not, you should definitely try that next.

magnum
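For reference, a sketch of fetching and building the development Jumbo tree magnum refers to. The repository URL and branch name are assumptions based on where Jumbo development was hosted around this time and may have moved since; the configure flags are the ones used earlier in this thread:

```shell
# Hypothetical build sketch; verify the repo location before relying on it.
git clone -b bleeding-jumbo https://github.com/magnumripper/JohnTheRipper.git
cd JohnTheRipper/src
./configure --enable-cuda --enable-mpi --enable-opencl   # flags from this thread
make -j24
```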