Message-ID: <20150330072445.GB25033@openwall.com>
Date: Mon, 30 Mar 2015 10:24:45 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

Hi Agnieszka,

On Mon, Mar 30, 2015 at 01:56:03AM +0200, Agnieszka Bielec wrote:
> I have added OpenCL and I have fixed almost all things commented
> by magnumripper https://github.com/Lucife-r/JohnTheRipper

Thanks!

> I would like to know if, program with my adjustment is working
> on another machines.

Even on this same machine, pomelo-opencl fails on the NVIDIA GPU, this
is -dev=5.  Please test.

More importantly, though, please let us know which JtR source files you
used as templates for your format, both host and OpenCL.  At first
glance, it looks like you used a fast hash's one, with hacks that are
unlikely to be of much relevance to slow hashes such as PHC's (when
invoked with reasonable settings).

Also, where did you obtain the test vectors for POMELO from?  It looks
like they're for fairly low cost settings, perhaps lower than what
POMELO would normally be used with.

Here are the speeds I am getting:

[solar@...er src]$ OMP_NUM_THREADS=1 ../run/john -te -form=pomelo
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: pomelo, Generic pomelo [Pomelo]... DONE
Many salts:     10944 c/s real, 10944 c/s virtual
Only one salt:  10944 c/s real, 10944 c/s virtual

[solar@...er src]$ OMP_NUM_THREADS=16 ../run/john -te -form=pomelo
Will run 16 OpenMP threads
Benchmarking: pomelo, Generic pomelo [Pomelo]... (16xOMP) DONE
Many salts:     157184 c/s real, 9836 c/s virtual
Only one salt:  156672 c/s real, 9816 c/s virtual

[solar@...er src]$ export GOMP_CPU_AFFINITY=0-31
[solar@...er src]$ ../run/john -te -form=pomelo
Will run 32 OpenMP threads
Benchmarking: pomelo, Generic pomelo [Pomelo]... (32xOMP) DONE
Many salts:     167794 c/s real, 5305 c/s virtual
Only one salt:  168960 c/s real, 5284 c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 1024
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    9309 c/s real, 1024K c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=5
Device 5: GeForce GTX TITAN
Options used: -I ../run/kernels -cl-mad-enable -cl-nv-verbose -DDEVICE_INFO=4114 -D_OPENCL_COMPILER -DDEV_VER_MAJOR=319 -DDEV_VER_MINOR=60
Build log: :162:15: error: 'long long' type is not supported
        state_size = 1ULL << (13 + m_cost); //m_cost=3 is max
                     ^
[...]

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=2
Device 2: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Local worksize (LWS) 1, global worksize (GWS) 1024
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    163840 c/s real, 5414 c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=3
Device 3: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Build log: Compilation started
Compilation done
Linking started
Linking done
Kernel <pomelo_crypt_kernel> was not vectorized
Done.
Local worksize (LWS) 8, global worksize (GWS) 512
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    11043 c/s real, 11043 c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=4
Device 4: Intel(R) Many Integrated Core Acceleration Card
Build log: Compilation started
Compilation done
Linking started
Linking done
Build started
Kernel <pomelo_crypt_kernel> was successfully vectorized
Done.
Local worksize (LWS) 64, global worksize (GWS) 64
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    7.3 c/s real, 6400 c/s virtual

So the speed of C code is maybe good - I say maybe because we don't know
yet how much better it can be made.  One of two OpenCL SDKs running on
the CPUs achieves about the same speed, which is a good sanity check.
The other fails to vectorize the code, resulting in much lower speed.
The speed on Xeon Phi via OpenCL is a joke, but that's not too
surprising given that OpenCL isn't currently a good way to program Xeon
Phi (Intel's OpenCL implementation for Xeon Phi is too poor).  On AMD
GPU, the performance is low - this needs to be looked into.  On NVIDIA,
the kernel fails to compile.

> Curently the max m_cost in pomelo is 4 and I would like to get rid
> of this limit

Where does this limit come from?

> Is it possible to change 'count' variable which is passed to crypt_all,
> after the execution of opencl_ini_auto_setup() ?

magnum is right - first please describe the problem and how you're
trying to solve it by this.

> I have also a few another questions. I've found in pomelo code
> mentioned below.
>
> //check the size of password, salt and output. Password is at most
> //256 bytes; the salt is at most 32 bytes.
> if (inlen > 256 || saltlen > 64 || outlen > 256 || inlen < 0 ||
> saltlen < 0 || outlen < 0)
>
>
> I'm not sure how I should set SALT_SIZE ? I ought to set it with
> 32 or 64 ?

magnum already answered what this means for JtR - just set SALT_SIZE to
the maximum supported salt length.

However, there might be a bug in POMELO's reference code here.  Did this
inconsistent comment and code come from there ("at most 32 bytes" vs.
"saltlen > 64")?  If so, we should report it on the PHC discussions
list.  Can you join that list and post in there, please?

> magnumripper commented on src/pomelo_fmt_plug.c in c36a2ed
> >Is there any specific reason (eg. performance) to limit max length?
> >I would like to suggest you bump it to 125 which is the max of core john
>
> did you mean "#define PLAINTEXT_LENGTH 100" ?

I guess this is what magnum meant, yes.

Thanks,

Alexander
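
On the NVIDIA build failure quoted above: OpenCL C reserves the type 'long long', while its 'long' and 'ulong' types are 64-bit, so the literal 1ULL is what the stricter compiler rejects. A minimal host-side sketch in plain C of the same shift with an explicit 64-bit type follows. The names sketch_state_size and SKETCH_MAX_M_COST are hypothetical and not from the JtR tree, and the bound of 3 is only copied from the "m_cost=3 is max" comment in the kernel line, not from the POMELO specification.

```c
#include <stdint.h>

/* Hypothetical upper bound, mirroring the "m_cost=3 is max" comment in the
 * quoted kernel line; it is an assumption, not a POMELO-defined limit. */
#define SKETCH_MAX_M_COST 3

/* Hypothetical helper: returns 2^(13 + m_cost), the value of the quoted
 * expression, using an explicit 64-bit unsigned type instead of 1ULL.
 * Returns 0 when m_cost exceeds the assumed bound. */
uint64_t sketch_state_size(unsigned int m_cost)
{
    if (m_cost > SKETCH_MAX_M_COST)
        return 0;
    return (uint64_t)1 << (13 + m_cost);
}
```

In the kernel itself the analogous expression would presumably be (ulong)1 << (13 + m_cost), or the literal 1UL. That is untested on the GTX TITAN here.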