Message-ID: <CABeUhwvfqVE7dsarVXs4dFWfDS2Ka2n3-j1BZQUcakv+qun4MQ@mail.gmail.com>
Date: Fri, 29 Jun 2012 22:07:53 +0200
From: newangels newangels <contact.newangels@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: John the Ripper 1.7.9-jumbo-6

Hi again,

After my GPU build attempt, I tried a classic mac-x86-64 build; unfortunately
that failed as well, on OS X Lion (10.7.4), MacBook Pro 17".

I run:

make macosx-x86-64

and I get:

ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
make[1]: *** [../run/john] Error 1
make: *** [macosx-x86-64] Error 2

Any help is welcome. Thanks,

Regards,

Donovan

2012/6/29, newangels newangels <contact.newangels@...il.com>:
> Hi,
>
> Very nice news and so many improvements! Thanks a lot to all of you
> for the effort and time.
>
> I just tried to compile a GPU-enabled build on Mac OS X Lion, but
> unfortunately got an error.
>
> I run:
>
> make macosx-x86-64-opencl
>
> and I get:
>
> make[1]: *** [common_opencl_pbkdf2.o] Error 1
> make: *** [macosx-x86-64-opencl] Error 2
>
> System information:
>
> MacBook Pro 17" / ATI 6750M - 1 GB / SSD - OS X Lion
>
> Can some of you help me with this issue?
>
> Thanks a lot in advance,
>
> Regards,
>
> Donovan
>
> 2012/6/29, Solar Designer <solar@...nwall.com>:
>> Hi,
>>
>> We've released John the Ripper 1.7.9-jumbo-6 earlier today. This is a
>> "community-enhanced" version, which includes many contributions from JtR
>> community members - in fact, that's what it primarily consists of. It's
>> been half a year since 1.7.9-jumbo-5, which is a lot of time, and a lot
>> has been added to jumbo since then. Even though it's just a one-digit
>> change in the version number, this is in fact the biggest single jumbo
>> update we've made so far. It appears that between -5 and -6 the source
>> code grew by over 1 MB, or by over 40,000 lines of code (and that's not
>> including lines that were changed as opposed to added).
>> The biggest new thing is integrated GPU support, both CUDA and OpenCL -
>> although for a subset of the hash and non-hash types only, not for all
>> that are supported on CPU. (Also, it is efficient only for so-called
>> "slow" hashes now, and for the "non-hashes" that we chose to support on
>> GPU. For "fast" hashes, it is just a development milestone, albeit a
>> desirable one as well.) The other big new thing is the addition of
>> support for many more "non-hashes" and hashes (see below).
>>
>> You may download John the Ripper 1.7.9-jumbo-6 at the usual place:
>>
>> http://www.openwall.com/john/
>>
>> With so many changes, even pushing this release out was difficult.
>> Despite the statement that "jumbo is buggy by definition", we did try
>> to eliminate as many bugs as we reasonably could - but after a week of
>> mad testing and bug-fixing, I chose to release the tree as-is, only
>> documenting the remaining known bugs (below and in doc/BUGS). Still, we
>> ended up posting over 1200 messages to john-dev in June - even though in
>> prior months we did not even hit 500. Indeed, we did run plenty of
>> tests and fix plenty of bugs, which you won't see in this release.
>>
>> I've included a lengthy description of some of the changes below, and
>> below that I'll add some benchmark results that I find curious (such as
>> for bcrypt on CPU vs. GPU).
>>
>> Direct code contributors to 1.7.9-jumbo-6 (since 1.7.9-jumbo-5), by
>> commit count:
>>
>> magnum
>> Dhiru Kholia
>> Frank Dittrich
>> JimF (Jim Fougeron)
>> myrice (Dongdong Li)
>> Claudio Andre
>> Lukas Odzioba
>> Solar Designer
>> Sayantan Datta
>> Samuele Giovanni Tonon
>> Tavis Ormandy
>> bartavelle (Simon Marechal)
>> Sergey V
>> bizonix
>> Robert Veznaver
>> Andras
>>
>> New non-hashes:
>> * Mac OS X keychains [OpenMP] (Dhiru)
>>   - based on research from extractkeychain.py by Matt Johnston
>> * KeePass 1.x files [OpenMP] (Dhiru)
>>   - keepass2john is based on ideas from kppy by Karsten-Kai Koenig
>>     http://gitorious.org/kppy/kppy
>> * Password Safe [OpenMP, CUDA, OpenCL] (Dhiru, Lukas)
>> * ODF files [OpenMP] (Dhiru)
>> * Office 2007/2010 documents [OpenMP] (Dhiru)
>>   - office2john is based on test-dump-msole.c by Jody Goldberg and
>>     OoXmlCrypto.cs by Lyquidity Solutions Limited
>> * Mozilla Firefox, Thunderbird, SeaMonkey master passwords [OpenMP] (Dhiru)
>>   - based on FireMaster and FireMasterLinux
>>     http://code.google.com/p/rainbowsandpwnies/wiki/FiremasterLinux
>> * RAR -p mode encrypted archives (magnum)
>>   - RAR -hp mode was supported previously, now both modes are
>>
>> New challenge/responses, MACs:
>> * WPA-PSK [OpenMP, CUDA, OpenCL] (Lukas, Solar)
>>   - CPU code is loosely based on Aircrack-ng
>>     http://www.aircrack-ng.org
>>     http://openwall.info/wiki/john/WPA-PSK
>> * VNC challenge/response authentication [OpenMP] (Dhiru)
>>   - based on VNCcrack by Jack Lloyd
>>     http://www.randombit.net/code/vnccrack/
>> * SIP challenge/response authentication [OpenMP] (Dhiru)
>>   - based on SIPcrack by Martin J. Muench
>> * HMAC-SHA-1, HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, HMAC-SHA-512 (magnum)
>>
>> New hashes:
>> * IBM RACF [OpenMP] (Dhiru)
>>   - thanks to Nigel Pentland (author of CRACF) and Main Framed for providing
>>     algorithm details, sample code, sample RACF binary database, test vectors
>> * sha512crypt (SHA-crypt) [OpenMP, CUDA, OpenCL] (magnum, Lukas, Claudio)
>>   - previously supported in 1.7.6+ only via "generic crypt(3)" interface
>> * sha256crypt (SHA-crypt) [OpenMP, CUDA] (magnum, Lukas)
>>   - previously supported in 1.7.6+ only via "generic crypt(3)" interface
>> * DragonFly BSD SHA-256 and SHA-512 based hashes [OpenMP] (magnum)
>> * Django 1.4 [OpenMP] (Dhiru)
>> * Drupal 7 $S$ phpass-like (based on SHA-512) [OpenMP] (magnum)
>> * WoltLab Burning Board 3 [OpenMP] (Dhiru)
>> * New EPiServer default (based on SHA-256) [OpenMP] (Dhiru)
>> * GOST R 34.11-94 [OpenMP] (Dhiru, Sergey V, JimF)
>> * MD4 support in "dynamic" hashes (user-configurable) (JimF)
>>   - previously, only MD5 and SHA-1 were supported in "dynamic"
>> * Raw-SHA1-LinkedIn (raw SHA-1 with first 20 bits zeroed) (JimF)
>>
>> Alternate implementations for previously supported hashes:
>> * Faster raw SHA-1 (raw-sha1-ng, password length up to 15) (Tavis)
>>
>> OpenMP support in new formats:
>> * Mac OS X keychains (Dhiru)
>> * KeePass 1.x files (Dhiru)
>> * Password Safe (Lukas)
>> * ODF files (Dhiru)
>> * Office 2007/2010 documents (Dhiru)
>> * Mozilla Firefox, Thunderbird, SeaMonkey master passwords (Dhiru)
>> * WPA-PSK (Solar)
>> * VNC challenge/response authentication (Dhiru)
>> * SIP challenge/response authentication (Dhiru)
>> * IBM RACF (Dhiru)
>> * DragonFly BSD SHA-256 and SHA-512 based hashes (magnum)
>> * Django 1.4 (Dhiru)
>> * Drupal 7 $S$ phpass-like (based on SHA-512) (magnum)
>> * WoltLab Burning Board 3 (Dhiru)
>> * New EPiServer default (based on SHA-256) (Dhiru)
>> * GOST R 34.11-94 (Dhiru, JimF)
>>
>> OpenMP support for previously supported hashes that lacked it:
>> * Mac OS X 10.4 - 10.6 salted SHA-1 (magnum)
>> * DES-based tripcodes (Solar)
>> * Invision Power Board 2.x salted MD5 (magnum)
>> * HTTP Digest access authentication MD5 (magnum)
>> * MySQL (old) (Solar)
>>
>> CUDA support for:
>> * phpass MD5-based "portable hashes" (Lukas)
>> * md5crypt (FreeBSD-style MD5-based crypt(3) hashes) (Lukas)
>> * sha512crypt (glibc 2.7+ SHA-crypt) (Lukas)
>> * sha256crypt (glibc 2.7+ SHA-crypt) (Lukas)
>> * Password Safe (Lukas)
>> * WPA-PSK (Lukas)
>> * Raw SHA-224, raw SHA-256 [inefficient] (Lukas)
>> * MSCash (DCC) [not working reliably yet] (Lukas)
>> * MSCash2 (DCC2) [not working reliably yet] (Lukas)
>> * Raw SHA-512 [not working reliably yet] (myrice)
>> * Mac OS X 10.7 salted SHA-512 [not working reliably yet] (myrice)
>>   - we have already identified the problem with the above two, and a post
>>     1.7.9-jumbo-6 fix should be available shortly - please ask on john-users
>>     if interested in trying it out
>>
>> OpenCL support for:
>> * phpass MD5-based "portable hashes" (Lukas)
>> * md5crypt (FreeBSD-style MD5-based crypt(3) hashes) (Lukas)
>> * sha512crypt (glibc 2.7+ SHA-crypt) (Claudio)
>>   - suitable for NVIDIA cards, faster than the CUDA implementation above
>>     http://openwall.info/wiki/john/OpenCL-SHA-512
>> * bcrypt (OpenBSD-style Blowfish-based crypt(3) hashes) (Sayantan)
>>   - pre-configured for AMD Radeon HD 7970, will likely fail on others unless
>>     WORK_GROUP_SIZE is adjusted in opencl_bf_std.h and opencl/bf_kernel.cl;
>>     the achieved level of performance is CPU-like (bcrypt is known to be
>>     somewhat GPU-unfriendly - a lot more than SHA-512)
>>     http://openwall.info/wiki/john/GPU/bcrypt
>> * MSCash2 (DCC2) (Sayantan)
>>   - with optional and experimental multi-GPU support as a compile-time hack
>>     (even AMD+NVIDIA mix), by editing init() in opencl_mscash2_fmt.c
>> * Password Safe (Lukas)
>> * WPA-PSK (Lukas)
>> * RAR (magnum)
>> * MySQL 4.1 double-SHA-1 [inefficient] (Samuele)
>> * Netscape LDAP salted SHA-1 (SSHA) [inefficient] (Samuele)
>> * NTLM [inefficient] (Samuele)
>> * Raw MD5 [inefficient] (Dhiru, Samuele)
>> * Raw SHA-1 [inefficient] (Samuele)
>> * Raw SHA-512 [not working properly yet] (myrice)
>> * Mac OS X 10.7 salted SHA-512 [not working properly yet] (myrice)
>>   - we have already identified the problem with the above two, and a post
>>     1.7.9-jumbo-6 fix should be available shortly - please ask on john-users
>>     if interested in trying it out
>>
>> Several of these require byte-addressable store (any NVIDIA card, but
>> only 5000 series or newer if AMD/ATI). Also, OpenCL kernels for "slow"
>> hashes/non-hashes (e.g. RAR) may cause "ASIC hang" on certain AMD/ATI
>> cards with recent driver versions. We'll try to address these issues in
>> a future version.
>>
>> AMD XOP (Bulldozer) support added for:
>> * Many hashes based on MD4, MD5, SHA-1 (Solar)
>>
>> Uses of SIMD (MMX assembly, SSE2/AVX/XOP intrinsics) added for:
>> * Mac OS X 10.4 - 10.6 salted SHA-1 (magnum)
>> * Invision Power Board 2.x salted MD5 (magnum)
>> * HTTP Digest access authentication MD5 (magnum)
>> * SAP CODVN B (BCODE) MD5 (magnum)
>> * SAP CODVN F/G (PASSCODE) SHA-1 (magnum)
>> * Oracle 11 (magnum)
>>
>> Other optimizations:
>> * Reduced memory usage for raw-md4, raw-md5, raw-sha1, and nt2 (magnum)
>> * Prefer CommonCrypto over OpenSSL on Mac OS X 10.7 (Dhiru)
>> * New SSE2 intrinsics code for SHA-1 (JimF, magnum)
>> * Smarter use of SSE2 and SSSE3 intrinsics (the latter only if enabled in the
>>   compiler at build time) to implement some bit rotates for MD5, SHA-1 (Solar)
>> * Assorted optimizations for raw SHA-1 and HMAC-MD5 (magnum)
>> * In RAR format, added inline storing of RAR data in JtR input file when the
>>   original file is small enough (magnum)
>> * Added use of the bitslice DES implementation for tripcodes (Solar)
>> * Raw-MD5-unicode made "thick" again (that is, not building upon "dynamic"),
>>   using much faster code (magnum)
>> * Assorted performance tweaks in "salted-sha1" (SSHA) (magnum)
>> * Added functions for larger hash tables to several formats (magnum, Solar)
>>
>> Other assorted enhancements:
>> * linux-*-gpu (both CUDA and OpenCL at once), linux-*-cuda, linux-*-opencl,
>>   macosx-x86-64-opencl make targets (magnum et al.)
>> * linux-*-native make targets (pass -march=native to gcc) (magnum)
>> * New option: --dupe-suppression (for wordlist mode) (magnum)
>> * New option: --loopback[=FILE] (implies --dupe-suppression) (magnum)
>> * New option: --max-run-time=N for graceful exit after N seconds (magnum)
>> * New option: --log-stderr (magnum)
>> * New option: --regenerate-lost-salts=N for cracking hashes where we do not
>>   have the salt and essentially need to crack it as well (JimF)
>> * New unlisted option: --list (for bash completion, GUI, etc.) (magnum)
>> * --list=[encodings|opencl-devices] (magnum)
>> * --list=cuda-devices (Lukas)
>> * --list=format-details (Frank)
>> * --list=subformats (magnum)
>> * New unlisted option: --length=N for reducing maximum plaintext length of a
>>   format, mostly for testing purposes (magnum)
>> * Enhanced parameter syntax for --markov: may refer to a configuration file
>>   section, may specify the start and/or end in percent of total (Frank)
>> * Make incremental mode restore ETA figures (JimF)
>> * In "dynamic", support NUL octets in constants (JimF)
>> * In "salted-sha1" (SSHA), support any salt length (magnum)
>> * Use comment and home directory fields from PWDUMP-style input (magnum)
>> * Sort the format names list in "john" usage output alphabetically (magnum)
>> * New john.conf options subsection "MPI" (magnum)
>> * New john.conf config item CrackStatus under Options:Jumbo (magnum)
>> * \xNN escape sequence to specify arbitrary characters in rules (JimF)
>> * New rule command _N to reject a word unless it is of length N (JimF)
>> * Extra wordlist rule sections: Extra, Single-Extra, Jumbo (magnum)
>> * Enhanced "Double" external mode sample (JimF)
>> * Source $JOHN/john.local.conf by default (magnum)
>> * Many format and algorithm names have been changed for consistency (Solar)
>> * When intrinsics are in use, the reported algorithm name now tells which
>>   ones (SSE2, AVX, or XOP) (Solar)
>> * benchmark-unify: a Perl script to unify benchmark output of different
>>   versions of JtR for use with relbench (Frank)
>> * Per-benchmark speed ratio output added to relbench (Frank)
>> * bash completion for JtR (to install: "sudo make bash-completion") (Frank)
>> * New program: raw2dyna (helper to convert raw hashes to "dynamic") (JimF)
>> * New program: pass_gen.pl (generates hashes from plaintexts) (JimF, magnum)
>> * Many code changes made, many bugs fixed, many new bugs introduced (all)
>>
>> Now the promised benchmarks. Here's 1.7.9-jumbo-5 to 1.7.9-jumbo-6
>> overall speed change on one core in FX-8120 (should be 4.0 GHz turbo),
>> after running through benchmark-unify and relbench (about 50 of the
>> new version's benchmark results could not be directly compared against
>> results of the previous version, and thus are excluded):
>>
>> Number of benchmarks: 151
>> Minimum: 0.84668 real, 0.84668 virtual
>> Maximum: 10.92416 real, 10.92416 virtual
>> Median: 1.10800 real, 1.10800 virtual
>> Median absolute deviation: 0.12531 real, 0.12369 virtual
>> Geometric mean: 1.26217 real, 1.26284 virtual
>> Geometric standard deviation: 1.47239 real, 1.47274 virtual
>>
>> Ditto for OpenMP-enabled builds (8 threads, should be 3.1 GHz):
>>
>> Number of benchmarks: 151
>> Minimum: 0.94616 real, 0.48341 virtual
>> Maximum: 24.19709 real, 4.29610 virtual
>> Median: 1.17609 real, 1.05964 virtual
>> Median absolute deviation: 0.17436 real, 0.11465 virtual
>> Geometric mean: 1.35493 real, 1.17097 virtual
>> Geometric standard deviation: 1.71505 real, 1.36577 virtual
>>
>> These show that overall we do indeed have a speedup, and that's without
>> any GPU stuff.
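[Editor's note: for readers unfamiliar with relbench's summary lines, the figures above are ordinary statistics over the per-benchmark speed ratios (new c/s divided by old c/s). A minimal sketch of how such figures can be computed - plain Python, not the actual relbench script, and the ratio list is made up:]

```python
import math
import statistics

def geometric_mean(ratios):
    # exp of the arithmetic mean of the logs
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def geometric_stddev(ratios):
    # exp of the (population) standard deviation of the logs
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    return math.exp(math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs)))

def median_abs_deviation(ratios):
    # median of absolute deviations from the median
    med = statistics.median(ratios)
    return statistics.median(abs(r - med) for r in ratios)

# hypothetical per-format speed ratios, purely for illustration
ratios = [0.85, 1.05, 1.11, 1.26, 2.4, 10.9]
print(geometric_mean(ratios), geometric_stddev(ratios),
      median_abs_deviation(ratios))
```

A geometric (rather than arithmetic) mean is the natural choice here because speedups are multiplicative: a 2x gain and a 2x loss should cancel out.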
>>
>> Also curious is the speedup due to OpenMP in 1.7.9-jumbo-6 (same version
>> in both cases), on the same CPU (8 threads):
>>
>> Number of benchmarks: 202
>> Minimum: 0.76235 real, 0.09553 virtual
>> Maximum: 30.51791 real, 3.81904 virtual
>> Median: 1.01479 real, 0.98287 virtual
>> Median absolute deviation: 0.02747 real, 0.03514 virtual
>> Geometric mean: 1.71441 real, 0.77454 virtual
>> Geometric standard deviation: 2.08823 real, 1.50966 virtual
>>
>> The 30x maximum speedup (with only 8 threads) is indeed abnormal; it is
>> for:
>>
>> Ratio: 30.51791 real, 3.81904 virtual SIP MD5:Raw
>>
>> We'll correct the non-OpenMP performance for SIP in the next version.
>> For the rest, the maximum speedup is 6.13x for SSH, which is great
>> (considering that the CPU clock rate reduces with more threads running,
>> and that this is a 4-module CPU rather than a true 8-core). Here are
>> the top 10 OpenMP performers (excluding SIP):
>>
>> Ratio: 6.13093 real, 0.77210 virtual SSH RSA/DSA (one 2048-bit RSA and
>>   one 1024-bit DSA key):Raw
>> Ratio: 6.05882 real, 0.75737 virtual NTLMv2 C/R MD4 HMAC-MD5:Many salts
>> Ratio: 6.04342 real, 0.75548 virtual LMv2 C/R MD4 HMAC-MD5:Many salts
>> Ratio: 5.92830 real, 0.74108 virtual GOST R 34.11-94:Raw
>> Ratio: 5.81605 real, 0.73986 virtual sha256crypt (rounds=5000):Raw
>> Ratio: 5.65289 real, 0.70523 virtual sha512crypt (rounds=5000):Raw
>> Ratio: 5.63333 real, 0.72034 virtual Drupal 7 $S$ SHA-512 (x16385):Raw
>> Ratio: 5.56435 real, 0.69937 virtual OpenBSD Blowfish (x32):Raw
>> Ratio: 5.50484 real, 0.69682 virtual Password Safe SHA-256:Raw
>> Ratio: 5.49613 real, 0.68814 virtual Sybase ASE salted SHA-256:Many salts
>>
>> The worst regression is for:
>>
>> Ratio: 0.76235 real, 0.09553 virtual LM DES:Raw
>>
>> It is known that our current LM hash code does not scale well, and is
>> very fast even with one thread (close to the bottleneck of the current
>> interface). It is in fact better not to use OpenMP for LM hashes yet,
>> or to keep the thread count low (e.g., 4 would behave better than 8).
>> The low median and mean speedups are because many hashes still lack
>> OpenMP support - mostly the "fast" ones, where we'd bump into the
>> bottleneck anyway. We might deal with this later. For "slow" hashes,
>> the speedup with OpenMP is close to perfect (5x to 6x for this CPU).
>>
>> Now to the new stuff. The effect of XOP (make linux-x86-64-xop):
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=md5
>> Benchmarking: FreeBSD MD5 [128/128 XOP intrinsics 8x]... (8xOMP) DONE
>> Raw: 204600 c/s real, 25625 c/s virtual
>>
>> -5 achieved at most:
>>
>> user@...l:~/john-1.7.9-jumbo-5/run$ ./john -te -fo=md5
>> Benchmarking: FreeBSD MD5 [SSE2i 12x]... (8xOMP) DONE
>> Raw: 158208 c/s real, 19751 c/s virtual
>>
>> with "make linux-x86-64i" (icc precompiled SSE2 intrinsics), and only:
>>
>> user@...l:~/john-1.7.9-jumbo-5/run$ ./john -te -fo=md5
>> Benchmarking: FreeBSD MD5 [SSE2i 12x]... (8xOMP) DONE
>> Raw: 141312 c/s real, 17664 c/s virtual
>>
>> with "make linux-x86-64-xop", because it did not yet use XOP for MD5 (nor
>> for MD4 and SHA-1), only knowing how to use it for DES (which it did).
>>
>> So we got an over 20% speedup due to XOP here.
>>
>> Similarly, for raw SHA-1 the best result with -5 was:
>>
>> user@...l:~/john-1.7.9-jumbo-5/run$ ./john -te -fo=raw-sha1
>> Benchmarking: Raw SHA-1 [SSE2i 8x]... DONE
>> Raw: 13067K c/s real, 13067K c/s virtual
>>
>> whereas -6, with JimF's and magnum's optimizations and with XOP, achieves:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=raw-sha1
>> Benchmarking: Raw SHA-1 [128/128 XOP intrinsics 8x]... DONE
>> Raw: 23461K c/s real, 23698K c/s virtual
>>
>> and with Tavis' contribution:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=raw-sha1-ng
>> Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
>> Raw: 28024K c/s real, 28024K c/s virtual
>>
>> So that's an over 2x speedup if we can accept the length 15 limit, or
>> an almost 80% speedup otherwise.
>>
>> Note: all of the raw SHA-1 benchmarks above are for one CPU core, not
>> for the entire chip (no OpenMP for fast hashes like this yet, but
>> there's MPI and there are always separate process invocations...)
>>
>> On to more important stuff: sha512crypt on CPU vs. GPU.
>>
>> For reference, here's what we would get with the previous version, using
>> the glibc implementation of SHA-crypt:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=crypt -sub=sha512crypt
>> Benchmarking: generic crypt(3) SHA-512 rounds=5000 [?/64]... (8xOMP) DONE
>> Many salts: 1518 c/s real, 189 c/s virtual
>> Only one salt: 1515 c/s real, 189 c/s virtual
>>
>> Now we also have a builtin implementation, although it nevertheless uses
>> OpenSSL for the SHA-512 primitive (it doesn't have its own SHA-512 yet -
>> adding that and making use of SIMD would provide much additional
>> speedup; this is a to-do item for us):
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=sha512crypt
>> Benchmarking: sha512crypt (rounds=5000) [64/64]... (8xOMP) DONE
>> Raw: 2045 c/s real, 256 c/s virtual
>>
>> So it is about 35% faster. Let's try GPUs, first a GTX 570 1600 MHz
>> (a card that is vendor-overclocked to that frequency):
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=sha512crypt-cuda
>> Benchmarking: sha512crypt (rounds=5000) [CUDA]... DONE
>> Raw: 3833 c/s real, 3833 c/s virtual
>>
>> Another 2x speedup here, but that's still not it. Let's see:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=sha512crypt-opencl
>> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
>> Using device 0: GeForce GTX 570
>> Building the kernel, this could take a while
>> Local work size (LWS) 512, global work size (GWS) 7680
>> Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
>> Raw: 11405 c/s real, 11349 c/s virtual
>>
>> And now this is it - Claudio's OpenCL code is really good on NVIDIA,
>> giving us a 5.5x speedup over CPU. (SHA-512 is not as GPU-friendly as
>> e.g. MD5, but is friendly enough for some decent speedup.)
>>
>> Let's also try an AMD Radeon HD 7970 (normally a faster card), at stock
>> clocks:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=sha512crypt-opencl -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> Building the kernel, this could take a while
>> Elapsed time: 17 seconds
>> Local work size (LWS) 32, global work size (GWS) 16384
>> Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
>> Raw: 5144 c/s real, 3276K c/s virtual
>>
>> Not as much luck here yet. Finally, for comparison and to show how any
>> one of the three OpenCL devices may be accessed from john's command line
>> with the --platform and --device options, the same OpenCL code on the CPU:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=sha512crypt-opencl -pla=1 -dev=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 1: AMD FX(tm)-8120 Eight-Core Processor
>> Local work size (LWS) 1, global work size (GWS) 1024
>> Benchmarking: sha512crypt (rounds=5000) [OpenCL]... DONE
>> Raw: 1850 c/s real, 233 c/s virtual
>>
>> This shows that the code is indeed pretty efficient - almost reaching
>> OpenSSL's specialized code speed.
>>
>> Now to bcrypt. This CPU is pretty good at it:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=bf
>> Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... (8xOMP) DONE
>> Raw: 5300 c/s real, 664 c/s virtual
>>
>> (FWIW, with overclocking I was able to get this to about 5650 c/s, but
>> not more - bumping into the 125 W TDP. The above is at stock clocks.)
>>
>> This is for "$2a$05", or only 32 iterations, which is used as the
>> baseline for benchmarks for historical reasons. Actual systems often use
>> "$2a$08" (8 times slower) to "$2a$10" (32 times slower) these days.
>>
>> Anyway, the reference cracking speed for bcrypt above is higher than the
>> speed for sha512crypt on the same CPU (with the current code at least,
>> which admittedly can be optimized much further). Can we make it even
>> higher on a GPU? Maybe, but not yet, not with the current code:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=bf-opencl -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> ****Please see 'opencl_bf_std.h' for device specific optimizations****
>> Benchmarking: OpenBSD Blowfish (x32) [OpenCL]... DONE
>> Raw: 4143 c/s real, 238933 c/s virtual
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ DISPLAY=:0 aticonfig --od-enable --od-setclocks=1225,1375
>> AMD Overdrive(TM) enabled
>>
>> Default Adapter - AMD Radeon HD 7900 Series
>> New Core Peak   : 1225
>> New Memory Peak : 1375
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=bf-opencl -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> ****Please see 'opencl_bf_std.h' for device specific optimizations****
>> Benchmarking: OpenBSD Blowfish (x32) [OpenCL]... DONE
>> Raw: 5471 c/s real, 358400 c/s virtual
>>
>> It's only with a 30% overclock that the high-end GPU gets to the same
>> level of performance as the 2-3 times cheaper CPU. BTW, the GPU stays
>> cool with this overclock (73 C with stock cooling when running bf-opencl
>> for a while), precisely because we have to heavily under-utilize it due
>> to it not having enough local memory to accommodate as many parallel
>> bcrypt computations as we'd need for full occupancy and to hide memory
>> access latencies.
>>
>> Maybe more optimal code will achieve better results, though.
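[Editor's note: to spell out the cost arithmetic mentioned above - the two digits in a bcrypt setting like "$2a$05" are the base-2 logarithm of the iteration count, so each +1 to the cost doubles the work. A quick sketch, plain Python rather than JtR code:]

```python
# bcrypt's "$2a$NN" cost is the log2 of the iteration count,
# so each increment of NN doubles the work.
def bcrypt_iterations(cost):
    return 2 ** cost

base = bcrypt_iterations(5)           # "$2a$05" -> 32 iterations
print(bcrypt_iterations(8) // base)   # "$2a$08": 8 times slower
print(bcrypt_iterations(10) // base)  # "$2a$10": 32 times slower
```

This is why the 5300 c/s baseline above translates to only a few hundred c/s against real-world "$2a$08" to "$2a$10" hashes.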
>>
>> The NVIDIA card also has no luck competing with the CPU at bcrypt yet:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=bf-opencl
>> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
>> Using device 0: GeForce GTX 570
>> ****Please see 'opencl_bf_std.h' for device specific optimizations****
>> Benchmarking: OpenBSD Blowfish (x32) [OpenCL]... DONE
>> Raw: 1137 c/s real, 1137 c/s virtual
>>
>> Some tuning could provide better numbers, but they stay a lot lower than
>> the CPU's and the HD 7970's anyway (for the current code).
>>
>> Some other GPU benchmarks where I think we achieve decent performance
>> (not exactly the best, but on par with competing tools that have had GPU
>> support for longer):
>>
>> GTX 570 1600 MHz:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=phpass-cuda
>> Benchmarking: phpass MD5 ($P$9 lengths 1 to 15) [CUDA]... DONE
>> Raw: 510171 c/s real, 507581 c/s virtual
>>
>> HD 7970 925 MHz (stock):
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=mscash2-opencl -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> Optimal Work Group Size:256
>> Kernel Execution Speed (Higher is better):1.403044
>> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
>> Raw: 92467 c/s real, 92142 c/s virtual
>>
>> 1225 MHz:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=mscash2-opencl -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> Optimal Work Group Size:128
>> Kernel Execution Speed (Higher is better):1.856949
>> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
>> Raw: 121644 c/s real, 121644 c/s virtual
>>
>> (Would it overheat if actually used? This is not bcrypt anymore.)
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=rar
>> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
>> Using device 0: GeForce GTX 570
>> Optimal keys per crypt 32768
>> (to avoid this test on next run, put "rar_GWS = 32768" in john.conf,
>> section [Options:OpenCL])
>> Local worksize (LWS) 64, Global worksize (GWS) 32768
>> Benchmarking: RAR3 SHA-1 AES (6 characters) [OpenCL]... (8xOMP) DONE
>> Raw: 4380 c/s real, 4334 c/s virtual
>>
>> The HD 7970 card is back to stock clocks here:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=rar -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> Optimal keys per crypt 65536
>> (to avoid this test on next run, put "rar_GWS = 65536" in john.conf,
>> section [Options:OpenCL])
>> Local worksize (LWS) 64, Global worksize (GWS) 65536
>> Benchmarking: RAR3 SHA-1 AES (6 characters) [OpenCL]... (8xOMP) DONE
>> Raw: 7162 c/s real, 468114 c/s virtual
>>
>> WPA-PSK, on CPU:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=wpapsk
>> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... (8xOMP) DONE
>> Raw: 1980 c/s real, 247 c/s virtual
>>
>> (No SIMD yet; it could be several times faster.) CUDA:
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=wpapsk-cuda
>> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... (8xOMP) DONE
>> Raw: 32385 c/s real, 16695 c/s virtual
>>
>> OpenCL on the faster card (stock clock):
>>
>> user@...l:~/john-1.7.9-jumbo-6/run$ ./john -te -fo=wpapsk-opencl -pla=1
>> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
>> Using device 0: Tahiti
>> Max local work size 256
>> Optimal local work size = 256
>> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... (8xOMP) DONE
>> Raw: 55138 c/s real, 42442 c/s virtual
>>
>> That's a 27x speedup over CPU here, although presumably the CPU code is
>> further from optimal.
>>
>> ...Hey, what are you doing here? That message was way too long, you
>> couldn't possibly read this far. I'll just presume you scrolled to the
>> end. There's good stuff you have missed above, so please scroll up. ;-)
>>
>> As usual, feedback is welcome on the john-users list. I realize that
>> we're currently missing usage instructions for much of the new stuff, so
>> please just ask on john-users - and try to make your questions specific.
>> That way, code contributors will also be prompted/forced to contribute
>> documentation, and we'll get it under doc/ and on the wiki - in fact,
>> you can contribute to that too.
>>
>> Alexander
>