Message-ID: <BLU159-W2279E3853A35827D78B0AAA4720@phx.gbl>
Date: Tue, 31 Jan 2012 01:32:35 +0000
From: Alex Sicamiotis <alekshs@...mail.com>
To: <john-users@...ts.openwall.com>
Subject: RE: DES with OpenMP

> This is DES_bs_cpt in DES_bs.h, and this setting is only used in OpenMP
> builds.  By default it's 32, but you can try any value starting with 1.

OMP_NUM_THREADS=2 with Celeron E3200 @ 3.66 GHz (CPU use ~2% while in desktop).
Built with GCC 4.6.2 (-O2 -march=nocona).

john1 => DES_bs_cpt=1
john4 => DES_bs_cpt=4
etc., all the way to 256.

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john1 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     7303K c/s real, 3776K c/s virtual
Only one salt:  6784K c/s real, 3486K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john4 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     7336K c/s real, 3762K c/s virtual
Only one salt:  6781K c/s real, 3473K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john8 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     7325K c/s real, 3756K c/s virtual
Only one salt:  6749K c/s real, 3454K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john16 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     7284K c/s real, 3743K c/s virtual
Only one salt:  6399K c/s real, 3275K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john32 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     7018K c/s real, 3595K c/s virtual
Only one salt:  5618K c/s real, 2869K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john64 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     6740K c/s real, 3453K c/s virtual
Only one salt:  5403K c/s real, 2759K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john128 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     6684K c/s real, 3424K c/s virtual
Only one salt:  5441K c/s real, 2787K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john256 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     6671K c/s real, 3449K c/s virtual
Only one salt:  5520K c/s real, 2819K c/s virtual


OMP_NUM_THREADS=1 with Celeron E3200 @ 3.66 GHz

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john1 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3881K c/s real, 3881K c/s virtual
Only one salt:  3662K c/s real, 3662K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john4 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3881K c/s real, 3881K c/s virtual
Only one salt:  3676K c/s real, 3676K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john8 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3843K c/s real, 3851K c/s virtual
Only one salt:  3644K c/s real, 3644K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john16 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3809K c/s real, 3809K c/s virtual
Only one salt:  3569K c/s real, 3569K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john32 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3795K c/s real, 3795K c/s virtual
Only one salt:  3446K c/s real, 3446K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john64 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3697K c/s real, 3690K c/s virtual
Only one salt:  3234K c/s real, 3234K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john128 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3624K c/s real, 3624K c/s virtual
Only one salt:  3152K c/s real, 3152K c/s virtual

linux-1mo8:~/Documents/john-1.7.9/run # nice --10 ./john256 -test
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     3617K c/s real, 3610K c/s virtual
Only one salt:  3152K c/s real, 3145K c/s virtual

-------

All in all, "4" seems to be the best-performing option for both 1 thread and 2 threads.

Relative to "4", "32" is about 100k slower for many salts and 200k slower for one salt with one thread.
Relative to "4", "32" is about 300k slower for many salts and over 1000k slower for one salt with two threads.

It's unfortunate that I can't test "4" with ICC (expired demo license)... If similar improvements are possible from 32 => 4, then ICC would give figures of over 9200k @ 4 GHz with many salts.

Now, the thing is, I *thought* that the +150k that the ICC OpenMP version had over the non-OpenMP ICC / no-asm version* (~4600k vs ~4400k) was directly related to this tweak. So I speculated: "if this increases speed, I'll up the number and get more". Apparently that was not the case, since the default value of "32" actually slows down the program for me relative to a value of "4"... So maybe the extra buffered data overflows my 1 MB of L2 cache per core, which reduces speed (???).

Anyway, if the value of 32 is actually a slowing factor (for me), then there's some other difference in the OpenMP version that not only covers this performance loss of ~100-200k (from 32x relative to the 1x of the non-OpenMP version), but also adds another +300k on top, so that the OpenMP version (1 thread) ends up about 200k faster than the non-OpenMP version. This means that some other factors make the OpenMP build (threads=1) faster, and these may be worth investigating and replicating in the non-OpenMP version. You are the programmer, so you know them best :D

* http://www.openwall.com/lists/john-users/2012/01/21/7
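For anyone who wants to recheck the rounded deltas quoted above, here is a minimal Python sketch that recomputes them from the "real" c/s figures in this post. The names (REAL_KCS, slowdown, best_cpt) are made up for illustration and are not part of John the Ripper, and ranking by the sum of the two figures is just one arbitrary way to pick a "best" value.

```python
# "Real" c/s figures in thousands, copied from the benchmark runs in this post.
# Outer key: OMP_NUM_THREADS. Inner key: DES_bs_cpt.
# Values: (many_salts, one_salt).
REAL_KCS = {
    2: {
        1: (7303, 6784), 4: (7336, 6781), 8: (7325, 6749), 16: (7284, 6399),
        32: (7018, 5618), 64: (6740, 5403), 128: (6684, 5441), 256: (6671, 5520),
    },
    1: {
        1: (3881, 3662), 4: (3881, 3676), 8: (3843, 3644), 16: (3809, 3569),
        32: (3795, 3446), 64: (3697, 3234), 128: (3624, 3152), 256: (3617, 3152),
    },
}


def slowdown(threads, cpt, baseline=4):
    """Return the (many_salts, one_salt) loss in K c/s of `cpt` vs `baseline`."""
    base_many, base_one = REAL_KCS[threads][baseline]
    many, one = REAL_KCS[threads][cpt]
    return base_many - many, base_one - one


def best_cpt(threads):
    """Return the DES_bs_cpt value with the highest combined real c/s.

    Summing the many-salts and one-salt figures is an assumption of this
    sketch, not a metric used by John the Ripper itself.
    """
    table = REAL_KCS[threads]
    return max(table, key=lambda cpt: sum(table[cpt]))


if __name__ == "__main__":
    for threads in (1, 2):
        many, one = slowdown(threads, 32)
        print(f"{threads} thread(s): best DES_bs_cpt={best_cpt(threads)}, "
              f"32 vs 4: -{many}K many salts, -{one}K one salt")
```

With the numbers above, the exact losses of "32" relative to "4" are 86K / 230K (many salts / one salt) with one thread and 318K / 1163K with two threads, which is where the rounded figures in the text come from.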