Message-ID: <CA+E3k93=eCaFt1XB7K7ZVABcsWjKLEicSMjhm9hiGvEwC+kqJA@mail.gmail.com>
Date: Wed, 18 Feb 2015 22:30:50 -0900
From: Royce Williams <royce@...ho.org>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: descrypt speed

On Wed, Feb 18, 2015 at 10:08 PM, Sayantan Datta <std2048@...il.com> wrote:
>
> On Thu, Feb 19, 2015 at 11:59 AM, Sayantan Datta <std2048@...il.com> wrote:
>>
>> On Mon, Nov 3, 2014 at 3:32 AM, Royce Williams <royce@...ho.org> wrote:
>>>
>>> On Sun, Nov 2, 2014 at 12:19 PM, magnum <john.magnum@...hmail.com> wrote:
>>>>
>>>> On 2014-11-02 18:59, Royce Williams wrote:
>>>>>
>>>>> On Thu, Oct 30, 2014 at 9:33 PM, magnum <john.magnum@...hmail.com> wrote:
>>>>>>
>>>>>> On 2014-10-31 06:02, Royce Williams wrote:
>>>>>>>
>>>>>>> On a GTX 970, shouldn't this be sm_52?
>>>>>>
>>>>>> You can force this by editing NVCC_FLAGS in the Makefile. Add
>>>>>> something like "-arch sm_50" (or 52). But I doubt it will make much
>>>>>> difference, and it will only affect CUDA formats.
>>>>>
>>>>> On my system with both an sm_20 and an sm_50 card, when running solely
>>>>> descrypt-opencl (not CUDA), the ptxas info shows that sm_50 is
>>>>> involved in some way. Is this cosmetic?
>>>>
>>>> OpenCL compiles a suitable (different) kernel for each, and you do not
>>>> have to configure anything.
>>>
>>> What's giving me pause is that, without my changing anything on either
>>> system, descrypt-opencl is appropriately using sm_20 and sm_50 on my
>>> heterogeneous system, but is only using sm_20 on my GTX 750 system.
>>> Previously, the latter system was happily using sm_52. I am not sure
>>> what changed.
>>>
>>>> You can configure CUDA to compile for several archs at once; see "nvcc
>>>> --help". It's something like "-gencode arch=compute_20,code=sm_20
>>>> -gencode arch=compute_50,code=sm_50" (added to NVCC_FLAGS instead of
>>>> just -arch sm_xx). The most suitable of them will be picked at runtime.
>>>
>>> Interesting -- I'll try that.
>>>
>>> Royce
>>
>> Hi Royce, magnum,
>>
>> If you are interested, you can test the new revision of descrypt-opencl
>> on 970, 980 and 290X. There are three kernels, and you can select them
>> by changing the parameters HARDCODE_SALT and FULL_UNROLL in
>> opencl_DES_hst_dev_shared.h. Setting (1,1) gives you the fastest kernel
>> but takes very long to compile; subsequent runs should compile much
>> quicker, though, as pre-compiled kernels (saved to disk from the prior
>> runs) are used. Setting (1,0) gives slower speed but faster compilation.
>> Setting (0,0) is the slowest, but compilation is quickest. Also, do not
>> fork on the same system when HARDCODE_SALT is 1.
>>
>> Regards,
>> Sayantan
>
> Actually, fork may be used with HARDCODE_SALT=1, but with at most 2
> processes; anything more than that is wasteful, and you may need a ton
> of RAM. Even with --fork=2, I think you should have at least 8 GB of
> RAM. Another problem we currently have when using fork is that kernels
> are compiled n times for n processes, which is unnecessary. However, we
> can work around that by using --fork=1 to compile all the kernels and
> then restarting with --fork=2.
>
> Some performance numbers using --fork=2, HARDCODE_SALT=1, FULL_UNROLL=1,
> 124 passwords and 122 salts, GPU: 7970 (925 MHz core, 1375 MHz memory):
>
> 2 0g 0:00:05:07 3/3 0g/s 749774p/s 91400Kc/s 92900KC/s GPU:61°C util:97% fan:27% scprugas..myremy26
> 1 0g 0:00:05:07 3/3 0g/s 749756p/s 91398Kc/s 92898KC/s GPU:61°C util:97% fan:27% 339gmh..8jfu44
>
> Performance with --fork=1:
>
> 0g 0:00:04:25 3/3 0g/s 1324Kp/s 161247Kc/s 163891KC/s GPU:60°C util:87% fan:27% srusuu..07pvjy

Thanks for the opportunity to test!
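For anyone else who wants to try this: as I understand Sayantan's
instructions, selecting a kernel variant comes down to flipping two
macros in opencl_DES_hst_dev_shared.h and rebuilding. A sketch of what
I mean (the exact macro lines and defaults in the tree may differ):

    /* opencl_DES_hst_dev_shared.h -- (HARDCODE_SALT, FULL_UNROLL):
       (1,1) = fastest kernel, very long first compile;
       (1,0) = slower kernel, quicker compile;
       (0,0) = slowest kernel, quickest compile. */
    #define HARDCODE_SALT 1
    #define FULL_UNROLL   1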
Here are my results of "--test --format=descrypt-opencl" for a GTX 970
SC (factory overclocked to 1316 MHz).

First, a baseline - performance using magnumripper from a couple of
months ago:

Many salts:     46137K c/s real, 45680K c/s virtual
Only one salt:  25700K c/s real, 25700K c/s virtual

Using fb0b9383d6 magnumripper from today, for
(HARDCODE_SALT, FULL_UNROLL) values:

(0,0)
Many salts:     77345K c/s real, 77345K c/s virtual
Only one salt:  35298K c/s real, 35298K c/s virtual

(1,0)
Many salts:     77864K c/s real, 78643K c/s virtual
Only one salt:  34952K c/s real, 34952K c/s virtual

(1,1)
Many salts:     169869K c/s real, 169869K c/s virtual
Only one salt:  47710K c/s real, 48192K c/s virtual

(That's quite a jump. Not knowing any better, is the many-salts value
really supposed to be that high?)

Here is real-world performance on a single card against a single hash,
no fork, after ~10 minutes:

0g 0:00:10:44 0.00% 3/3 (ETA: 2020-09-07 01:11) 0g/s 38282Kp/s 38282Kc/s 38282KC/s GPU:41°C fan:45% etyc45x..euamdhj

... and with --fork=6 (one core per identical GPU), for what it's worth
(this seemed to work fine on my 16 GB system):

4 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-19 02:58) 0g/s 30421Kp/s 30421Kc/s 30421KC/s GPU:34°C fan:45% mmjhj31j..mmrrpdly
2 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-06 22:04) 0g/s 31320Kp/s 31320Kc/s 31320KC/s GPU:40°C fan:45% hlc8466*..hllhikko
5 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-11-25 12:35) 0g/s 20033Kp/s 20033Kc/s 20033KC/s GPU:39°C fan:45% 9dzjt0e..9/1bb9m
3 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-01 17:17) 0g/s 31719Kp/s 31719Kc/s 31719KC/s GPU:33°C fan:45% nrnUQp..n2j=h!
1 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-03-26 11:21) 0g/s 32213Kp/s 32213Kc/s 32213KC/s GPU:41°C fan:45% bs9ntql..byisi7a
6 0g 0:00:04:16 0.00% 3/3 (ETA: 2017-01-02 14:46) 0g/s 18917Kp/s 18917Kc/s 18917KC/s GPU:40°C fan:45% agb_co6..azo52r2

(Aggregate: 164413Kp/s)

... and with --fork=8 (more processes starved for CPU, but more
aggregate throughput):

5 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-08-08 08:33) 0g/s 18030Kp/s 18030Kc/s 18030KC/s GPU:39°C fan:45% 2d2inl1n..2d2ottrd
1 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-29 00:52) 0g/s 28015Kp/s 28015Kc/s 28015KC/s GPU:46°C fan:45% 03-9be32..03alus42
4 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 19:39) 0g/s 25572Kp/s 25572Kc/s 25572KC/s GPU:32°C fan:45% plzzgm1...plp2b3sk
3 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-23 10:53) 0g/s 28654Kp/s 28654Kc/s 28654KC/s GPU:33°C fan:45% 8c9gt7i..8cci13k
6 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-09-10 02:55) 0g/s 16992Kp/s 16992Kc/s 16992KC/s GPU:39°C fan:45% kmk14en8..kmher2a3
7 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-26 16:56) 0g/s 28266Kp/s 28266Kc/s 28266KC/s GPU:46°C fan:45% lhgeh730..l0nn0wow
8 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-02-13 21:08) 0g/s 26841Kp/s 26841Kc/s 26841KC/s GPU:41°C fan:45% cl1kiylu..clrh2bl1
2 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 11:01) 0g/s 25565Kp/s 25565Kc/s 25565KC/s GPU:41°C fan:45% do_7af3..di7z7h8

(Aggregate: 197935Kp/s)

And ignore the identical fan speeds; I have them all locked at 45%
right now.

Royce
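P.S. For the archives, spelling out the two recipes from this thread in
case it saves anyone a lookup. First, Sayantan's trick for avoiding n
redundant kernel compiles under fork; my reading of it is something
like this ("hashes.txt" is a placeholder, and I haven't verified that
--fork=1 is accepted -- if not, a plain single-process run should warm
the kernel cache the same way):

    # first run: a single process builds the kernels once and
    # caches them to disk
    ./john --fork=1 --format=descrypt-opencl hashes.txt
    # interrupt it once cracking starts, then relaunch with more
    # processes; they should now load the pre-compiled kernels
    ./john --fork=2 --format=descrypt-opencl hashes.txt

Second, magnum's multi-arch suggestion for the CUDA formats; per his
note, it is added to NVCC_FLAGS in the Makefile in place of a single
-arch sm_xx, i.e. something like:

    NVCC_FLAGS += -gencode arch=compute_20,code=sm_20 \
                  -gencode arch=compute_50,code=sm_50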