Message-ID: <CA+E3k91W3wmN+Obh6ZKtGYM_updgPM6mzQJjy5Bz=2zZ+td6dg@mail.gmail.com>
Date: Thu, 19 Feb 2015 22:02:04 -0900
From: Royce Williams <royce@...ho.org>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: descrypt speed

On Thu, Feb 19, 2015 at 9:58 AM, magnum <john.magnum@...hmail.com> wrote:
> On 2015-02-19 08:30, Royce Williams wrote:
>> ... and fork=8 (more processes starved for CPU, but more aggregate throughput):
>>
>> 5 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-08-08 08:33) 0g/s 18030Kp/s
>> 18030Kc/s 18030KC/s GPU:39°C fan:45% 2d2inl1n..2d2ottrd
>> 1 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-29 00:52) 0g/s 28015Kp/s
>> 28015Kc/s 28015KC/s GPU:46°C fan:45% 03-9be32..03alus42
>> 4 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 19:39) 0g/s 25572Kp/s
>> 25572Kc/s 25572KC/s GPU:32°C fan:45% plzzgm1...plp2b3sk
>> 3 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-23 10:53) 0g/s 28654Kp/s
>> 28654Kc/s 28654KC/s GPU:33°C fan:45% 8c9gt7i..8cci13k
>> 6 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-09-10 02:55) 0g/s 16992Kp/s
>> 16992Kc/s 16992KC/s GPU:39°C fan:45% kmk14en8..kmher2a3
>> 7 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-26 16:56) 0g/s 28266Kp/s
>> 28266Kc/s 28266KC/s GPU:46°C fan:45% lhgeh730..l0nn0wow
>> 8 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-02-13 21:08) 0g/s 26841Kp/s
>> 26841Kc/s 26841KC/s GPU:41°C fan:45% cl1kiylu..clrh2bl1
>> 2 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 11:01) 0g/s 25565Kp/s
>> 25565Kc/s 25565KC/s GPU:41°C fan:45% do_7af3..di7z7h8
>>
>> (Aggregate: 197935Kp/s)
>
> Unless I misunderstand, this does not make sense. If you overbook six
> GPUs using --fork=8, the two "extra" processes will each be pegged to a
> GPU, just like the first six. So it will end up with two GPUs running
> two processes each and four GPUs running one each. In the case of CPU
> it would have worked fine (no affinity).
>
> Bottom line is you probably want to use either --fork=6 or --fork=12.

Agreed ... given equal PCI bandwidth. :-/

I jumped to the wrong conclusion because I had forgotten about the PCI
bandwidth differences on this system. Four of my cards are connected to
x16 slots with riser cables, and the other two are connected to x1 slots.
(I hadn't thought about this for a while because I've been mostly using
hashcat for the past couple of months.)

After your message, I was curious, so I ran new single-hash tests for
fork=6 through fork=15, each for at least ten minutes, and then added up
the aggregate speeds:

 6: 168712 Kp/s
 7: 181884 Kp/s
 8: 195109 Kp/s
 9: 205072 Kp/s
10: 210777 Kp/s
11: 188349 Kp/s
12: 176034 Kp/s
13: 197104 Kp/s
14: 198222 Kp/s
15: 187335 Kp/s

I had previously stopped at fork=8 because, at the time, fork=9 showed no
appreciable improvement, for reasons unknown. I naively assumed a simple
threshold and did not double-check.

So fork=10 is the sweet spot on my system: two processes for each of the
four x16 cards, and one process for each of the two x1 cards. Above that,
aggregate performance never matches fork=10. Thanks for helping me
clarify this.

Side question: can I either tell JtR to exit automatically after X
minutes, or invoke --test against all GPUs? Either would make tests like
these simpler.

Royce
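
P.S. For anyone repeating this kind of sweep, a loop along these lines
should work. The hash file and session names are just placeholders, I'm
assuming the descrypt-opencl format here, and coreutils "timeout" stands
in for a native JtR time limit:

  # Sweep --fork from 6 to 15, roughly ten minutes per run.
  # timeout's SIGTERM should make john exit and save its session state.
  for n in $(seq 6 15); do
      timeout 600 ./john --format=descrypt-opencl --fork=$n \
          --session=fork-$n one-hash.txt
      # Each process prints a status line on exit; sum the Kp/s figures,
      # or review them afterwards with: ./john --status=fork-$n
  done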