Message-ID: <20201203182807.GA2462@openwall.com>
Date: Thu, 3 Dec 2020 19:28:07 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: dmg-opencl low performance/ low gpu utilisation

Hi,

On Thu, Dec 03, 2020 at 03:26:27PM +0100, r.wiesbach@....de wrote:
> I use dmg-opencl on a two Radeon RX 580 system.
>
> However the dmg-opencl has very low utilisation

How low?  And how do you measure it?

> and a speed of only about 2500 pw/s.

This may be a fine speed.  It depends on performance of the system the
dmg file or sparsebundle was created on - the faster that system was,
the slower the file or sparsebundle will be to crack.  This is because
recent versions of macOS tune the time needed to generate the encryption
key from the password to be roughly the same - whatever the developers
thought the user would not be too comfortable with.

> --test
>
> shows Raw 46500 c/s real and 3300 c/s virtual

A default "--test" benchmark of dmg-opencl is for:

Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1

This is kind of nominal - useful to compare builds of JtR and different
hardware, but not so useful to predict performance on real input.  Your
actual input is almost certainly version 2 only, and it probably has
something like 100,000 iterations, affecting the speed accordingly.

That said, "46500 c/s real" does sound low, and "3300 c/s virtual" weird
(the virtual is generally the same or higher than real for this test,
because only one CPU thread is run and the virtual time runs slower than
real).  Here's what I am getting for a Vega 64 under Linux:

Device 1: gfx900 [Radeon RX Vega]
Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]...
LWS=64 GWS=32768 (512 blocks) DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
Raw: 875976 c/s real, 11796K c/s virtual

Once again, the speeds are nominal, and speeds of a couple of orders of
magnitude lower during actual cracking of a dmg file or sparsebundle
produced by a non-ancient version of macOS are expected.

> Device 2 (same GPU model as device 1) seems not to be used by default,

That's correct.

> but using
> --devices=1,2 --fork=2
> there is at most a slight increase in performance (not doubling the pw/s
> as one would expect)

When you run with two devices, there should be two status lines printed
for every keypress.  These correspond to the two devices, separately.
So it is expected that the performance reported on every one line will
not increase, but the cumulative performance for the two lines will be
double what you had when using just one device.

You might want to post an excerpt from your terminal window starting
with the "Loaded ..." line for us to see if it's reasonable or not.

> Knowing that some opencl-kernels do not perform well (without rules) I tried
> --wordlist=wordlist.txt
> --wordlist=wordlist.txt --rules
> --wordlist=wordlist.txt --rules=best64
> --incremental

We have no OpenCL kernels that would perform better with rules.  We do
have some that will perform better with mask, but those are for
so-called "fast hashes".  dmg-opencl is (more than) slow enough not to
need this (and thus doesn't include this unneeded optimization).

> Additionally i tried using more hashes (5) and the pw/s droped to about
> 500p/w. As this is 1/5th of 2500 this is the same speed

This looks correct, as long as the hashes all have different salts.

> but still low utilization.

Again, how low?  And how do you know?

> The wordlist has a size of about 100MB.

At (expected) low speeds like there are for this format, things like
cracking mode and wordlist size shouldn't make much of a difference in
the resulting p/s rate.
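
To make the arithmetic above concrete, here is a rough Python sketch.
It is only a back-of-the-envelope model, not anything JtR itself does:
it assumes speed scales inversely with the iteration count, takes the
"something like 100,000 iterations" guess at face value, and ignores
that the default benchmark mixes versions 2 and 1.  The function names
are made up for illustration.

```python
def scale_for_iterations(bench_cps, bench_iterations, target_iterations):
    """Rescale a nominal benchmark speed to another iteration count.

    Assumption: the cost is dominated by the PBKDF2 loop, so the speed
    is inversely proportional to the iteration count.
    """
    return bench_cps * bench_iterations / target_iterations


def candidates_per_second(hash_computations_per_second, distinct_salts):
    """Every candidate password has to be tried against every distinct
    salt, so the candidate rate drops by the number of salts."""
    return hash_computations_per_second / distinct_salts


def combined_speed(per_process_speeds):
    """With --fork=N each process prints its own status line; the total
    is the sum over those lines."""
    return sum(per_process_speeds)


if __name__ == "__main__":
    # Vega 64 benchmark figure from this message, rescaled from the
    # nominal 1000 iterations to a guessed 100,000 iterations.
    print(scale_for_iterations(875976, 1000, 100_000))

    # About 2500 per second on one hash becomes about 500 candidates
    # per second on five hashes with different salts.
    print(candidates_per_second(2500, 5))

    # Two forked processes at about 2500 each (hypothetical figures).
    print(combined_speed([2500, 2500]))
```

The first figure lands a couple of orders of magnitude below the
benchmark, which is the expected range for real input.  The actual
iteration count is shown among the tunable costs when JtR loads the
hash, and that is the number to plug in.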

> I did not see an open issue for dmg-opencl on the isssue tracker

We're not currently aware of performance issues with dmg-opencl, and in
my personal experience it works well - but I don't use Windows.  It is
quite possible there's an issue - maybe a Windows-specific one, maybe
e.g. with auto-tuning of OpenCL work sizes - just guessing here.  Let's
see what you actually have (some lines where JtR reports on the loaded
hashes, their tunable costs, the tuned LWS and GWS figures, the
resulting speeds after it's been running for a while) - then determine
if anything is wrong with that, what exactly, and how it can be fixed.

Thanks,

Alexander