Message-ID: <AANLkTinT97RWwL0V5X8hm-rMWkh4cSgDS9ewyMEHtt+_@mail.gmail.com>
Date: Tue, 1 Feb 2011 13:59:30 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: FreeBSD crypt() / MD5-crypt implementation question

Hello,

I have experience with OpenCL, mostly on ATI cards. Yes, the compiler is very good at optimizing code, yet there are of course many things you can do to help it do better. Some of them are documented in the corresponding ATI/Nvidia OpenCL guides, some are not. For example, for some reason the rotate() function on ATI generates fast bitalign code, while a leftrotate macro (with shifts/or) does not. Some code generates slowpath accesses and other code does not, and that is not well documented. Also, sometimes quite weird changes produce much better code and you are left without an explanation for it (OK, the dumped ISA code might give you some insight).

OTOH, I had many issues trying to run the same code on NVidia, which is why I split the kernels in two: an AMD version and an NVidia one. Sometimes the Nvidia compiler even crashed during clBuildProgram() because of some construct that is perfectly legal in ATI's implementation.

Basically you can write code that runs on both platforms, but it won't perform well. There are tweaks that need to be done for ATI and for NVidia, mostly related to global memory accesses and vectorization. Most of them are documented in the guides.

On your platform BarsWF would probably be faster, because it does the hash reversal trick and skips up to the 48th step. However, unlike oclhc, it is not capable of multi-hash and mask attacks. IMHO, oclhashcat is the best cracker currently available, as it offers a very good tradeoff between performance and functionality. There might be faster single-hash crackers, but they can't offer such a rich set of features.

On Tue, Feb 1, 2011 at 1:27 PM, Freddie Witherden <freddie@...herden.org> wrote:

> That's interesting. Do you have any experience with the higher-level
> languages/compilers (CUDA C/OpenCL) and how they perform? I ask as x86
> compilers are generally quite good at spotting and optimising bit
> manipulations (endian swapping macro => bswap; "two shifts, one or" =>
> ror). It would indeed be nice if a single OpenCL kernel could take care
> of current and future AMD/Nvidia hardware without needing to hand-tune
> code for different ISAs.
>
> I've looked at a few CUDA MD5 implementations (although not MD5 crypt,
> just raw MD5) with the performance on my 295 GTX varying from ~100
> Mhash/s ("CUDA MD5", Mario Juric, GPL v2) to ~600 Mhash/s ("oclHashcat",
> blob). "MD5 Crack GPU", LGPL v3, will do ~400 Mhash/s and I am yet to
> benchmark BarsWF.
>
> Polemically yours, Freddie.
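
For reference, the "two shifts, one or" idiom discussed above can be sketched in plain C roughly as follows. This is only an illustration: the names ROTL32_MACRO, rotl32 and rotl_forms_agree are made up for this example and do not come from any of the crackers mentioned. In OpenCL C the built-in rotate(x, n) computes the same left rotation; whether a given compiler turns the macro form into a single rotate or bitalign instruction depends on the compiler, as described in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/or rotate macro: the "two shifts, one or" idiom.
   Assumption: only valid for 0 < n < 32, since n == 0 would
   shift right by 32 bits, which is undefined behaviour in C. */
#define ROTL32_MACRO(x, n) \
    ((uint32_t)(((uint32_t)(x) << (n)) | ((uint32_t)(x) >> (32 - (n)))))

/* The same left rotation as a function; the masking makes it
   safe for any n, including 0 and multiples of 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Hypothetical self-check: returns 1 if both forms agree on x
   for the four shift amounts used in MD5 round 1. */
static int rotl_forms_agree(uint32_t x)
{
    static const unsigned shifts[] = {7, 12, 17, 22};
    for (unsigned i = 0; i < 4; i++) {
        if (ROTL32_MACRO(x, shifts[i]) != rotl32(x, shifts[i]))
            return 0;
    }
    return 1;
}
```

Both forms return the same value, so any speed difference between them comes purely from the code the compiler generates.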