Message-ID: <51488165.4050507@gmail.com>
Date: Tue, 19 Mar 2013 12:16:53 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Idea to increase plaintext length for GPU based hashes

On 18-03-2013 22:21, magnum wrote:
> Another approach (not necessarily mutex to yours) would be to split the
> transfer. Let's say we have a work size of 1M. At, say, the 256K'th call
> to set_key(), it could initiate a transfer of this first fourth of keys
> to GPU. This transfer will not stall the host side, it will take place
> while we continue with the next 256K keys. And so on. If we can balance
> this properly we should get rid of much of the transfer delay. Maybe we
> should split it in 8 or 16, maybe less.

Well, it was easy to implement (a form of) your idea, and it gives a good
gain.

--
From (unstable):

$ ../run/john -fo:raw-sha256-opencl -t
OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Device 0: Juniper (AMD Radeon HD 6700 Series)
Max local worksize 256, Optimal local worksize 128
(to avoid this test on next run, put "rawsha256_LWS = 128" in john.conf, section [Options:OpenCL])
Local worksize (LWS) 128, global worksize (GWS) 1310720
Benchmarking: Raw SHA-256 (pwlen < 32) [OpenCL (inefficient, development use mostly)]... DONE
Raw:    13653K c/s real, 47953K c/s virtual

--
To (bleeding):

$ ../run/john -fo:raw-sha256-opencl -t
Device 0: Juniper (AMD Radeon HD 6700 Series)
Local worksize (LWS) 128, global worksize (GWS) 1310720
Benchmarking: Raw SHA-256 (pwlen < 32) [OpenCL (inefficient, development use mostly)]... DONE
Raw:    17096K c/s real, 36408K c/s virtual

$ ../run/john -fo:cisco4-opencl -t
Device 0: Juniper (AMD Radeon HD 6700 Series)
Local worksize (LWS) 128, global worksize (GWS) 1310720
Benchmarking: Cisco "type 4" hashes SHA-256 [OpenCL (inefficient, development use mostly)]... DONE
Raw:    17554K c/s real, 39321K c/s virtual

------

But this is not enough to do the trick. Below, I measure only the GPU part
in auto-tune. I expected to be at 113M, not 17M. Is set_key() the only
thing to blame? BTW: auto-tune on unstable runs at 19M.

$ GWS=0 STEP= DETAILS= ../run/john -fo:cisco4-opencl -t
Device 0: Juniper (AMD Radeon HD 6700 Series)
Calculating best global worksize (GWS) for LWS=128 and max. 1.0 s duration.
Raw speed figures including buffer transfers:
pass xfer: 0.26 ms, crypt: 0.29 ms, result xfer: 0.66 ms
gws:   65536      54107339 c/s   1.211 ms per crypt_all()+
pass xfer: 0.49 ms, crypt: 0.51 ms, result xfer: 0.79 ms
gws:  131072      73165517 c/s   1.791 ms per crypt_all()+
pass xfer: 0.10 ms, crypt: 1.00 ms, result xfer: 1.40 ms
gws:  262144     104964874 c/s   2.497 ms per crypt_all()+
pass xfer: 0.24 ms, crypt: 1.98 ms, result xfer: 2.60 ms
gws:  524288     108720820 c/s   4.822 ms per crypt_all()+
pass xfer: 0.39 ms, crypt: 3.94 ms, result xfer: 4.96 ms
gws: 1048576     112859326 c/s   9.291 ms per crypt_all()+
pass xfer: 0.76 ms, crypt: 7.87 ms, result xfer: 9.91 ms
gws: 2097152     113157718 c/s  18.533 ms per crypt_all()
Optimal global worksize 1048576
(to avoid this test on next run, put "rawsha256_GWS = 1048576" in john.conf, section [Options:OpenCL])
Local worksize (LWS) 128, global worksize (GWS) 1048576
Benchmarking: Cisco "type 4" hashes SHA-256 [OpenCL (inefficient, development use mostly)]... DONE
Raw:    16930K c/s real, 35720K c/s virtual

Claudio
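
For clarity, here is a minimal host-side sketch of the split-transfer idea
quoted above. This is not the actual john code: key_buffer, cl_key_buffer,
queue and the SPLITS count are assumed placeholders for whatever the real
format uses. Each time a 1/SPLITS slice of the keys has been set, set_key()
enqueues a non-blocking clEnqueueWriteBuffer() for that slice, and crypt_all()
only synchronizes right before it needs the keys on the device.

#include <CL/cl.h>
#include <string.h>

#define PLAINTEXT_LENGTH 32
#define SPLITS 4                        /* split the key transfer into 4 chunks */

/* Hypothetical globals standing in for whatever the real format uses. */
static char key_buffer[1048576][PLAINTEXT_LENGTH];  /* host-side key storage  */
static cl_mem cl_key_buffer;                        /* device-side key buffer */
static cl_command_queue queue;
static size_t global_work_size = 1048576;

static void set_key(char *key, int index)
{
    size_t chunk = global_work_size / SPLITS;

    strncpy(key_buffer[index], key, PLAINTEXT_LENGTH);

    /* Once a full chunk of keys is ready, start copying it to the GPU
       without blocking; the host keeps filling the next chunk while the
       PCIe transfer runs in the background. */
    if ((size_t)(index + 1) % chunk == 0) {
        size_t start = (size_t)(index + 1) - chunk;
        clEnqueueWriteBuffer(queue, cl_key_buffer, CL_FALSE /* non-blocking */,
                             start * PLAINTEXT_LENGTH,      /* byte offset   */
                             chunk * PLAINTEXT_LENGTH,      /* byte count    */
                             key_buffer[start], 0, NULL, NULL);
    }
}

static void crypt_all(void)
{
    /* Only wait here, right before the kernel needs the keys; by now most
       (or all) of the key data should already be on the device. */
    clFinish(queue);
    /* clEnqueueNDRangeKernel(queue, crypt_kernel, ...); etc. */
}

Note that for the writes to really overlap with the host loop, the host-side
key buffer may need to be pinned (e.g. allocated via CL_MEM_ALLOC_HOST_PTR and
mapped); otherwise the driver may fall back to a synchronous copy.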