Message-ID: <50A23CDF.6010802@gmail.com>
Date: Tue, 13 Nov 2012 10:28:15 -0200
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Clear keys

On 12-11-2012 16:50, magnum wrote:
> On 12 Nov, 2012, at 13:33, Claudio André <claudioandre.br@...il.com> wrote:
>> Hi, I tried to use clear_keys (as in commit ba10ced)
>>
>> But it hurts performance. Did I misunderstand something?
>
> Probably not. I was going to bring this up too. Many parameters are 
> involved.
>
> I did this for ntlmv2-opencl as it's running at speeds where this has 
> significance (currently ~ 18M c/s at best).  My original code did a 
> 64-byte blind memset per key, first thing in set_key(). I changed it 
> to memsetting the whole array at once in clear_keys() and it got a 
> little faster on all hardware I have tested. But not a lot faster. I 
> guess the original method makes better use of the cache (after that 
> memset, cache is warm for the succeeding Unicode translation) while the 
> huge memset is more effective in terms of SSE/AVX/etc optimized code.
>
> Anyway, that is a whopping 128 MB memset when running on the 7970 and 
> normally less than 1/3 of each key will actually be dirty. So I 
> thought there ought to be a better way. Just for the hell of it I 
> tried making a trivial clear_keys kernel but no matter how fast that 
> is, the transfer time makes it useless (unless I start juggling with 
> two buffers).
>
> Then I moved the clearing back to set_key() - but in a way that does a 
> 32-bit word at a time and that stops when it reaches already clear 
> memory. This is faster on most gear I've tried (including Bull w/ 
> GTX570) but has a huge impact on the Tahiti (a.k.a. 7970) that 
> I really can't explain (speed drops to about 25% of the original blind 
> memset) even though it will normally not have to clear more than one word 
> - if that. So for now I settled for the clear_keys() memset. I guess I 
> could adapt to the hardware and choose between these methods at runtime.
>
> It could be that the self test vectors happen to produce worse 
> results (very long keys mixed with shorter) with that last experiment 
> than what would be the normal case in real life. I might do more 
> experiments later.
>
> magnum
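
For readers following along, the two blind-memset strategies described in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not the ntlmv2-opencl code: the buffer name key_buf, the tiny NUM_KEYS, and the helper names set_key_blind() and set_key_noclear() are hypothetical, and the Unicode translation step is replaced by a plain byte copy.

```c
#include <assert.h>
#include <string.h>

#define KEY_SLOT_BYTES 64  /* per-key size mentioned in the message */
#define NUM_KEYS 8         /* hypothetical; real batches are far larger */

/* Hypothetical host-side key buffer, one fixed-size slot per candidate. */
static unsigned char key_buf[NUM_KEYS * KEY_SLOT_BYTES];

/* Strategy A: blind 64-byte memset per key, first thing in set_key(). */
static void set_key_blind(const char *key, int index)
{
    unsigned char *slot;
    size_t len = strlen(key);

    assert(index >= 0 && index < NUM_KEYS);
    slot = &key_buf[index * KEY_SLOT_BYTES];
    if (len > KEY_SLOT_BYTES)
        len = KEY_SLOT_BYTES;
    memset(slot, 0, KEY_SLOT_BYTES); /* slot is warm in cache afterwards */
    memcpy(slot, key, len);          /* stand-in for the real translation */
}

/* Strategy B: one big memset of the whole array in clear_keys() ... */
static void clear_keys(void)
{
    memset(key_buf, 0, sizeof(key_buf));
}

/* ... so that set_key() only has to copy the new candidate. */
static void set_key_noclear(const char *key, int index)
{
    size_t len = strlen(key);

    assert(index >= 0 && index < NUM_KEYS);
    if (len > KEY_SLOT_BYTES)
        len = KEY_SLOT_BYTES;
    memcpy(&key_buf[index * KEY_SLOT_BYTES], key, len);
}
```

With strategy B the caller has to invoke clear_keys() before each new batch of set_key_noclear() calls; otherwise the tail of a longer previous key survives in the slot.
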

I will keep watching.

Thanks.
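
The third approach in the quoted message (clearing inside set_key() one 32-bit word at a time and stopping at already clear memory) might look something like the sketch below. It is a guess at the idea, not magnum's actual code: key_buf, SLOT_WORDS, NUM_KEYS and set_key_wordclear() are hypothetical names, keys are assumed to be NUL-free byte strings, and the buffer is assumed to start out zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOT_WORDS 16  /* 16 x 32-bit words = 64 bytes per key */
#define NUM_KEYS 4     /* hypothetical; tiny batch for illustration */

/* Static storage starts zeroed, which the early-stop loop relies on. */
static uint32_t key_buf[NUM_KEYS][SLOT_WORDS];

static void set_key_wordclear(const char *key, int index)
{
    uint32_t *slot;
    size_t len = strlen(key);
    size_t words, i;

    assert(index >= 0 && index < NUM_KEYS);
    slot = key_buf[index];
    if (len > SLOT_WORDS * 4)
        len = SLOT_WORDS * 4;
    words = (len + 3) / 4;

    /* Write the key as whole words, zero-padding the last partial word. */
    for (i = 0; i < words; i++) {
        uint32_t w = 0;
        size_t n = len - i * 4;
        if (n > 4)
            n = 4;
        memcpy(&w, key + i * 4, n);
        slot[i] = w;
    }

    /* Wipe leftovers of a longer previous key. Every word of a NUL-free
     * key is non-zero, so the first zero word marks clean memory. */
    for (i = words; i < SLOT_WORDS && slot[i] != 0; i++)
        slot[i] = 0;
}
```

When consecutive keys in a slot have similar lengths, the second loop touches at most a word or two, which matches the "not more than one word" expectation in the message.
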
