Message-ID: <20110223020816.GA22205@openwall.com>
Date: Wed, 23 Feb 2011 05:08:16 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES on AVX

On Wed, Feb 23, 2011 at 04:13:42AM +0300, Solar Designer wrote:
> Now I need to figure out how to make DES_BS 3 work.

Well, _mm256_blendv_ps() was the wrong intrinsic for what I meant to use.
The correct one could be _mm256_cmov_ps(), but it is not recognized by
my gcc 4.5.0 for whatever reason.  I was able to get the desired vpcmov
instructions generated by using -mxop and __builtin_ia32_vpcmov_v8sf256().
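For reference, here is a minimal portable sketch of the bitwise select that vpcmov performs in a single instruction: each result bit is taken from the first source where the selector bit is 1 and from the second source where it is 0. The names vsel() and vsel_xor() are hypothetical, chosen for illustration only, and plain 64-bit integers stand in for the 256-bit vectors. This is not the code from the john tree.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name assumed for illustration): bitwise select.
 * For every bit position, take the bit from a where mask is 1 and
 * from b where mask is 0.  This is the operation that a single vpcmov
 * computes on a whole vector; without it, it costs an AND, an
 * AND-NOT and an OR.
 */
static inline uint64_t vsel(uint64_t a, uint64_t b, uint64_t mask)
{
	return (a & mask) | (b & ~mask);
}

/*
 * Same result written with XOR, which also takes three operations
 * but needs no separate inverted mask.
 */
static inline uint64_t vsel_xor(uint64_t a, uint64_t b, uint64_t mask)
{
	return b ^ ((a ^ b) & mask);
}
```

Without XOP, the same select needs three logical instructions per use, which is why a native select matters for S-box expressions built around it.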

Bad news: I was wrong in thinking that Intel has since "imported" this
stuff into AVX.  Apparently, they did not, and it's XOP-only, hopefully
to be found in AMD's CPUs to be made available later this year.

Benchmarking: Traditional DES [256/256 BS AVX]... Illegal instruction at address = 408e27: 8f 48 4c a2 c5 40 c5 7c 29 84 24 e8 05 00 00

solar@owl:~/john/john-1.7.6-avx/src $ objdump -d ../run/john | fgrep -w 408e27
  408e27:       8f 48 4c a2 c5 40       vpcmov %ymm4,%ymm13,%ymm6,%ymm8

At least, it's not supported by the emulator.  I guess Sandy Bridge CPUs are similar.

So we're stuck with DES_BS 1 for now, but we may use 256-bit vectors,
which may or may not be faster than 128-bit (depends on how current CPUs
implement them).  If they're not faster than 128-bit, we may still
benefit from AVX's support for 3-operand instructions, so this is
something to be benchmarked.  That is, benchmark not just 256-bit AVX
vs. SSE2, but also include 128-bit AVX in the comparison.

AVX+SSE2 at once (384-bit virtual vectors) probably makes little sense
since they will have to share the 16 registers.  It's like 8 registers
per implementation.  Yet it's worth trying if the compiler manages to
allocate registers to each implementation in a non-conflicting fashion.

AVX+MMX (320-bit) might turn out to work better - at least the registers
are separate.  This might be worth trying too.

And it looks like we'll need to support XOP separately from AVX, so the
various combinations with XOP will need to be tried too...
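Since an XOP build dies with "Illegal instruction" on a CPU that only has AVX, one way to keep the builds apart is a runtime check before picking a code path. Below is a rough sketch, assuming a compiler that provides the GCC builtins __builtin_cpu_init() and __builtin_cpu_supports(); as far as I know these need a newer GCC than the 4.5.0 mentioned above. The function name pick_des_bs_build() and the returned labels are hypothetical and not part of john.

```c
/*
 * Hypothetical helper (not part of john): report which of the
 * separately compiled bitslice DES builds the current CPU could run,
 * preferring XOP, then AVX, then SSE2.
 * Assumes the GCC-style CPU feature builtins are available; on other
 * compilers or architectures it falls back to "generic".
 */
static const char *pick_des_bs_build(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();	/* populate the CPU feature data */
	if (__builtin_cpu_supports("xop"))
		return "xop";	/* vpcmov is available */
	if (__builtin_cpu_supports("avx"))
		return "avx";	/* AVX, but no vpcmov */
	if (__builtin_cpu_supports("sse2"))
		return "sse2";
#endif
	return "generic";
}
```

This treats XOP and AVX as independent features, so the XOP path is only taken where vpcmov actually exists.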

Alexander
