john-dev - Re: 5x intrinsics?

Message-ID: <CAKtfLcu4ZRPS_8DPXhh=J4OEcN2Mm2CScK6k77L94i0yKahcMA@mail.gmail.com>
Date: Tue, 21 May 2013 18:32:48 -0400
From: Alain Espinosa <alainesp@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: 5x intrinsics?

On 5/21/13, magnum <john.magnum@...hmail.com> wrote:
> I see Alain's NT format is "5x" for 32-bit SSE2 builds, ie. it does 4x in
> SSE2 plus 1x in non-SSE. I presume these are interleaved for hiding latency
> so doing that extra 1x more or less for free. Would this be theoretically
> and practically worthwhile for the intrinsics? Maybe it'd just get very
> messy. I can't remember any discussion on this matter...

In my testing on a Pentium 4 this gave only a very small speedup. With
faster SSE engines (beginning with Core 2 Duo) the 32-bit 5x
implementation will 'probably' be slower than an SSE2-only
implementation. In 64-bit builds we interleave 2 SSE2 streams (2*4x),
which gives a good speedup. I tried a 3*4x SSE2 implementation, but
there wasn't any performance gain (I tried this on Core 2 Duos). With
more vector ports in recent CPUs, this may be worth testing again. An
improvement over the 64-bit SSE2 implementation is the use of
non-destructive source operands with AVX. Also worth considering for
upcoming Intel CPUs is an AVX2 implementation with 4*8x (using
non-destructive source operands and some temporary memory use for
rotating). It will probably provide a speedup, given that those CPUs
have more ports and a better memory engine.
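
To make the 2*4x idea concrete, here is a minimal sketch of one MD4
round-1 step (NT is MD4-based) applied to two independent 4-lane SSE2
streams. It is only an illustration, not the code from the NT format:
the names md4_step_2x4, md4_step_scalar, ROTL32X4 and MD4_F are
hypothetical, and the shift of 3 is just the first MD4 step's rotate
count. The rotate is built from two shifts and an OR, since SSE2 has no
32-bit rotate.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SSE2 has no 32-bit rotate, so build it from two shifts and an OR. */
#define ROTL32X4(v, s) \
    _mm_or_si128(_mm_slli_epi32((v), (s)), _mm_srli_epi32((v), 32 - (s)))

/* MD4 round-1 function F(b,c,d) = (b & c) | (~b & d),
   written as d ^ (b & (c ^ d)) to save one operation. */
#define MD4_F(b, c, d) \
    _mm_xor_si128((d), _mm_and_si128((b), _mm_xor_si128((c), (d))))

/* Scalar reference for one step: a = rotl(a + F(b,c,d) + x, 3). */
static uint32_t md4_step_scalar(uint32_t a, uint32_t b, uint32_t c,
                                uint32_t d, uint32_t x)
{
    uint32_t t = a + (d ^ (b & (c ^ d))) + x;
    return (t << 3) | (t >> 29);
}

/* Hypothetical helper: the same step on 8 candidates, laid out as two
   independent 4-lane SSE2 streams (the "2*4x" arrangement). */
static void md4_step_2x4(uint32_t a[8], const uint32_t b[8],
                         const uint32_t c[8], const uint32_t d[8],
                         const uint32_t x[8])
{
    /* Stream 0 takes candidates 0..3, stream 1 takes candidates 4..7. */
    __m128i a0 = _mm_loadu_si128((const __m128i *)(a));
    __m128i a1 = _mm_loadu_si128((const __m128i *)(a + 4));
    __m128i b0 = _mm_loadu_si128((const __m128i *)(b));
    __m128i b1 = _mm_loadu_si128((const __m128i *)(b + 4));
    __m128i c0 = _mm_loadu_si128((const __m128i *)(c));
    __m128i c1 = _mm_loadu_si128((const __m128i *)(c + 4));
    __m128i d0 = _mm_loadu_si128((const __m128i *)(d));
    __m128i d1 = _mm_loadu_si128((const __m128i *)(d + 4));
    __m128i x0 = _mm_loadu_si128((const __m128i *)(x));
    __m128i x1 = _mm_loadu_si128((const __m128i *)(x + 4));

    /* Statements alternate between the two streams. Neither stream
       reads the other's values, so the CPU is free to overlap them. */
    __m128i t0 = MD4_F(b0, c0, d0);
    __m128i t1 = MD4_F(b1, c1, d1);
    t0 = _mm_add_epi32(t0, x0);
    t1 = _mm_add_epi32(t1, x1);
    a0 = _mm_add_epi32(a0, t0);
    a1 = _mm_add_epi32(a1, t1);
    a0 = ROTL32X4(a0, 3);
    a1 = ROTL32X4(a1, 3);

    _mm_storeu_si128((__m128i *)(a), a0);
    _mm_storeu_si128((__m128i *)(a + 4), a1);
}
```

Whether this interleaving actually pays off depends on the CPU and on
what the compiler does with the registers, so it has to be measured.
SSE2 instructions overwrite one of their inputs, so the compiler may
need extra register copies here; the three-operand AVX encoding avoids
those copies, which is the non-destructive source point above.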

saludos,
alain
