john-dev - Re: AMD Bulldozer and XOP

Message-ID: <20120315231701.GA9928@openwall.com>
Date: Fri, 16 Mar 2012 03:17:01 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

On Thu, Mar 15, 2012 at 11:55:59PM +0200, Milen Rangelov wrote:
> Hm, but they support bitwise operations like shifts, and/or/xor and stuff.

Pentium 3's SSE (KNI) had bitwise ops as well, but they were awfully
slow (approx. 1.5 times slower than MMX per bit).

> I was wondering if it makes sense to use that in, say, an MD5 routine. Load two
> xmms into an ymm to do bitwise operations, then unload them for the
> additions, then load them again for the next bitwise operations and so on.

On current CPUs, this will be slower than staying 128-bit only.
In fact, even merely doing the 256-bit bitwise ops without any
loading/unloading is not going to provide you any speedup on Sandy
Bridge (according to my own benchmarks) and is going to slow you down by
a factor of two on Bulldozer (according to benchmarks that were sent to
me; I have yet to verify this myself).
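
For concreteness, the load/unload scheme being discussed might look roughly
like the sketch below. It is only an illustration, not code from any
existing MD5 implementation: the names md5_f_add_x8 and load_two_xmm_as_ymm
are made up, and it covers just MD5's F function plus one addition rather
than a full step. The bitwise part runs 256 bits wide, while the 32-bit
additions stay 128-bit because AVX, as found on Sandy Bridge and Bulldozer,
has no 256-bit integer add. A scalar fallback is included so that the file
also builds without -mavx.

```c
#include <stdint.h>

#if defined(__AVX__)
#include <immintrin.h>

/* Load 8 x 32-bit words as two xmm halves, then merge them into one ymm. */
static __m256 load_two_xmm_as_ymm(const uint32_t *p)
{
    __m128 lo = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)p));
    __m128 hi = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)(p + 4)));
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
}

/* a[i] += F(b[i], c[i], d[i]) for 8 lanes, with MD5's F = (b & c) | (~b & d). */
void md5_f_add_x8(uint32_t a[8], const uint32_t b[8],
                  const uint32_t c[8], const uint32_t d[8])
{
    __m256 vb = load_two_xmm_as_ymm(b);
    __m256 vc = load_two_xmm_as_ymm(c);
    __m256 vd = load_two_xmm_as_ymm(d);

    /* 256-bit bitwise ops; _mm256_andnot_ps(x, y) computes (~x) & y. */
    __m256 f = _mm256_or_ps(_mm256_and_ps(vb, vc), _mm256_andnot_ps(vb, vd));

    /* "Unload" into two xmm halves for the 128-bit integer additions. */
    __m128i f_lo = _mm_castps_si128(_mm256_castps256_ps128(f));
    __m128i f_hi = _mm_castps_si128(_mm256_extractf128_ps(f, 1));

    __m128i a_lo = _mm_loadu_si128((const __m128i *)a);
    __m128i a_hi = _mm_loadu_si128((const __m128i *)(a + 4));
    _mm_storeu_si128((__m128i *)a, _mm_add_epi32(a_lo, f_lo));
    _mm_storeu_si128((__m128i *)(a + 4), _mm_add_epi32(a_hi, f_hi));
}
#else
/* Scalar fallback with the same result, so the file builds without -mavx. */
void md5_f_add_x8(uint32_t a[8], const uint32_t b[8],
                  const uint32_t c[8], const uint32_t d[8])
{
    for (int i = 0; i < 8; i++)
        a[i] += (b[i] & c[i]) | (~b[i] & d[i]);
}
#endif
```

The merge and split steps are extra instructions that a 128-bit-only
version would not need at all, which is the overhead referred to above.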

Exception: in 32-bit mode, where you only have 8 registers and would
incur data dependency stalls because of that, the 256-bit ops may help
a little on Sandy Bridge (around +5%), but not on Bulldozer.

Alexander
