Message-ID: <CABh=JREB1GyqUrPZxSTSQhcr0mkmJa6F4GhOvLFmKsMRwii3MA@mail.gmail.com>
Date: Thu, 15 Mar 2012 23:55:59 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP
>
> I think you mean SSE and Pentium 3. Yes, that was disappointing. In
> fact, the cause might be similar: officially, those wider registers and
> operations on them are "floating point" (true for both the original SSE
> and now for 256-bit AVX and XOP), so there might be some overhead on
> updating some CPU-internal floating-point state (flags reflecting the
> current values in the vector elements if interpreted as floating-point?)
> That's just a guess, though.
>
>
Hm, but they support bitwise operations like shifts, and/or/xor and so on.
I was wondering if it makes sense to use that in, say, an MD5 routine: load
two xmms into a ymm to do the bitwise operations, then unload them for the
additions, then load them again for the next bitwise operations, and so on.
Perhaps that's a very stupid idea and I really doubt it would work, but who
knows. I've done something like that with my SHA512 kernel, though the case
was very different there. GPUs have no native 64-bit operations even though
the OpenCL standard defines 64-bit long types; all 64-bit arithmetic is
emulated in software. However, the AMD compiler sometimes generated horrible
code, and it turned out to be beneficial to cast a ulong to a uint vector
and do the operation on uints. Of course the xmm/ymm case is much different
and I don't know if it's applicable here at all.
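
To make the ulong-to-uint trick a bit more concrete, here is a minimal
sketch in plain C rather than OpenCL. It is not the actual SHA512 kernel;
the type u64_halves and all function names are made up for illustration.
It shows a 64-bit rotate right (the kind of operation SHA-512 leans on)
done purely with 32-bit shifts and ORs on the two halves, and checks it
against an ordinary 64-bit rotate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for viewing an OpenCL ulong as a uint2:
   a 64-bit value kept as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

u64_halves split64(uint64_t x)
{
    u64_halves h;
    h.lo = (uint32_t)x;
    h.hi = (uint32_t)(x >> 32);
    return h;
}

uint64_t join64(u64_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Rotate right by n (taken modulo 64) using only 32-bit operations. */
u64_halves rotr64_halves(u64_halves v, unsigned n)
{
    u64_halves r;

    n &= 63;
    if (n >= 32) {
        /* A rotate by 32 is just a swap of the two halves. */
        uint32_t t = v.lo;
        v.lo = v.hi;
        v.hi = t;
        n -= 32;
    }
    if (n == 0)
        return v; /* avoid an undefined 32-bit shift by 32 below */

    /* The low bits of each half wrap into the top of the other half. */
    r.lo = (v.lo >> n) | (v.hi << (32 - n));
    r.hi = (v.hi >> n) | (v.lo << (32 - n));
    return r;
}

/* Reference implementation on a native 64-bit integer. */
uint64_t rotr64_native(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Compare both variants for every rotate count on one test value. */
void self_check(void)
{
    const uint64_t x = 0x0123456789abcdefULL;
    unsigned n;

    for (n = 0; n < 64; n++)
        assert(join64(rotr64_halves(split64(x), n)) == rotr64_native(x, n));
}
```

Whether this form is actually faster depends entirely on the compiler and
the hardware; on a CPU with native 64-bit registers it most likely is not,
which is part of why the xmm/ymm case may not carry over.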