john-users - Re: Performance tuning

Message-ID: <20060427220443.GA18501@openwall.com>
Date: Fri, 28 Apr 2006 02:04:43 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Performance tuning

Speaking of MMX vs. x86-64 SSE registers:

On Thu, Apr 27, 2006 at 11:21:03PM +0200, sebastian.rother@...erlin.de wrote:
> So how can 8 64Bit registers outperform 16 128Bit Registers?!

It's not registers which "perform".  There are x86/MMX or x86-64/SSE
instructions, which are translated into one or more micro-ops.  Some of
those micro-ops may have latencies greater than 1 cycle.  Both the
micro-op counts and their latencies might differ between the micro-ops
generated for x86/MMX and those generated for x86-64/SSE.  That's the
theory - to answer your question ("how can it be true").

However, I've based my brief analysis primarily on the actual benchmarks
I had performed.  According to those benchmarks, on Pentium 3 and AMD
processors MMX bitwise ops deliver better per-bit performance than SSE
ones do, despite SSE registers being twice as wide - but on Pentium 4
processors SSE is actually somewhat faster than MMX per-bit.  In other
words, SSE instructions are more than twice as slow as MMX ones on P3
and AMD, but less than twice as slow on P4.  Of course, this may change
with future processors of either or both vendors.

> Related to the Co-Processors:

Sebastian, Frank - thank you for the links.  I'll have a look a bit
later and comment here if appropriate.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments