Message-ID: <20200430145732.GA14590@openwall.com>
Date: Thu, 30 Apr 2020 16:57:32 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Reduce CPU load. Is it possible?

First JtR-focused part:

On Thu, Apr 30, 2020 at 02:48:40PM +0100, Darren Wise wrote:
> Obviously I'm assuming here whatever CPU you are utilising can multi thread per CPU core too.

When an operating system reports a certain number of "CPUs", those are
generally the hardware threads, not only physical cores.  So when you
have a CPU with more than one hardware thread per core, the reported
number of "CPUs" is already inflated accordingly.  This means that what
oayz is doing is over-loading the chosen logical CPUs (maybe cores,
maybe hardware threads) with multiple software threads each, resulting
in the OS having to switch context back and forth and the OpenMP threads
waiting for each other.  This is highly non-optimal.  While oayz may see
a 50%'ish total CPU load as expected when using 2 out of 4 logical CPUs,
this probably corresponds to significantly less than the full
performance of those two logical CPUs, with much of their performance
wasted on context switches and thread synchronization.  The OS-reported
CPU utilization doesn't distinguish useful computation from overhead -
it counts both.  So oayz's suggestion is a wrong thing to do, no matter
how many hardware threads there may be in a given CPU per core.
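
To illustrate the alternative, here is a minimal Python sketch of picking
a lower thread count instead of squeezing many threads onto few logical
CPUs.  It assumes an OpenMP-enabled build that honors the standard
OMP_NUM_THREADS environment variable.  The helper names
(allowed_logical_cpus, threads_for_target_load, is_oversubscribed) are
hypothetical and not part of JtR, and the 50% target is just the figure
from this thread.

```python
import os


def allowed_logical_cpus():
    """Count the logical CPUs (hardware threads, not cores) we may run on."""
    try:
        # Linux-only call: reflects the affinity mask, e.g. one set by taskset.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Portable fallback: all logical CPUs the OS reports.
        return os.cpu_count() or 1


def threads_for_target_load(logical_cpus, target_fraction):
    """Hypothetical helper: thread count for roughly target_fraction of total load."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return max(1, int(logical_cpus * target_fraction))


def is_oversubscribed(software_threads, logical_cpus):
    """True when more software threads compete than there are logical CPUs."""
    return software_threads > logical_cpus


if __name__ == "__main__":
    cpus = allowed_logical_cpus()
    threads = threads_for_target_load(cpus, 0.5)
    # One software thread per logical CPU actually used, so no oversubscription.
    print(f"OMP_NUM_THREADS={threads}")
```

With 4 logical CPUs and a 50% target this suggests 2 threads, whereas 4
threads restricted to 2 logical CPUs would be flagged as oversubscribed.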

Then general thoughts on CPU design:

> Each core can run two threads I assume, four actual CPU cores equates to eight threads? HT? HyperThreading is active. However I even have a SUNmicrosystem CPU which is quadcore and runs 16 threads, four per core..

The vendor-neutral name for this is SMT, or simultaneous multithreading:

https://en.wikipedia.org/wiki/Simultaneous_multithreading

Like you say, this varies.  On commodity x86 CPUs, it's currently 1 or 2
hardware threads per core.  On Xeon Phi (which is x86, too) it's 4.  On
SPARC, it's 1 to 8.  On POWER, it's also 1 to 8.

CPUs where the hardware thread count per core is above 2 are typically
physically unable to fully use the core from just 1 thread.  Those with
2 hardware threads per core are typically capable of fully using the
core from 1 thread as well, but only if the application contains enough
natural parallelism per thread and the code exposes that parallelism
(this same requirement exists for CPUs lacking SMT as well).

> Wish Intel mastered that one.

This is trivial for Intel to "master", like they already did in Xeon
Phi, but the optimal number of hardware threads to support per core is a
function of the rest of the CPU design - maximize single thread
performance (desktop CPUs) or maximize multi-threaded throughput at the
expense of worse single thread performance (HPC and server CPUs).  The
worse single thread performance would even apply when running multiple
threads per CPU chip, just not so many as to start making much use of SMT.

There are also security considerations.  Many x86 systems that
physically support 2 hardware threads per core are now running with SMT
disabled because of security risks associated with it.  If the CPUs
supported 4+ hardware threads per core along with other associated
design changes, and thus required 2+ threads to fully use a core, the
performance impact of running with SMT disabled would be greater.

There are also performance inefficiencies when tasks happen to be
scheduled to run on different hardware threads of the same cores instead
of on different cores, while leaving some cores unused.  While modern OS
schedulers are SMT-aware and generally prevent this problem, it still
sometimes reoccurs, such as in VMs where the guest system's kernel might
not have sufficient exposure to the host's CPU topology.
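
As a rough sketch of what "different cores" means in practice, the Python
below groups logical CPUs into physical cores and picks one hardware
thread per core.  It assumes Linux, where sysfs exposes
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list, and the
function names are hypothetical.  Inside a VM the reported siblings may
not match the host's real topology, which is exactly the caveat above.

```python
import glob
import re


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0,4" or "0-1,8-9" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_core(siblings):
    """Turn {logical_cpu: sibling-list string} into sorted tuples, one per core."""
    return sorted({tuple(parse_cpu_list(s)) for s in siblings.values()})


def one_thread_per_core(siblings):
    """Pick the lowest-numbered logical CPU of each core."""
    return [core[0] for core in group_by_core(siblings)]


def read_siblings_from_sysfs():
    """Read the sibling lists from Linux sysfs (empty dict if unavailable)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    result = {}
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            result[cpu] = f.read()
    return result


if __name__ == "__main__":
    siblings = read_siblings_from_sysfs()
    print("cores:", group_by_core(siblings))
    print("one logical CPU per core:", one_thread_per_core(siblings))
```

The resulting list could then be used as an affinity mask, so that a
reduced number of threads lands on separate cores rather than on sibling
hardware threads of the same core.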

See also my older explanation of the rationale for SMT and what (not) to
expect from it:

https://www.openwall.com/lists/john-dev/2015/09/12/2

Overall, despite all the drawbacks of SMT, I like having it for highly
parallelizable workloads such as what we have in JtR.  However, I do not
blindly wish to have more than 2 threads per core on desktop.

Be careful what you wish for.

Alexander
