Message-ID: <20200816025425.GA22555@openwall.com>
Date: Sun, 16 Aug 2020 04:54:26 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Performance John in the cloud

On Sat, Aug 15, 2020 at 11:06:13PM +0200, Solar Designer wrote:
> on a c5a.24xlarge instance (96 vCPUs, AMD EPYC 7R32)

BTW, here are some other benchmarks on that CPU, 96 threads:

Benchmarking: descrypt, traditional crypt(3) [DES 256/256 AVX2]... (96xOMP) DONE
Many salts:     407961K c/s real, 4254K c/s virtual
Only one salt:  62797K c/s real, 654782 c/s virtual

Benchmarking: md5crypt, crypt(3) $1$ (and variants) [MD5 256/256 AVX2 8x3]... (96xOMP) DONE
Many salts:     4608K c/s real, 48002 c/s virtual
Only one salt:  3801K c/s real, 39523 c/s virtual

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X3]... (96xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw:    86832 c/s real, 902 c/s virtual

Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 256/256 AVX2 4x]... (96xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:    64060 c/s real, 669 c/s virtual
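
For reference, all of the above came from john's built-in benchmark.  A
minimal sketch of reproducing the whole set in one go (assuming an
OpenMP-enabled john build in $PATH and that these format names match
your build) would be:

$ for f in descrypt md5crypt bcrypt sha512crypt; do john -test -form=$f; done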

Using 48 threads works slightly better for descrypt:

$ OMP_NUM_THREADS=48 john -test -form=descrypt
Will run 48 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 256/256 AVX2]... (48xOMP) DONE
Many salts:     418480K c/s real, 8718K c/s virtual
Only one salt:  79034K c/s real, 1651K c/s virtual
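
If you'd like to find the sweet spot for a given format on your own
hardware, a quick sweep over thread counts is enough.  This is just a
sketch (the thread counts listed are examples; adjust them to your
core/SMT configuration):

$ for t in 24 48 96; do OMP_NUM_THREADS=$t john -test -form=descrypt; done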

Not bad for one CPU chip.  Just a few years ago, these speeds at
descrypt, md5crypt, and sha512crypt were only achieved on GPU.  Of
course, modern high-end GPUs are a few times faster at these three hash
types... but not at bcrypt.

That bcrypt speed is the highest I've seen so far for any one chip: we
reach higher speeds on ZTEX boards, but those have four FPGA chips each,
and an NVIDIA Tesla V100 GPU doesn't reach the above speed (though it
gets very close).  I expect an AMD EPYC with 128 threads (64 cores)
would show even better speed; I just haven't had access to one yet.

Of course, this isn't as energy-efficient as the FPGAs are, but it is a
higher speed per chip.  We'll need to support larger FPGAs to beat that.

> c5a.24xlarge is currently priced at $1.56+/hour spot, $3.696 on-demand.
> Our Bundle (beyond the free trial) costs $0.64/hour on this instance.

Alexander
