Message-Id: <6EE427BD-0BA0-47D0-9F8B-D3C01E814544@gmail.com>
Date: Tue, 14 Jul 2015 12:11:16 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

> On Jun 23, 2015, at 2:03 AM, Solar Designer <solar@...nwall.com> wrote:
>
> One thing that is clear is that non-fully-unrolled *_PARA_DO are not
> acceptable. If there are not enough registers for fully unrolling
> these without incurring spilling, then the interleaving factor should be
> smaller. On MIC, there should be enough registers for the interleaving
> factors considered above (up to 5x).

I manually unrolled SHA256_STEP and SHA512_STEP and compared their
performance with the auto-unrolled versions, using magnum's testpara.pl.
The figures below were obtained on my laptop (the formats are pbkdf2-*):

[auto]
hash\para |    1     |    2     |    3     |    4     |    5     |
----------|----------|----------|----------|----------|----------|
sha256    | **4020** |   3760   |   3924   |   3801   |   3940   |
sha512    | **1624** |   1092   |   1413   |   1409   |   1435   |

[manual]
hash\para |    1     |    2     |    3     |    4     |    5     |
----------|----------|----------|----------|----------|----------|
sha256    | **4144** |   1888   |   1817   |   1837   |   1821   |
sha512    | **1646** |    748   |    708   |    720   |    722   |

With manual unrolling, performance drops drastically going from
interleaving x1 to x2, but barely changes from x2 upward.

BTW, I didn't change the original tmp arrays; only the loop is manually
unrolled here.

Lei
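For readers unfamiliar with the two styles, here is a minimal C sketch of what "auto" versus "manual" unrolling of an interleaved step can look like. In the "auto" style, each statement of one SHA-256 round is wrapped in a lane loop and left for the compiler to unroll. In the "manual" style, the same round is written out by hand for two lanes. `PARA`, `PARA_DO`, `round_loop` and `round_manual` are hypothetical names. Plain `uint32_t` lanes stand in for SIMD vectors, and this is not John the Ripper's actual SHA256_STEP code. The sketch only shows that both forms compute the same result. It says nothing about which one a given compiler schedules or register-allocates better.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaving factor and lane-loop macro, loosely modeled on
   the *_PARA_DO pattern from the thread (assumption, not JtR's code).
   Plain uint32_t lanes stand in for SIMD vectors. */
#define PARA 2
#define PARA_DO(i) for ((i) = 0; (i) < PARA; ++(i))

#if PARA != 2
#error "round_manual below is written out for exactly two lanes"
#endif

/* SHA-256 round helpers (FIPS 180-4). */
#define ROR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define S0(a) (ROR32(a, 2) ^ ROR32(a, 13) ^ ROR32(a, 22))
#define S1(e) (ROR32(e, 6) ^ ROR32(e, 11) ^ ROR32(e, 25))
#define CH(e, f, g) (((e) & (f)) ^ (~(e) & (g)))
#define MAJ(a, b, c) (((a) & (b)) ^ ((a) & (c)) ^ ((b) & (c)))

/* One SHA-256 round on PARA independent states, s[lane] = {a,...,h}.
   Every statement is wrapped in PARA_DO, so if the compiler fully unrolls
   these loops, instructions from different lanes alternate. */
static void round_loop(uint32_t s[PARA][8], uint32_t k, const uint32_t w[PARA])
{
    uint32_t t1[PARA], t2[PARA];
    int i, j;

    PARA_DO(i) t1[i] = s[i][7] + S1(s[i][4]) + CH(s[i][4], s[i][5], s[i][6]) + k + w[i];
    PARA_DO(i) t2[i] = S0(s[i][0]) + MAJ(s[i][0], s[i][1], s[i][2]);
    PARA_DO(i) {
        for (j = 7; j > 0; j--)   /* h=g, g=f, f=e, e=d, d=c, c=b, b=a */
            s[i][j] = s[i][j - 1];
        s[i][4] += t1[i];         /* e = d + T1 */
        s[i][0] = t1[i] + t2[i];  /* a = T1 + T2 */
    }
}

/* The same round with the lane loops written out by hand for two lanes. */
static void round_manual(uint32_t s[PARA][8], uint32_t k, const uint32_t w[PARA])
{
    uint32_t t1_0, t1_1, t2_0, t2_1;
    int j;

    t1_0 = s[0][7] + S1(s[0][4]) + CH(s[0][4], s[0][5], s[0][6]) + k + w[0];
    t1_1 = s[1][7] + S1(s[1][4]) + CH(s[1][4], s[1][5], s[1][6]) + k + w[1];
    t2_0 = S0(s[0][0]) + MAJ(s[0][0], s[0][1], s[0][2]);
    t2_1 = S0(s[1][0]) + MAJ(s[1][0], s[1][1], s[1][2]);

    for (j = 7; j > 0; j--) {     /* working-variable rotation, both lanes */
        s[0][j] = s[0][j - 1];
        s[1][j] = s[1][j - 1];
    }
    s[0][4] += t1_0;
    s[1][4] += t1_1;
    s[0][0] = t1_0 + t2_0;
    s[1][0] = t1_1 + t2_1;
}
```

Given the same inputs, both functions leave identical states. Whether either form compiles into a fully interleaved instruction stream without register spills depends on the compiler and target. Benchmarks such as the ones above are how that gets checked.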