Message-ID: <20150905022516.GA23258@openwall.com>
Date: Sat, 5 Sep 2015 05:25:16 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: MD5 on XOP, NEON, AltiVec

magnum, Lei -

Here's what we had last year:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 8x]... (8xOMP) DONE
Raw:	201472 c/s real, 25152 c/s virtual

Here's what we have now:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:	150272 c/s real, 18784 c/s virtual

I tried looking at "objdump -d sse-intrinsics.o" in the old build vs.
"objdump -d simd-intrinsics.o" in the current version, and I don't see
any obvious problem.  Moreover, raw-md5 hasn't regressed, and I think
both it and md5crypt share the SIMDmd5body() function.  At this point,
my best guess is we might be getting unaligned buffers.

Once we figure this out and fix it, we'll need to revise MD5_I in
simd-intrinsics.c to use my newly found expression with vcmov() on XOP,
and the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
archs have OR-NOT, which might be lower latency than select).

Alexander
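
For reference, here is a scalar sketch of the two MD5_I forms mentioned above. MD5's round-4 function is I(x, y, z) = y ^ (x | ~z) per RFC 1321, so with an OR-NOT primitive it is one OR-NOT plus one XOR. The helper names (orn32, select32, md5_i_ref, md5_i_orn, md5_i_select, md5_i_self_check) are made up for illustration and are not taken from simd-intrinsics.c. The select-based variant is only one possible way to build x | ~z from a bitwise select and an all-ones constant; it is an assumption and may well differ from the vcmov() expression the message refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference form from RFC 1321: I(x, y, z) = y ^ (x | ~z). */
static uint32_t md5_i_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Hypothetical scalar stand-in for a SIMD OR-NOT instruction: a | ~b. */
static uint32_t orn32(uint32_t a, uint32_t b)
{
    return a | ~b;
}

/* Hypothetical scalar stand-in for a bitwise select:
 * takes bits of a where mask is 1 and bits of b where mask is 0. */
static uint32_t select32(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* OR-NOT form: one OR-NOT followed by one XOR. */
static uint32_t md5_i_orn(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ orn32(x, z);
}

/* Select form (an assumption, not necessarily the expression from the
 * message): select(x, all-ones, z) = (x & z) | ~z = x | ~z. */
static uint32_t md5_i_select(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ select32(x, 0xffffffffu, z);
}

/* Checks that all three forms agree on every triple of sample words. */
static void md5_i_self_check(void)
{
    static const uint32_t s[] = {
        0x00000000u, 0xffffffffu, 0x67452301u,
        0xefcdab89u, 0x98badcfeu, 0x10325476u
    };
    const size_t n = sizeof(s) / sizeof(s[0]);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++) {
                uint32_t ref = md5_i_ref(s[i], s[j], s[k]);
                assert(md5_i_orn(s[i], s[j], s[k]) == ref);
                assert(md5_i_select(s[i], s[j], s[k]) == ref);
            }
}
```

Calling md5_i_self_check() from a test harness confirms that the forms are bitwise equivalent. Which one is faster on a given architecture depends on instruction latencies and would need benchmarking.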