Message-ID: <20111024085337.GA18983@openwall.com>
Date: Mon, 24 Oct 2011 12:53:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Benchmarks vs GCC version

On Mon, Oct 24, 2011 at 12:28:23PM +0400, Solar Designer wrote:
> [...] I tested with gcc 4.5.0 vs. 4.6.1 now. I confirm that
> there's a 25% slowdown for the SSE2 intrinsics code when going from
> 4.5.0 to 4.6.1. To partially cure it, add -fno-unit-at-a-time to
> OPT_INLINE. Apparently, gcc 4.6.x just tries too hard to optimize those
> functions with the S-box functions inlined into them, and it fails at
> that. :-( With gcc 4.5.0, adding this option makes little difference.

I made an error in my testing. What really made the difference for
gcc 4.6.1 was disabling MAYBE_INLINE inside DES_bs_b.c. Unfortunately,
it still has performance impact of roughly 10% compared to gcc 4.5.0's
code. (Forced inlining was there for a reason.)

> Of course, switching to hand-written assembly code is another valid
> cure, but for OpenMP builds we currently/still use the intrinsics. So I
> think I'll have to add -fno-unit-at-a-time to OPT_INLINE or to proposed
> OMPFLAGS (for gcc) in the next release.

Surprisingly, it turns out that with -fopenmp, gcc 4.6.1 produces good
code as-is, with no changes needed. In fact, with gcc 4.6.1, I am
getting better speed with -fopenmp and OMP_NUM_THREADS=1 (just for
testing) than I do without -fopenmp.

I think we need to find the right -f* option flipping which would cure
performance for non-OpenMP builds, then report this regression to gcc
developers. It could be some option implied by -fopenmp, or maybe that
effect is from generated code changes in the OpenMP build.

Alexander