Message-ID: <CAK1hOcM+0OHph8Misqrgt_QspoPox8ybqrEmsCDUiEn76CCORQ@mail.gmail.com>
Date: Wed, 11 Feb 2015 02:07:23 +0100
From: Denys Vlasenko <vda.linux@...glemail.com>
To: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations

On Tue, Feb 10, 2015 at 10:37 PM, Rich Felker <dalias@...c.org> wrote:
> OK. Based on some casual testing on my Celeron 847:
>
> - For small sizes, your patches make significant improvement, 20-30%.
>
> - For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
>
> - Using 32-bit imul instead of 64-bit makes no difference at all.

That's because Celeron 847 is a Sandy Bridge CPU.
Only Intel's "big" CPUs starting from Nehalem have fast
(and large in transistor count) integer multiplier capable
of 3-cycle 64-bit multiply.

Many other CPUs are worse, even Intel ones:
Atoms are 13-cycle (!), Silvermont: 5 cycles.
AMD's Bulldozers: 6 cycles, Bobcat: 6-7, Jaguar: 6, K10: 4 cycles.

32-bit imul is 3 or 4 cycles on all these CPUs (well, Atom has 5).
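
For readers without the patch at hand, here is a rough C sketch of the two ways a memset can replicate the fill byte across a 64-bit register. The patch itself is not quoted in this message, so this is an assumption about what the imul is used for, not the actual musl code. The function names are made up for illustration, and a C compiler is free to emit something other than imul for either variant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replicate the byte with one 64-bit multiply
 * (the operation that costs 3 cycles on "big" Intel cores but much
 * more on Atom and several AMD designs). */
static uint64_t fill_pattern_mul64(unsigned char c)
{
    return (uint64_t)c * 0x0101010101010101ULL;
}

/* Hypothetical helper: replicate the byte with a 32-bit multiply,
 * then copy the low 32 bits into the high half with a shift and OR. */
static uint64_t fill_pattern_mul32(unsigned char c)
{
    uint32_t lo = (uint32_t)c * 0x01010101u; /* cannot overflow: c < 256 */
    return ((uint64_t)lo << 32) | lo;
}

/* Check that both variants agree for every possible byte value. */
static void fill_pattern_self_check(void)
{
    for (unsigned c = 0; c < 256; c++)
        assert(fill_pattern_mul64((unsigned char)c) ==
               fill_pattern_mul32((unsigned char)c));
}
```

Both functions return the same pattern (for example 0xabababababababab for the byte 0xab), so any difference between them is in latency only, which is where the per-CPU cycle counts above come in.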