Message-ID: <20230211133532.GD4163@brightrain.aerifal.cx>
Date: Sat, 11 Feb 2023 08:35:33 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re:Re: Re:Re: Re:Re: Re:Re: qsort

On Sat, Feb 11, 2023 at 10:06:02AM +0100, alice wrote:
> On Sat Feb 11, 2023 at 9:39 AM CET, Joakim Sindholt wrote:
> > On Sat, 11 Feb 2023 06:44:29 +0100, "alice" <alice@...ya.dev> wrote:
> > > based on the glibc profiling, glibc also has their natively-loaded-cpu-specific
> > > optimisations, the _avx_ functions in your case. musl doesn't implement any
> > > SIMD optimisations, so this is a bit apples-to-oranges unless musl implements
> > > the same kind of native per-arch optimisation.
> > >
> > > you should rerun these with GLIBC_TUNABLES, from something in:
> > > https://www.gnu.org/software/libc/manual/html_node/Hardware-Capability-Tunables.html
> > > which should let you disable them all (if you just want to compare C to C code).
> > >
> > > ( unrelated, but has there been some historic discussion of implementing
> > > something similar in musl? i feel like i might be forgetting something. )
> >
> > There already are arch-specific asm implementations of functions like
> > memcpy.
>
> apologies, i wasn't quite clear- the difference
> between src/string/x86_64/memcpy.s and the glibc fiesta is that the latter
> utilises subarch-specific SIMD (as you explain below), e.g. AVX like in the
> above benchmarks. a baseline x86_64 asm is more fair-game if the difference is
> as significant as it is for memcpy :)

Folks are missing the point here. It's not anything to do with AVX or
even glibc's memcpy making glibc faster here. Rather, it's that glibc
is *not calling memcpy* for 4-byte (and likely a bunch of other
specialized cases) element sizes. Either they manually special-case
them, or the compiler (due to lack of -ffreestanding and likely -O3 or
something) is inlining the memcpy.

Based on the profiling data, I would predict an instant 2x speed boost
special-casing small sizes to swap directly with no memcpy call.

Incidentally, our memcpy is almost surely at least as fast as glibc's
for 4-byte copies. It's very large sizes where performance is likely
to diverge.

Rich
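
The special-casing suggested in the message can be illustrated with a
minimal sketch. The helper name swap_elems and the 256-byte temporary
buffer are assumptions made for illustration; this is not the actual
code of musl or glibc. For 4-byte elements it swaps bytes directly with
a constant trip count, which a compiler can typically unroll, so no
memcpy call is made on that path. Every other width goes through a
generic memcpy-based path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration; not taken from musl or glibc. */
static void swap_elems(unsigned char *a, unsigned char *b, size_t width)
{
	unsigned char tmp[256]; /* assumed chunk size for the generic path */

	if (width == 4) {
		/* Special case: swap the four bytes directly, with no
		 * library call. Byte accesses avoid any alignment or
		 * aliasing assumptions about the element type. */
		for (int i = 0; i < 4; i++) {
			unsigned char t = a[i];
			a[i] = b[i];
			b[i] = t;
		}
		return;
	}

	/* Generic path: swap in chunks through the temporary buffer. */
	while (width) {
		size_t n = width < sizeof tmp ? width : sizeof tmp;
		memcpy(tmp, a, n);
		memcpy(a, b, n);
		memcpy(b, tmp, n);
		a += n;
		b += n;
		width -= n;
	}
}
```

Other small widths (8 bytes, for example) could be given the same
treatment. Whether this yields the predicted speedup would have to be
measured.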