Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260522090718.GH3520958@port70.net>
Date: Fri, 22 May 2026 11:07:18 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Justine Tunney <jtunney@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] Make qsort 50% faster

* Justine Tunney <jtunney@...il.com> [2026-05-21 19:29:36 -0700]:
> This change makes musl qsort go 50% faster on common sizes by avoiding
> memcpy calls. On x86-64 with cc -Os it can be as much as 2x faster. No
> changes have been made to the smoothsort algorithm itself. This simple
> patch brings musl much closer to other C libraries in terms of sorting
> perf, while maintaining smoothsort's highly desirable properties, e.g.
> small size, stack safety, signal safety, no malloc dependency, etc.
> 
> The cost is +670 bytes of .text (on x86-64) to specialize cycle() for
> widths 4, 8, 12, 16, 20, 24, 28, and 32 (gcc, without stack protector).
> But if you want to be like OpenBSD's qsort and only specialize 4 and 8,
> then you're looking at a +140 byte increase. I chose the upper bound of
> what's practical; after 32, gcc stops inlining memcpy cleanly.
> 
> Further analysis and benchmark numbers are available in my repo:
> https://github.com/jart/musl-qsort-speedup

i think upstream musl would need
__builtin_memcpy for this to work.

> +/* specializes cycle() for constant width */
> +#define DECLARE_CYCLE(NAME, WIDTH)					\
> +	static void NAME(size_t width, unsigned char* ar[], long n)	\
> +	{								\
> +		long i;							\
> +		char t[WIDTH];						\
> +		if(n < 2)						\
> +			return;						\
> +		memcpy(t, ar[0], WIDTH);				\
> +		for (i = 0; i < n - 1; i++)				\
> +			memcpy(ar[i], ar[i + 1], WIDTH);		\
> +		memcpy(ar[n - 1], t, WIDTH);				\
> +	}
> +
> +DECLARE_CYCLE(cycle4, 4)
> +DECLARE_CYCLE(cycle8, 8)
> +DECLARE_CYCLE(cycle12, 12)
> +DECLARE_CYCLE(cycle16, 16)
> +DECLARE_CYCLE(cycle20, 20)
> +DECLARE_CYCLE(cycle24, 24)
> +DECLARE_CYCLE(cycle28, 28)
> +DECLARE_CYCLE(cycle32, 32)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.