Message-ID: <20140826034321.GA13999@brightrain.aerifal.cx>
Date: Mon, 25 Aug 2014 23:43:21 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Multi-threaded performance progress

This release cycle looks like it's going to be huge for multi-threaded
performance issues. So far the cumulative improvement on my main
development system, as measured by the cond_bench.c by Timo Teräs, is
from ~250k signals in 2 seconds up to ~3.7M signals in 2 seconds.
That's comparable to what glibc gets on similar hardware with a cond
var implementation that's much less correct. The improvements are a
result of adding private futex support, redesigning the cond var
implementation, and improvements to the spin-before-futex-wait
behavior.

Semaphore performance has also improved, up from fewer than 500k
wait/post operations to ~12M, mostly due to spin-before-futex-wait.

The above results are all based on micro-benchmarks which are
potentially meaningless to real-world applications, so I'd be
interested in seeing any higher-level or real-application-based
comparisons of the old and new code.

There is one remaining performance issue I still want to look into
fixing, possibly during this release cycle: when a thread repeatedly
takes and releases a lock on which other threads are waiting, it makes
a futex wake syscall on each unlock, despite only the first one being
necessary. I have a design for avoiding this on internal locks, but
it's less obvious how to do it for mutexes where storage is tight and
self-synchronized destruction is possible.

We're near the end of my planned time frame for this release cycle,
but I'm still interested in working with Jens to get C11 threads into
this release if possible, so I'll probably extend it for a while
still.

Rich
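
For readers unfamiliar with the terms used in the message, the
spin-before-futex-wait idea with private futexes can be sketched
roughly as below. This is not musl's code and not necessarily how musl
structures its locks: the type sketch_lock, the helper futex_op, the
constant SPIN_LIMIT, and the demo functions are all hypothetical names
chosen for illustration, and the three-state scheme (0 free, 1 held,
2 held with possible waiters) is a common textbook design swapped in
here. It assumes Linux and a compiler where atomic_int has the same
layout as int.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical spin budget; a real implementation would tune this. */
#define SPIN_LIMIT 100

/* Hypothetical lock: 0 = free, 1 = held, 2 = held, waiters possible. */
typedef struct { atomic_int state; } sketch_lock;

/* Thin wrapper over the raw futex syscall (no libc wrapper exists). */
static long futex_op(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sketch_lock_acquire(sketch_lock *l)
{
    /* Spin phase: retry in userspace, hoping the holder releases soon. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&l->state, &expected, 1))
            return;
    }
    /* Slow phase: mark the lock contended, then sleep in the kernel.
     * FUTEX_WAIT returns at once if state is no longer 2. */
    while (atomic_exchange(&l->state, 2) != 0)
        futex_op(&l->state, FUTEX_WAIT_PRIVATE, 2);
}

static void sketch_lock_release(sketch_lock *l)
{
    /* Wake one waiter only if someone may be sleeping. Note that a
     * thread re-taking the lock in a loop can still issue a wake on
     * every release here, which is the cost the message describes. */
    if (atomic_exchange(&l->state, 0) == 2)
        futex_op(&l->state, FUTEX_WAKE_PRIVATE, 1);
}

static sketch_lock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        sketch_lock_acquire(&demo_lock);
        demo_counter++;            /* protected by demo_lock */
        sketch_lock_release(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final counter. */
long run_demo(int nthreads, long iters)
{
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Calling run_demo(4, 20000) from a test program should return 80000 if
the lock provides mutual exclusion. The _PRIVATE futex operations tell
the kernel the futex word is not shared between processes, which lets
it skip some lookup work; the spin phase avoids the syscall entirely
when the lock is released quickly.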