Message-ID: <55C29025.8070507@kernel.org>
Date: Wed, 5 Aug 2015 15:37:25 -0700
From: Andy Lutomirski <luto@...nel.org>
To: musl@...ts.openwall.com, Rich Felker <dalias@...c.org>
Subject: Re: Further dynamic linker optimizations

On 07/07/2015 10:48 PM, Timo Teras wrote:
> On Tue, 7 Jul 2015 14:55:05 -0400
> Rich Felker <dalias@...c.org> wrote:
>
>> On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
>>> On Tue, 30 Jun 2015, Rich Felker wrote:
>>>
>>>> Discussion on #musl with Timo Teräs has produced the following
>>>> results:
>>>>
>>>> - Moving bloom filter size to struct dso gives 5% improvement in
>>>>   clang (built as 110 .so's) start time, simply because of a
>>>>   reduction of number of instructions in the hot path. So I think
>>>>   we should apply that patch.
>>>
>>> I think most of the improvement here actually comes from fewer
>>> cache misses. As a result, I think we should take this idea further
>>> and shuffle struct dso a little bit so that fields accessed in the
>>> hot find_sym loop are packed together, if possible.
>>
>> I'm not entirely convinced; the 5% seems consistent with the number of
>> instructions in the code path. Can you confirm this with cache miss
>> measurements? Or just by obtaining better timings reordering data for
>> cache locality? Note that the head of struct dso has to remain fixed
>> (it's gdb ABI :/) but the rest is free to change.
>
> I used cachegrind and callgrind to benchmark. In my case there was no
> change in cache miss number - the speed up was purely based on running
> less instructions on the hot path.
>
> Though, I ran this on i7 with lot of cache. Cache misses could become
> issue on smaller cpus. But I suspect the bloom filter is doing good
> enough job to keep cache usage on sensible levels.
>
>>>> - The whole outer for loop in find_sym is the hot path for
>>>>   performance. As such, eliminating the lazy calculation of
>>>>   gnu_hash and simply doing it before the loop should be a
>>>>   measurable win, just by removing the if (!ghm) branch.
>>>
>>> On a related note, it's possible to avoid calculating sysv hash, if
>>> gnu-hash is enabled system-wide, by not setting 'global' flag on
>>> the vdso item (as mentioned on IRC in your conversation with Timo).
>>
>> Yes, and I think this sounds like a worthwhile approach. Seeing
>> timings for it would be great. :-)
>
> I told them earlier in IRC. But on the same i7 box and running "clang
> --version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> hashing had magnitude of tens of milliseconds. (I wonder how it'd
> perform if we calculated both sysv and gnu hashes at same time.)

/me dons vdso maintainer hat.

I can add a GNU hash to the vdso quite easily (for Linux 4.3). Would
that be helpful?

--Andy
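
For reference, Timo's aside about calculating both the sysv and gnu hashes at the same time could look roughly like the sketch below: one pass over the symbol name that updates both hash states per byte. This is only an illustration. The names struct sym_hashes, hash_both and hash_both_selfcheck are hypothetical and are not taken from musl's dynamic linker; the two update steps are the standard ELF (DT_HASH) and DT_GNU_HASH string hashes, and nothing here says whether the fused loop is actually faster.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for this sketch; not part of musl. */
struct sym_hashes {
	uint32_t sysv; /* classic ELF hash used by DT_HASH tables */
	uint32_t gnu;  /* hash used by DT_GNU_HASH tables */
};

/* Walk the symbol name once and update both hash states per byte. */
static struct sym_hashes hash_both(const char *name)
{
	const unsigned char *s = (const unsigned char *)name;
	struct sym_hashes h = { 0, 5381 };
	uint32_t top;

	for (; *s; s++) {
		/* SysV step: shift in the byte, fold the top nibble back in. */
		h.sysv = (h.sysv << 4) + *s;
		top = h.sysv & 0xf0000000u;
		h.sysv ^= top >> 24;
		h.sysv &= ~top;

		/* GNU step: h = h*33 + c, truncated to 32 bits. */
		h.gnu = h.gnu * 33 + *s;
	}
	return h;
}

/* Usage example: check a few values worked out by hand. */
static void hash_both_selfcheck(void)
{
	struct sym_hashes e = hash_both("");
	struct sym_hashes x = hash_both("exit");

	assert(e.sysv == 0 && e.gnu == 5381);
	assert(x.sysv == 0x0006cf04u && x.gnu == 0x7c967e3fu);
}
```

Whether this helps would depend on how often both values are really needed for the same lookup; if the vdso carried a GNU hash table, as offered above, the sysv half might not be needed at all on gnu-hash-only systems.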