Message-ID: <20150806043252.GB1900@localhost>
Date: Wed, 5 Aug 2015 21:32:53 -0700
From: Isaac Dunham <ibid.ag@...il.com>
To: musl@...ts.openwall.com
Cc: Rich Felker <dalias@...c.org>
Subject: Re: Re: Further dynamic linker optimizations

On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
> On 07/07/2015 10:48 PM, Timo Teras wrote:
> >On Tue, 7 Jul 2015 14:55:05 -0400
> >Rich Felker <dalias@...c.org> wrote:
> >
> >>On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> >>>On Tue, 30 Jun 2015, Rich Felker wrote:
> >>>
> >>>>Discussion on #musl with Timo Teräs has produced the following
> >>>>results:
> >>>>
> >>>>- Moving bloom filter size to struct dso gives 5% improvement in
> >>>>clang (built as 110 .so's) start time, simply because of a
> >>>>reduction of number of instructions in the hot path. So I think
> >>>>we should apply that patch.
> >>>
> >>>I think most of the improvement here actually comes from fewer
> >>>cache misses. As a result, I think we should take this idea further
> >>>and shuffle struct dso a little bit so that fields accessed in the
> >>>hot find_sym loop are packed together, if possible.
> >>
> >>I'm not entirely convinced; the 5% seems consistent with the number of
> >>instructions in the code path. Can you confirm this with cache miss
> >>measurements? Or just by obtaining better timings reordering data for
> >>cache locality? Note that the head of struct dso has to remain fixed
> >>(it's gdb ABI :/) but the rest is free to change.
> >
> >I used cachegrind and callgrind to benchmark. In my case there was no
> >change in cache miss number - the speed up was purely based on running
> >less instructions on the hot path.
> >
> >Though, I ran this on i7 with lot of cache. Cache misses could become
> >issue on smaller cpus. But I suspect the bloom filter is doing good
> >enough job to keep cache usage on sensible levels.
> >
> >>>>- The whole outer for loop in find_sym is the hot path for
> >>>> performance. As such, eliminating the lazy calculation of
> >>>>gnu_hash and simply doing it before the loop should be a
> >>>>measurable win, just by removing the if (!ghm) branch.
> >>>
> >>>On a related note, it's possible to avoid calculating sysv hash, if
> >>>gnu-hash is enabled system-wide, by not setting 'global' flag on
> >>>the vdso item (as mentioned on IRC in your conversation with Timo).
> >>
> >>Yes, and I think this sounds like a worthwhile approach. Seeing
> >>timings for it would be great. :-)
> >
> >I told them earlier in IRC. But on the same i7 box and running "clang
> >--version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> >hashing had magnitude of tens of milliseconds. (I wonder how it'd
> >perform if we calculated both sysv and gnu hashes at same time.)
>
> /me dons vdso maintainer hat.
>
> I can add a GNU hash to the vdso quite easily (for Linux 4.3). Would that
> be helpful?

Would this require a binutils version that supports GNU hashes?
And if so, would it be a hard build-time requirement?

Thanks,
Isaac Dunham
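
For reference, the "both sysv and gnu hashes at same time" idea quoted
above could look roughly like the sketch below. This is not taken from
musl's dynamic linker: struct sym_hashes and hash_both are hypothetical
names, and the two recurrences are just the standard SysV ELF hash and
the GNU hash (h*33 + c, starting from 5381). Whether a single pass
actually beats two separate loops has not been measured here.

```c
#include <stdint.h>

/* Hypothetical container for both hash values of one symbol name. */
struct sym_hashes {
	uint32_t sysv;
	uint32_t gnu;
};

/* Standard SysV ELF hash, for comparison with the combined loop. */
uint32_t sysv_hash(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t h = 0;
	for (; *s; s++) {
		h = (h << 4) + *s;
		uint32_t top = h & 0xf0000000;
		h ^= top >> 24;	/* fold the top nibble back in */
		h &= ~top;	/* and clear it */
	}
	return h;
}

/* Standard GNU hash: h = h*33 + c, seeded with 5381. */
uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t h = 5381;
	for (; *s; s++)
		h = h * 33 + *s;
	return h;
}

/* Assumption: one pass over the name, updating both hashes per byte. */
struct sym_hashes hash_both(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t sv = 0, gh = 5381;
	for (; *s; s++) {
		sv = (sv << 4) + *s;
		uint32_t top = sv & 0xf0000000;
		sv ^= top >> 24;
		sv &= ~top;
		gh = gh * 33 + *s;
	}
	return (struct sym_hashes){ sv, gh };
}
```

A caller would compute hash_both(name) once before the outer loop and
pick .gnu or .sysv depending on which hash table each dso provides.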