musl - Re: Re:Re: Re:Re: Re:Re: qsort

Message-ID: <20230210141955.GA4163@brightrain.aerifal.cx>
Date: Fri, 10 Feb 2023 09:19:55 -0500
From: Rich Felker <dalias@...c.org>
To: David Wang <00107082@....com>
Cc: musl@...ts.openwall.com, Markus Wichmann <nullplan@....net>
Subject: Re: Re:Re: Re:Re: Re:Re: qsort

On Fri, Feb 10, 2023 at 09:45:12PM +0800, David Wang wrote:
>
> At 2023-02-10 21:10:45, "Rich Felker" <dalias@...c.org> wrote:
> >On Fri, Feb 10, 2023 at 06:00:27PM +0800, David Wang wrote:
> 
> >What tool was used for this? gprof or anything else invasive is not
> >meaningful; for tiny functions, the entire time measured will be the
> >profiling overhead. perf(1) is the only way I know to get meaningful
> >numbers.
> >
> >In particular, it makes no sense that significant time was spent in
> >wrapper_cmp, which looks like (i386):
> >
> >   0:   ff 64 24 0c             jmp    *0xc(%esp)
> >
> >or (x86_64):
> >
> >   0:   ff e2                   jmpq   *%rdx
> >
> >or (arm):
> >
> >   0:   4710            bx      r2
> >
> >but I can imagine it being (relatively) gigantic with a call out to
> >profiling code.
> >
> >Rich
> 
> I implemented a profiling tool myself, using perf_event_open to
> start profiling and mmap to collect callchains. The source code is
> here:
> https://github.com/zq-david-wang/linux-tools/blob/main/perf/profiler/profiler.cpp
> (Still buggy: there is always a strange callchain that I could not
> figure out, and I am still working on it.)

Thanks for sharing. It's nice to see another tool like this.

> Also, I did not use any optimization when compiling the code, which
> could make a difference. I will take time to give it a try.

Yes, that would make a big difference. For this to be meaningful, the
measurement needs to be done with optimizations enabled.

> About wrapper_cmp: in my last profiling run, 931387 samples were
> collected in total. 257403 of them contain the callchain
> ->wrapper_cmp, and of those 257403 samples, 167410 contain the
> callchain ->wrapper_cmp->mycmp. That is why I think there is extra
> overhead in wrapper_cmp. Maybe compiler optimization would change
> the result, and I will make further checks.

Yes. On i386 here, -O0 takes wrapper_cmp from 1 instruction to 10
instructions.
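
For readers following along, here is a minimal sketch of why an
optimized wrapper_cmp can collapse to a single indirect jump. It
assumes the wrapper simply forwards to the caller's two-argument
comparator, which is passed through the context pointer; that matches
the one-instruction disassembly quoted above, but the real source may
differ in detail. sketch_sort_r and sketch_qsort are hypothetical
stand-ins (a plain insertion sort, not the real algorithm), and the
body of mycmp is assumed, since only its name appears in this thread.

```c
#include <stddef.h>

typedef int (*cmpfun)(const void *, const void *);
typedef int (*cmpfun_r)(const void *, const void *, void *);

/* Adapter from the 3-argument to the 2-argument comparator form.
 * The call is in tail position, so an optimizing compiler can emit a
 * bare indirect jump; without optimization it gets a full prologue,
 * argument spills and an epilogue. */
static int wrapper_cmp(const void *a, const void *b, void *ctx)
{
    return ((cmpfun)ctx)(a, b);
}

/* Swap two elements of `size` bytes each. */
static void swap_bytes(unsigned char *x, unsigned char *y, size_t size)
{
    while (size--) {
        unsigned char t = *x;
        *x++ = *y;
        *y++ = t;
    }
}

/* Hypothetical context-taking sort: insertion sort, for brevity only. */
static void sketch_sort_r(void *base, size_t n, size_t size,
                          cmpfun_r cmp, void *ctx)
{
    unsigned char *p = base;
    for (size_t i = 1; i < n; i++)
        for (size_t j = i;
             j > 0 && cmp(p + (j - 1) * size, p + j * size, ctx) > 0;
             j--)
            swap_bytes(p + (j - 1) * size, p + j * size, size);
}

/* Hypothetical qsort-style entry point built on the context-taking
 * sort. The function-pointer-to-void* cast is not strict ISO C, but
 * it is well defined on POSIX systems. */
static void sketch_qsort(void *base, size_t n, size_t size, cmpfun cmp)
{
    sketch_sort_r(base, n, size, wrapper_cmp, (void *)cmp);
}

/* Assumed user comparator: ascending order of int. */
static int mycmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Calling sketch_qsort(v, n, sizeof v[0], mycmp) on an int array sorts
it in ascending order. Comparing the disassembly of wrapper_cmp built
with -O0 and with -O2 should show the kind of difference described
above.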

Rich
