Message-ID: <30eefd71.52d4.1835f8b367f.Coremail.00107082@163.com>
Date: Wed, 21 Sep 2022 18:15:02 +0800 (CST)
From: 王志强 <00107082@....com>
To: musl@...ts.openwall.com
Cc: "Quentin Rameau" <quinq@...th.space>, "Florian Weimer" <fweimer@...hat.com>, 
	dalias@...c.org
Subject: Re:Re: The heap memory performance (malloc/free/realloc) is
 significantly degraded in musl 1.2 (compared to 1.1)

Hi Rich,

I am quite interested in this topic, and made a comparison between glibc and musl with the following code:
```
#include <stdlib.h>

#define MAXF 4096

/* keep up to MAXF live allocations; once the table is full, free a random one */
void *tobefree[MAXF];

int main() {
    long long i;
    int v, k;
    size_t s, c = 0;
    char *p;
    for (i = 0; i < 100000000L; i++) {
        v = rand();
        s = ((v % 256) + 1) * 1024;   /* random size, 1K..256K */
        p = (char *)malloc(s);
        p[1023] = 0;                  /* touch the allocation so a page is faulted in */
        if (c >= MAXF) {              /* table full: free a random slot */
            k = v % c;
            free(tobefree[k]);
            tobefree[k] = tobefree[--c];
        }
        tobefree[c++] = p;
    }
    return 0;
}
```

The results show a significant difference.
With glibc (running within a debian docker image):
# gcc -o m.debian -O0 app_malloc.c

# time ./m.debian
real    0m37.529s
user    0m36.677s
sys    0m0.771s

With musl (running within an alpine3.15 docker image):

# gcc -o m.alpine -O0 app_malloc.c

# time ./m.alpine
real    6m 30.51s
user    1m 36.67s
sys    4m 53.31s

musl seems to spend way too much time in the kernel, while glibc keeps most of the work in userspace.
I used perf_event_open to profile both programs.
The musl profile (302899 samples in total) shows that the malloc/free sequence spends a lot of time dealing with page faults, munmap, madvise and mmap:

munmap(30.858% 93469/302899)
_init?(22.583% 68404/302899)
    aligned_alloc?(89.290% 61078/68404)
        asm_exc_page_fault(45.961% 28072/61078)
    main(9.001% 6157/68404)
        asm_exc_page_fault(29.170% 1796/6157)
    rand(1.266% 866/68404)
aligned_alloc?(20.437% 61904/302899)
    asm_exc_page_fault(56.038% 34690/61904)
madvise(13.275% 40209/302899)
mmap64(11.125% 33698/302899)
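
For anyone who wants to reproduce the page-fault part of those numbers without a full sampling profiler, a counter along these lines should be close enough (this is only a sketch, not the profiler I actually used, and the loop body is a cut-down placeholder for the benchmark above):
```
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Open a software counter for the calling process on any CPU. */
static int open_counter(__u64 config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof(attr);
    attr.config = config;          /* e.g. PERF_COUNT_SW_PAGE_FAULTS */
    attr.disabled = 1;
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main() {
    int fd = open_counter(PERF_COUNT_SW_PAGE_FAULTS);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* placeholder for the malloc/free loop of the benchmark above */
    for (long long i = 0; i < 1000000; i++) {
        char *p = malloc(((rand() % 256) + 1) * 1024);
        p[1023] = 0;
        free(p);
    }

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long faults = 0;
    read(fd, &faults, sizeof(faults));
    printf("page faults: %lld\n", faults);
    close(fd);
    return 0;
}
```
Running that same counter against both libcs should already make the page-fault gap visible, even without the call-graph samples.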


But the glibc profile (29072 samples in total) is much lighter; page faults are still the biggest cost, and glibc spends significant time in "free":

pthread_attr_setschedparam?(82.021% 23845/29072)
    asm_exc_page_fault(1.657% 395/23845)
_dl_catch_error?(16.714% 4859/29072)
    __libc_start_main(100.000% 4859/4859)
        cfree(58.839% 2859/4859)
        main(31.138% 1513/4859)
            asm_exc_page_fault(2.115% 32/1513)
        pthread_attr_setschedparam?(3.725% 181/4859)
        random(2.099% 102/4859)
        random_r(1.832% 89/4859)
        __libc_malloc(1.420% 69/4859)

It seems to me that glibc makes heavy use of cached kernel memory and avoids most of the page faults and syscalls.

Should this performance difference concern real-world applications? On average, musl actually spends about 3~4µs per malloc/free (about 390s over 100M iterations), which is still quite acceptable for real-world applications, I think.
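
If my reading is right (and this is only a guess on my part, not something I have verified), glibc's trim/mmap thresholds are what keep freed memory cached in userspace. Lowering them with mallopt, which glibc declares in <malloc.h>, should force more munmap traffic and pull the glibc numbers towards the musl ones; the threshold values below are arbitrary:
```
#include <malloc.h>   /* mallopt, M_TRIM_THRESHOLD, M_MMAP_THRESHOLD (glibc-specific) */
#include <stdlib.h>

int main() {
    /* Assumption: shrinking these thresholds makes glibc hand memory back to
     * the kernel much more eagerly, so the benchmark should start paying for
     * mmap/munmap and page faults in a similar way to the musl run. */
    mallopt(M_TRIM_THRESHOLD, 4 * 1024);   /* trim the heap top aggressively on free() */
    mallopt(M_MMAP_THRESHOLD, 4 * 1024);   /* push most of the 1K..256K sizes to mmap/munmap */

    /* placeholder for the same malloc/free loop as in the benchmark above */
    for (long long i = 0; i < 1000000; i++) {
        char *p = malloc(((rand() % 256) + 1) * 1024);
        p[1023] = 0;
        free(p);
    }
    return 0;
}
```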

(It seems to me that the performance difference has nothing to do with malloc_usable_size; that may indeed have been just a speculative guess without any basis.)

David Wang