Message-ID: <20230419090210.GR3630668@port70.net>
Date: Wed, 19 Apr 2023 11:02:10 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: 张飞 <zhangfei@...iscas.ac.cn>
Cc: musl@...ts.openwall.com
Subject: Re: Re: memset_riscv64

* 张飞 <zhangfei@...iscas.ac.cn> [2023-04-19 13:33:08 +0800]:
> ------------------------------------------------------------------------------
> length (bytes)   C implementation (s)   Basic instruction implementation (s)
> ------------------------------------------------------------------------------
> 4                0.00000352             0.000004001
> 8                0.000004001            0.000005441
> 16               0.000006241            0.00000464
> 32               0.00000752             0.00000448
> 64               0.000008481            0.000005281
> 128              0.000009281            0.000005921
> 256              0.000011201            0.000007041

i don't think these numbers can be trusted.

> #include <stdio.h>
> #include <sys/mman.h>
> #include <string.h>
> #include <stdlib.h>
> #include <time.h>
> 
> #define DATA_SIZE 5*1024*1024
> #define MAX_LEN 1*1024*1024
> #define OFFSET 0
> #define LOOP_TIMES 100
> int main(){
>    char *str1,*src1;
>    str1 = (char *)mmap(NULL, DATA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> 
>    printf("function test start\n");
>    
>    src1 = str1+OFFSET;
>    struct timespec tv0,tv;
>    for(int len=2; len<=MAX_LEN; len*=2){
>       clock_gettime(CLOCK_REALTIME, &tv0);
>       for(int k=0; k<LOOP_TIMES; k++){
>           memset(src1, 'a', len);
>       }
>       clock_gettime(CLOCK_REALTIME, &tv);
>       tv.tv_sec -= tv0.tv_sec;
>       if ((tv.tv_nsec -= tv0.tv_nsec) < 0) {
> 	      tv.tv_nsec += 1000000000;
> 	      tv.tv_sec--;
>       }
>       printf("len: %d  time: %ld.%.9ld\n",len, (long)tv.tv_sec, (long)tv.tv_nsec);


this repeatedly calls memset with the exact same len, alignment and value,
so it favours branch-heavy code since those branches are correctly predicted.
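
one way to avoid the fixed-argument pattern is to precompute a table of
pseudo-random offsets and lengths and time a pass over that table. the
following is only a sketch of that idea, not a tuned benchmark: the names
(struct call, make_calls, time_mixed), the table size and the xorshift
generator are all made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define NCALLS 4096  /* hypothetical table size, not from the original test */

struct call { uint16_t off; uint16_t len; };

/* small xorshift generator: cheap and reproducible; seed must be nonzero */
static uint32_t xorshift32(uint32_t *s)
{
	uint32_t x = *s;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *s = x;
}

/* fill tab with offsets in [0,16) and lengths in [1,maxlen] */
static void make_calls(struct call *tab, size_t n, unsigned maxlen, uint32_t seed)
{
	assert(seed != 0 && maxlen >= 1 && maxlen <= UINT16_MAX);
	for (size_t i = 0; i < n; i++) {
		tab[i].off = xorshift32(&seed) % 16;
		tab[i].len = 1 + xorshift32(&seed) % maxlen;
	}
}

/* time one pass over the table and return the average ns per memset call.
   buf must hold at least 16 + maxlen bytes. the table is built before the
   timed region so generating it is not measured. */
static double time_mixed(char *buf, const struct call *tab, size_t n)
{
	struct timespec t0, t1;
	assert(n > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < n; i++)
		memset(buf + tab[i].off, 'a', tab[i].len);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / n;
}
```

for example, make_calls(tab, NCALLS, 256, 1) followed by
time_mixed(buf, tab, NCALLS) gives an average over mixed small sizes.
a uniform distribution is an assumption here; real callers have their
own size distribution, which this does not try to model.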

but even if you care about a branch-predicted microbenchmark, you
made a single measurement per size, so you cannot tell how much the
time varies. you should do several measurements and take the min,
so noise from system effects and cpu internal state is reduced
(that state also needs to be warmed up). LOOP_TIMES likely
should be bigger too for small sizes to get reliable timing.
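
a rough sketch of the measurement loop described above: warm up first,
take several samples, keep the minimum, and scale the call count up for
small sizes. the helper names (now_ns, iters_for, bench_min) and the
floor of 100 calls are assumptions for illustration, and the empty asm
statement is a GCC/Clang extension used as a compiler barrier.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

static double now_ns(void)
{
	struct timespec ts;
	/* monotonic clock: not affected by wall-clock adjustments */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* choose a call count so each sample writes roughly target_bytes in total,
   with a floor of 100 calls so large sizes still loop */
static size_t iters_for(size_t len, size_t target_bytes)
{
	size_t n = target_bytes / len;
	return n < 100 ? 100 : n;
}

/* return the best (minimum) time in ns per memset call over all samples */
static double bench_min(char *buf, size_t len, int samples, size_t iters)
{
	double best = -1;
	assert(len > 0 && samples > 0 && iters > 0);

	/* untimed warm-up pass: touches the pages, fills caches, trains predictors */
	for (size_t k = 0; k < iters; k++)
		memset(buf, 'a', len);

	for (int s = 0; s < samples; s++) {
		double t0 = now_ns();
		for (size_t k = 0; k < iters; k++) {
			memset(buf, 'a', len);
			/* compiler barrier so the repeated stores are not merged */
			__asm__ volatile("" : : "r"(buf) : "memory");
		}
		double dt = (now_ns() - t0) / iters;
		if (best < 0 || dt < best)
			best = dt;
	}
	return best;
}
```

e.g. bench_min(buf, len, 10, iters_for(len, 1<<20)) for each len. the min
only filters out noise that adds time; it is still a best-case,
fully-predicted number for one fixed size and alignment.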

benchmarking string functions is tricky especially for a target arch
with many implementations.

>    }
> 
>    printf("function test end\n");
>    munmap(str1,DATA_SIZE);
>    return 0;
> }
> 
