|
Message-ID: <tencent_B09B673E38D078993C900DB8B8CDE020E107@qq.com> Date: Tue, 16 May 2023 17:28:53 +0800 From: "847567161" <847567161@...com> To: "Rich Felker" <dalias@...c.org>, "musl" <musl@...ts.openwall.com> Subject: Re:Re: Re: Re: [Patch 1/1] vfprintf: optimize vfprintf performance > Interesting -- so you're lazily constructing the nl_arg data when you > first discover the need for it, rather than unconditionally doing the > first pass to look for it. Do you measure a relevant performance > difference doing this? Yes, the benchmark show there are about 30% benefits for commonly used format string. BM_stdio_printf_s, 160 ns(before opt), 110 ns (after opt) static void BM_stdio_printf_s(benchmark::State& state) { while (state.KeepRunning()) { char buf[BUFSIZ]; snprintf(buf, sizeof(buf), "this is a more typical error message with detail: %s", "No such file or directory"); } } MUSL_BENCHMARK(BM_stdio_printf_s); > Stylistically, I like that the patch is minimally invasive and does > not make complex changes (like what I suggested above) together with > the intended functional change. I don't really like building up extra > "special meaning" state like whether nl_arg_ptr is null on top of > whether f is null to represent what mode the printf_core is operating > in, but this can/should probably be cleaned up as a separate patch to > keep the functional change itself simple, like what you've done. I add nl_arg_filled to indicate whether we have completed the acquisition of parameters. > On a technical note, some compilers are bad about hoisting large local > objects out of their scopes when inlining. This results in the large > floating point buffer from fmt_fp getting allocated at the top of > printf_core, and presumably causes a recursive call to printf_core to > double the stack requirements here, which is problematic. I'm not sure > if we should try to avoid recursion (it should be fairly > straightforward to do so) or just try to fix the unwanted hoisting to > begin with (something I've been wanting to do for a while). I modified buf to be declared when using ranther than at the top of printf_core. > It looks like these two hunks are almost not needed, since it's not > allowed to mix positional and non-positional forms. Thanks, I had changed it in patch. So musl don't support these format strings, right? printf("yc test: %*2$d \n", 2, 10); printf("yc test: %.*2$f \n", 1.123456, 10); Does "%n$d", "%*n$d" or "%.*n$d" should be used together? > However, %m does > not take an argument, but might have width/precision in * form. It'd > be nice if there were some way to avoid this duplication of logic. Do you mean "%*m$" or "%.*m$"? What do you mean "duplication of logic"? I don't get it. > No check has been made that you haven't already consumed arguments via > non-positional-form format specifiers. This will make things blow up > much worse if the caller wrongly mixes them, consuming wrong-type args > from the wrong positions rather than erroring out (as we do now) or > trapping (probably ideal). Right now we don't actually catch all forms > of mixing either, but we do necessarily consume all positional args in > order before attempting to consume any erroneously-specified > non-positional ones. Your patch changes this to happen in the order > things were encountered. So I think, as a first step before even doing > this patch, we should update the l10n flag to a tri-state thing where > it can be yes/no/indeterminate, to allow erroring out on any specifier > that mismatches the existing yes/no. I remember you mentioned "calling printf with an invalid format string has undefined behavior" here, so maybe it's also ok. https://www.openwall.com/lists/musl/2023/05/07/1 Chuang Yin ------------------ 原始邮件 ------------------ 发件人: "Rich Felker" <dalias@...c.org>; 发送时间: 2023年5月16日(星期二) 凌晨0:22 收件人: "847567161"<847567161@...com>; 抄送: "musl"<musl@...ts.openwall.com>; 主题: Re: [musl] Re: Re: Re: [Patch 1/1] vfprintf: optimize vfprintf performance On Mon, May 15, 2023 at 06:04:48PM +0800, 847567161 wrote: > Hi, > > I'm sorry for the incorrect format diff. I updated it. It's still misformatted, with a mix of HTML entities in the body. Can you please send it as an attachment instead of inline in the email if your mail software can't handle plain text non-destructively? > As we discussed earlier, I made a patch to get nl_arg when we see > '$' ranther than visit the format anyway. Interesting -- so you're lazily constructing the nl_arg data when you first discover the need for it, rather than unconditionally doing the first pass to look for it. Do you measure a relevant performance difference doing this? > Could you please help me review it? A couple things I noticed: > --------------------------------- > diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c > old mode 100644 > new mode 100755 > index 9b961e7f..3294e23b > --- a/src/stdio/vfprintf.c > +++ b/src/stdio/vfprintf.c > @@ -427,7 +427,7 @@ static int getint(char **s) { > return i; > } > > -static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, int *nl_type) > +static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, int *nl_type, union arg* nl_arg_ptr) > { > char *a, *z, *s=(char *)fmt; > unsigned l10n=0, fl; > @@ -462,6 +462,12 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, > if (l) continue; > > if (isdigit(s[1]) &amp;&amp; s[2]=='$') { > + if (nl_arg_ptr == NULL) { > + nl_arg_ptr = nl_arg; > + if (printf_core(0, fmt, ap, nl_arg, nl_type, nl_arg_ptr) &lt; 0) { > + return -1; > + } > + } > l10n=1; > argpos = s[1]-'0'; > s+=3; No check has been made that you haven't already consumed arguments via non-positional-form format specifiers. This will make things blow up much worse if the caller wrongly mixes them, consuming wrong-type args from the wrong positions rather than erroring out (as we do now) or trapping (probably ideal). Right now we don't actually catch all forms of mixing either, but we do necessarily consume all positional args in order before attempting to consume any erroneously-specified non-positional ones. Your patch changes this to happen in the order things were encountered. So I think, as a first step before even doing this patch, we should update the l10n flag to a tri-state thing where it can be yes/no/indeterminate, to allow erroring out on any specifier that mismatches the existing yes/no. Stylistically, I like that the patch is minimally invasive and does not make complex changes (like what I suggested above) together with the intended functional change. I don't really like building up extra "special meaning" state like whether nl_arg_ptr is null on top of whether f is null to represent what mode the printf_core is operating in, but this can/should probably be cleaned up as a separate patch to keep the functional change itself simple, like what you've done. On a technical note, some compilers are bad about hoisting large local objects out of their scopes when inlining. This results in the large floating point buffer from fmt_fp getting allocated at the top of printf_core, and presumably causes a recursive call to printf_core to double the stack requirements here, which is problematic. I'm not sure if we should try to avoid recursion (it should be fairly straightforward to do so) or just try to fix the unwanted hoisting to begin with (something I've been wanting to do for a while). > @@ -477,6 +483,12 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, > /* Read field width */ > if (*s=='*') { > if (isdigit(s[1]) &amp;&amp; s[2]=='$') { > + if (nl_arg_ptr == NULL) { > + nl_arg_ptr = nl_arg; > + if (printf_core(0, fmt, ap, nl_arg, nl_type, nl_arg_ptr) &lt; 0) { > + return -1; > + } > + } > l10n=1; > nl_type[s[1]-'0'] = INT; > w = nl_arg[s[1]-'0'].i; > @@ -491,6 +503,12 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, > /* Read precision */ > if (*s=='.' &amp;&amp; s[1]=='*') { > if (isdigit(s[2]) &amp;&amp; s[3]=='$') { > + if (nl_arg_ptr == NULL) { > + nl_arg_ptr = nl_arg; > + if (printf_core(0, fmt, ap, nl_arg, nl_type, nl_arg_ptr) &lt; 0) { > + return -1; > + } > + } > nl_type[s[2]-'0'] = INT; > p = nl_arg[s[2]-'0'].i; > s+=4; It looks like these two hunks are almost not needed, since it's not allowed to mix positional and non-positional forms. However, %m does not take an argument, but might have width/precision in * form. It'd be nice if there were some way to avoid this duplication of logic. > @@ -659,16 +677,13 @@ int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap) > va_list ap2; > int nl_type[NL_ARGMAX+1] = {0}; > union arg nl_arg[NL_ARGMAX+1]; > + union arg* nl_arg_ptr = NULL; Useless variable. You could just... > unsigned char internal_buf[80], *saved_buf = 0; > int olderr; > int ret; > > /* the copy allows passing va_list* even if va_list is an array */ > va_copy(ap2, ap); > - if (printf_core(0, fmt, &amp;ap2, nl_arg, nl_type) &lt; 0) { > - va_end(ap2); > - return -1; > - } > > FLOCK(f); > olderr = f-&gt;flags &amp; F_ERR; > @@ -680,7 +695,7 @@ int vfprintf(FILE *restrict f, const char *restrict fmt, va_list ap) > f-&gt;wpos = f-&gt;wbase = f-&gt;wend = 0; > } > if (!f-&gt;wend &amp;&amp; __towrite(f)) ret = -1; > - else ret = printf_core(f, fmt, &amp;ap2, nl_arg, nl_type); > + else ret = printf_core(f, fmt, &amp;ap2, nl_arg, nl_type, nl_arg_ptr); ...pass a null pointer constant (0) here. Overall, I think I'm very open to making this change, especially if it helps performance in some measurable way. With some more changes on top of it, I think it could simplify and clean up the printf logic too. Rich Content of type "text/html" skipped Download attachment "optimize_printf_core.diff" of type "application/octet-stream" (1970 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.