Message-ID: <20240827213210.GF10433@brightrain.aerifal.cx>
Date: Tue, 27 Aug 2024 17:32:10 -0400
From: Rich Felker <dalias@...c.org>
To: Pedro Falcato <pedro.falcato@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: Proposed printf stack usage improvement

On Tue, Aug 27, 2024 at 04:42:35PM +0100, Pedro Falcato wrote:
> On Tue, Aug 27, 2024 at 11:21:33AM GMT, Rich Felker wrote:
> > On Tue, Aug 27, 2024 at 10:23:57AM +0100, Pedro Falcato wrote:
> > > LGTM.
> > >
> > > But maybe you should also include my __attribute__((noinline))
> > > suggestion, to make sure the integer printf and floating point paths
> > > get mixed by the compiler. Even if current gcc/clang don't seem to
> > > want to do that, it's better to be safe than sorry (and I assume any
> > > LTO/PGO might change that atm).
> >
> > I'm not clear what ill effect you're trying to mitigate here.
>
> (fwiw, if it wasn't clear: I meant "make sure the <...> *don't* get mixed")
>
> fmt_fp with the patch applied still has a significant stack impact (520 bytes according to my
> measurement) which can be avoided on the vast majority of (integer) printfs.

How did you measure? There should be essentially no static stack usage
in fmt_fp with this patch, only dynamic (VLA). On archs with
ld==double, it's possible that the compiler could decide to "optimize"
a VLA whose size can only have one possible value to a non-VLA, then
lift it, but this would be a highly malicious transformation that
could lead to much more catastrophic stack overflows in real-world
usage I think, so I would not expect compilers to do it.

Indeed a quick check of the attached, which I wrote to be as naively
easy to mis-optimize as possible, shows neither gcc nor clang lifting
the VLA.
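For readers without the attachment, here is a minimal sketch of the kind of
code in question. It is not the attached vla_lift.c, whose contents are not
reproduced here; the names slow_path and maybe_slow and the size 512 are
made up for illustration. The VLA can only ever have one size, so a compiler
could in principle turn it into a fixed array and hoist it into the caller's
frame, charging the stack to every call rather than only the slow one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; NOT the attached vla_lift.c. */

/* The only caller passes n == 512, so the VLA below has exactly one
 * possible size. Stack for it should be taken only when this runs. */
static size_t slow_path(size_t n)
{
	assert(n > 0);          /* a zero-length VLA is undefined behavior */
	unsigned char buf[n];   /* VLA: dynamic stack usage */
	size_t sum = 0;

	memset(buf, 0x5a, n);
	for (size_t i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}

/* The fast path needs no large buffer. If the compiler inlined
 * slow_path and converted the VLA to a fixed-size array in this frame,
 * the fast path would pay for the 512 bytes as well. */
size_t maybe_slow(int want_slow)
{
	if (!want_slow)
		return 0;
	return slow_path(512);
}
```

One way to check what a given compiler did, assuming GCC, is to build with
-fstack-usage and look at the generated .su file, which labels each
function's usage as static or dynamic.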
> printf_core OTOH uses up 472 bytes of stack, so the simple possibility of inlining it can
> (worst case) more than double the stack space used by all printfs.
>
> Granted, the patch seems to convince clang not to inline fmt_fp at all, but AFAIK this is by no means
> a guarantee.

GCC inlines it fine, which is a good thing. This is a function which
is called from only one place, and just outlined in the source for the
sake of readability, having its own locals, etc. There's no good reason
to *want* the call boundary overhead.

At some point it might make sense to move fmt_fp to its own TU if we
want to have a way to suppress it from getting linked at all, and this
would also force non-inlining. But it doesn't seem to be desirable to
suppress inlining for its own sake.
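For reference, the attribute being proposed looks roughly like the sketch
below. This is not musl's printf_core or fmt_fp; format_value, format_rare,
the NOINLINE macro and the 512-byte scratch buffer are hypothetical names and
sizes chosen only to show the shape of the trade-off. With the attribute, the
scratch buffer stays in the helper's own frame and costs a call; without it,
the compiler may merge the two frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration; not taken from musl. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Rarely used path with a large fixed-size local buffer. */
static NOINLINE int format_rare(char *out, size_t outlen, double x)
{
	char scratch[512];
	int n = snprintf(scratch, sizeof scratch, "%.3f", x);

	/* Reject errors and anything that would not fit in either buffer. */
	if (n < 0 || (size_t)n >= sizeof scratch || (size_t)n >= outlen)
		return -1;
	memcpy(out, scratch, (size_t)n + 1);
	return n;
}

/* Common path: integers never touch the scratch buffer. */
int format_value(char *out, size_t outlen, int is_float, long i, double x)
{
	assert(out && outlen > 0);
	if (is_float)
		return format_rare(out, outlen, x);

	int n = snprintf(out, outlen, "%ld", i);
	return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```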
> One could consider this somewhat of a microoptimization, but musl thread stacks are by no
> means big, so...

I think generally we don't care about 500 bytes anyway -- I'm not
going to deem a function that overflows the last 500 bytes of a stack
that's too small a bug. Even printf using 8k wasn't a "bug"; the main
motivation for changing this is not to let people YOLO calling printf
with a stack that's barely big enough, but to avoid dirtying extra
pages for no good reason. The 8k pretty much unconditionally dirtied 2
extra otherwise-unused pages for any program using printf.

Rich
View attachment "vla_lift.c" of type "text/plain" (95 bytes)