musl - Re: Why does musl printf() use so much more stack than other implementations when printf()ing floating point numbers?

Message-ID: <20200203215713.GS1663@brightrain.aerifal.cx>
Date: Mon, 3 Feb 2020 16:57:13 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: Simon <simonhf@...il.com>
Subject: Re: Why does musl printf() use so much more stack than other
 implementations when printf()ing floating point numbers?

On Mon, Feb 03, 2020 at 01:14:21PM -0800, Simon wrote:
> I recently noticed that musl printf() implementation uses surprisingly more
> stack space than other implementations, but only if printing floating point
> numbers, and made some notes here [1]. Any ideas why this happens, and any
> chance of fixing it?
> 
> [1] https://gist.github.com/simonhf/2a7b7eb98d2a10c549e8cc858bbefd53

It's fundamental: printing arbitrary floating point numbers exactly
takes considerable working space, unless you want to spend O(n³) time
or so (n = exponent value) recomputing things. The minimum needed is
probably only around 2/3 of what we use, so it could be reduced
slightly, but I doubt a savings of <3k is worth the complexity of
ensuring it would still be safe and correct.
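
To give a feel for how many digits an exact conversion can involve,
here is a minimal sketch. It assumes a libc whose printf prints exact
decimal expansions (musl and glibc do); the function name is
hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the smallest positive IEEE double, 2^-1074, with all of its
 * fractional digits. Returns what snprintf returns: the length the
 * full output needs, excluding the terminating null byte. */
int print_smallest_double(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    /* 2^-1074 == 5^1074 / 10^1074, so the exact expansion has exactly
     * 1074 digits after the decimal point, the last one nonzero. */
    return snprintf(buf, size, "%.1074f", 0x1p-1074);
}
```

With an exact printf, the result is "0." followed by 1074 digits, 1076
characters in total, and every one of those digits has to be produced
from the binary value without rounding error.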

Note that on archs without extended long double type, which covers
everything used in extreme low-memory embedded environments, the
memory usage is far lower. This is because it's proportional to the
max possible exponent value, which is 1k instead of 16k if nothing
larger than IEEE double is supported.
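
The scaling with the maximum exponent can be sketched as follows. The
sizing formula is an assumption: it is modeled on a working array of
32-bit words that each hold 9 decimal digits, and it is not guaranteed
to match the declaration in musl's source. All function names are
hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Estimated number of 32-bit words for a base-10^9 working array.
 * Assumed layout: the mantissa is consumed in 29-bit chunks, and the
 * exponent range is covered at 9 decimal digits per word. */
size_t work_words(int mant_dig, int max_exp)
{
    assert(mant_dig > 0 && max_exp > 0);
    size_t mant_words = (size_t)(mant_dig + 28) / 29 + 1;
    size_t exp_words = (size_t)(max_exp + mant_dig + 28 + 8) / 9;
    return mant_words + exp_words;
}

/* Same estimate, expressed in bytes of stack. */
size_t work_bytes(int mant_dig, int max_exp)
{
    return work_words(mant_dig, max_exp) * sizeof(uint32_t);
}

/* Estimate for whatever long double is on the build target. */
size_t work_bytes_long_double(void)
{
    return work_bytes(LDBL_MANT_DIG, LDBL_MAX_EXP);
}
```

Under these assumptions, work_bytes(53, 1024) for IEEE double is 504
bytes, while work_bytes(64, 16384) for x87 extended precision is 7340
bytes. The exponent term dominates in both cases.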

I don't know exactly what glibc does, but it's likely they're just
using malloc, which is incorrect because the allocation can fail at
runtime with OOM.

In principle we could also make the working array a VLA and compute
smaller bounds on the size needed when precision is limited (the
common case). This might really be a practical "fix" for the cases
people care about. It would also solve the problem where LLVM makes
printf *always* use ~9k stack because it hoists the lifetime of the
floating point working array all the way to the top when inlining.
This is arguably a serious optimization bug, since it can transform
all sorts of code that's possible to execute into code that's
impossible to execute due to huge stack requirements. With a VLA whose
size isn't determined except in the floating point path, LLVM wouldn't
be able to hoist it like that.
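
The shape of the VLA idea can be sketched like this. It is only an
outline, not musl code: format_arg is a hypothetical stand-in for the
formatter core, and the caller-supplied word count stands in for a
bound that would still have to be derived and verified. VLAs are an
optional feature in C11, so this assumes a compiler that supports
them, as GCC and Clang do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a formatter core. Returns the number of
 * bytes of working array that this call set up. */
size_t format_arg(int is_float, size_t words)
{
    if (!is_float)
        return 0;               /* non-float paths set up no array */

    assert(words > 0);          /* a zero-length VLA is undefined */
    {
        uint32_t big[words];    /* VLA: sized only on this path */
        memset(big, 0, sizeof big);
        return sizeof big;      /* evaluated at run time for a VLA */
    }
}
```

Because the array size is a run-time value that only exists inside the
floating point branch, there is no fixed-size object for the compiler
to move into the function's entry frame.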

Making this change would still be significant work though, mainly in
verification that the bounds are correct and that there are no cases
where the smaller array can be made to overflow.

Rich
