musl - Re: roundf() (and round(), and ...)

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240619232020.GA10433@brightrain.aerifal.cx>
Date: Wed, 19 Jun 2024 19:20:20 -0400
From: Rich Felker <dalias@...c.org>
To: Damian McGuckin <damianm@....com.au>, MUSL <musl@...ts.openwall.com>
Subject: Re: roundf() (and round(), and ...)

On Wed, Jun 19, 2024 at 08:58:37PM +0200, Szabolcs Nagy wrote:
> * Damian McGuckin <damianm@....com.au> [2024-06-19 18:17:21 +1000]:
> > On Tue, 18 Jun 2024, Szabolcs Nagy wrote:
> > > FLT_EVAL_METHOD is not relevant for this code.
> > 
> > That was my placeholder as in 'Memo to Self'.
> > 
> > I have not used anything other than 'FLT_EVAL_METHOD=0' for decades so
> > I am likely to not handle the other scenarios properly, i.e. using the
> > types 'float_t' and 'double_t' as appropriate. So I am worried that my
> > code is buggy in that context.
> 
> you can change all 'float x' declarations to 'float_t x' and
> sometimes this allows saving a store+load on i386 and m68k
> due to iso c rules.
> 
> e.g. consider
> 
> float foo1(float x)
> {
> 	T y = x*x;
> 	T z = x+y;
> 	return z;
> }
> 
> with T==float and FLT_EVAL_METHOD!=0 this is not the same as
> 
> float foo2(float x)
> {
> 	retrun x + x*x;
> }
> 
> because iso c requires assignment (and return statement) to
> round away the excess precision, so in the former there are two
> float stores while in the latter only one (for the return).
> (with vs without the extra store the result is slightly different
> but they are both different from ieee754 arithmetics anyway.
> if the difference matters we use explicit eval_as_float.
> i believe the standard -ffp-contract=on mode only contracts
> foo2 and would give yet another different result, but musl keeps
> contraction off so there is no implicit fma.)
> 
> there are two approaches to deal with the extra store issue
> (only a performance issue, not correctness):
> 
> the gnu approach is that by default assignment does not round
> excess precision (-fexcess-precision=fast) so T==float works,
> but there is no guarantee where exactly the rounding happens,
> any register spill may do it.
> 
> the musl approach is to use -fexcess-precision=standard and
> use T==float_t and then it is guaranteed that intermediate
> results are evaluated with excess precision (even if spilled).
> 
> i don't think optimizing for i386 or m68k is very relevant today,
> so if you get this wrong it will not be a big issue.

The code needs to be correct even on archs with excess precision. But
there's usually only a correctess distinction for functions like this
with bit-exact/correctly-rounded results, which we don't require for
most of libm.

> > > benchmark data may be useful (or code size e.g. on a soft-float target)
> > > because that may be a valid justification to use this implementation.
> > 
> > It is much less assembler size on a hard-float target.
> > 
> > I have no experience on a soft-float target.
> > 
> > In my testing on X86-64, it is faster than either existing MUSL code or
> > GLIBM, but not as fast as assembler using 'ROUNDSS/ROUNDSD'.  Not that these
> > routines are going to be performance killers.
> 
> ok, this sounds good.
> 
> note that in musl fabsf() is an actual call because we build musl
> with -ffrestanding (builtins are turned off)
> 
> at some point we should fix this to ensure important functions are
> optimized internally, e.g. have a header with
> 
> #define fabsf(x) __builtin_fabsf(x)

Yes. I think the src/include/*.h functions should do this where
appropriate, but we probably need detection unless __GNUC__ is
sufficient to assume the existence of all the ones we want to use.

We also need a list of functions it would be helpful for (mainly
memcpy, memset, strlen & friends, and some math functions, I think)
and some review to make sure there are no places where using the
builtins would be breaking.

> > > nowadays i'd probably write
> > > 
> > >  if (asuint(a) >> 23 < asuint(0x1p23f) >> 23)
> > 
> > I wish it was asuint32 rather than asuint() but maybe I am too picky.
> > 
> > I would prefer to define the bias 'b' and precision 'p' and write it as
> > 
> > 	if ((asuint(a) >> (p - 1)) < b + (p - 1))
> > 
> > But I was worried that would be a change too far. Also, I use different
> > languages in a given week so I probably us more coercions (or casts) and
> > parentheses than MUSL prefers for C.
> 
> i'm used to writing out the bias term and precision as int literal
> but using names is fine too i guess.

I prefer having it written out where it's used rather than having to
go look somewhere else to see the value.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.