Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240619232020.GA10433@brightrain.aerifal.cx>
Date: Wed, 19 Jun 2024 19:20:20 -0400
From: Rich Felker <dalias@...c.org>
To: Damian McGuckin <damianm@....com.au>, MUSL <musl@...ts.openwall.com>
Subject: Re: roundf() (and round(), and ...)

On Wed, Jun 19, 2024 at 08:58:37PM +0200, Szabolcs Nagy wrote:
> * Damian McGuckin <damianm@....com.au> [2024-06-19 18:17:21 +1000]:
> > On Tue, 18 Jun 2024, Szabolcs Nagy wrote:
> > > FLT_EVAL_METHOD is not relevant for this code.
> > 
> > That was my placeholder as in 'Memo to Self'.
> > 
> > I have not used anything other than 'FLT_EVAL_METHOD=0' for decades so
> > I am likely to not handle the other scenarios properly, i.e. using the
> > types 'float_t' and 'double_t' as appropriate. So I am worried that my
> > code is buggy in that context.
> 
> you can change all 'float x' declarations to 'float_t x' and
> sometimes this allows saving a store+load on i386 and m68k
> due to iso c rules.
> 
> e.g. consider
> 
> float foo1(float x)
> {
> 	T y = x*x;
> 	T z = x+y;
> 	return z;
> }
> 
> with T==float and FLT_EVAL_METHOD!=0 this is not the same as
> 
> float foo2(float x)
> {
> 	retrun x + x*x;
> }
> 
> because iso c requires assignment (and return statement) to
> round away the excess precision, so in the former there are two
> float stores while in the latter only one (for the return).
> (with vs without the extra store the result is slightly different
> but they are both different from ieee754 arithmetics anyway.
> if the difference matters we use explicit eval_as_float.
> i believe the standard -ffp-contract=on mode only contracts
> foo2 and would give yet another different result, but musl keeps
> contraction off so there is no implicit fma.)
> 
> there are two approaches to deal with the extra store issue
> (only a performance issue, not correctness):
> 
> the gnu approach is that by default assignment does not round
> excess precision (-fexcess-precision=fast) so T==float works,
> but there is no guarantee where exactly the rounding happens,
> any register spill may do it.
> 
> the musl approach is to use -fexcess-precision=standard and
> use T==float_t and then it is guaranteed that intermediate
> results are evaluated with excess precision (even if spilled).
> 
> i don't think optimizing for i386 or m68k is very relevant today,
> so if you get this wrong it will not be a big issue.

The code needs to be correct even on archs with excess precision. But
there's usually only a correctess distinction for functions like this
with bit-exact/correctly-rounded results, which we don't require for
most of libm.

> > > benchmark data may be useful (or code size e.g. on a soft-float target)
> > > because that may be a valid justification to use this implementation.
> > 
> > It is much less assembler size on a hard-float target.
> > 
> > I have no experience on a soft-float target.
> > 
> > In my testing on X86-64, it is faster than either existing MUSL code or
> > GLIBM, but not as fast as assembler using 'ROUNDSS/ROUNDSD'.  Not that these
> > routines are going to be performance killers.
> 
> ok, this sounds good.
> 
> note that in musl fabsf() is an actual call because we build musl
> with -ffrestanding (builtins are turned off)
> 
> at some point we should fix this to ensure important functions are
> optimized internally, e.g. have a header with
> 
> #define fabsf(x) __builtin_fabsf(x)

Yes. I think the src/include/*.h functions should do this where
appropriate, but we probably need detection unless __GNUC__ is
sufficient to assume the existence of all the ones we want to use.

We also need a list of functions it would be helpful for (mainly
memcpy, memset, strlen & friends, and some math functions, I think)
and some review to make sure there are no places where using the
builtins would be breaking.

> > > nowadays i'd probably write
> > > 
> > >  if (asuint(a) >> 23 < asuint(0x1p23f) >> 23)
> > 
> > I wish it was asuint32 rather than asuint() but maybe I am too picky.
> > 
> > I would prefer to define the bias 'b' and precision 'p' and write it as
> > 
> > 	if ((asuint(a) >> (p - 1)) < b + (p - 1))
> > 
> > But I was worried that would be a change too far. Also, I use different
> > languages in a given week so I probably us more coercions (or casts) and
> > parentheses than MUSL prefers for C.
> 
> i'm used to writing out the bias term and precision as int literal
> but using names is fine too i guess.

I prefer having it written out where it's used rather than having to
go look somewhere else to see the value.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.