musl - Re: Considering x86-64 fenv.s to C

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200124045554.GR30412@brightrain.aerifal.cx>
Date: Thu, 23 Jan 2020 23:55:54 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Considering x86-64 fenv.s to C

On Fri, Jan 24, 2020 at 03:13:32PM +1100, Damian McGuckin wrote:
> >>/*
> >> * There is usually no need to mess the generic valid exceptions bits
> >> * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
> >> * macro for such a scenario - is this the MUSL accepted style????
> >> */
> >>#ifndef FE_QUALIFY_CLEAR_EXCEPT
> >>#define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
> >>#endif
> >>#ifndef FE_QUALIFY_RAISE_EXCEPT
> >>#define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
> >>#endif
> >
> >Can you clarify how these would be defined for powerpc? Could the
> >logic just be embedded in get_sr/set_sr if needed?
> 
> Address the second question first, my original aim was to keep any
> logic out of where embedded assembler was used, mainly because of my
> own very
> irregular experience with it. I could also not convince myself that
>  get_sr and set_sr may need to know from where they are called. In
> the case of the powerpc64, with the SR and CR being the same, that
> adds to the complexity, so I gave up.

Note that it's not necessary to write that logic in asm just because
it's in the arch-defined part. It could simply be C that fixes up the
value before/after the asm. But indeed the following may be nicer
anyway:

> Answering your first question, for the powerpc64.
> 
> #define	QUALIFY_CLEAR_EXCEPT(e)	if (e & FE_INVALID) e |= FE_ALL_INVALID
> #define	QUALIFY_RAISE_EXCEPT(e)	if (e & FE_INVALID) e |= FE_INVALID_SOFTWARE
> 
> An alternative is to avoid these macros and then
> 
> 	#ifndef FE_ALL_INVALID
> 	#define	FE_ALL_INVALID 0
> 	#endif
> 	#ifndef FE_INVALID_SOFTWARE
> 	#define	FE_INVALID_SOFTWARE 0
> 	#endif
> 
> and have respectively the following code in feclear.. and feraise..
> 
> 	if (excepts & FE_INVALID)
> 		excepts |= FE_ALL_INVALID;
> and
> 	if (excepts & FE_INVALID)
> 		excepts |= FE_INVALID_SOFTWARE;
> 
> An optimize should see the OR with zero for most architectures and
> then ignore that whole 'if' statement because the action was a
> NO-OP.

Yes, this looks nice (although I would probably include FE_INVALID in
FE_ALL_INVALID just out of principle of least surprise; it should
generate the same code either way.

> >
> >>/*
> >> * Compute the rounding mask with some rigid logic
> >> */
> >>#ifndef FE_TO_NEAREST
> >>#define FE_TO_NEAREST   0
> >>#define ROUND_MASK  ((unsigned int) 0)
> >>#else
> >>#ifndef FE_TOWARDS_ZERO
> >>#define ROUND_MASK  ((unsigned int) 0)
> >>#else
> >>#ifdef  FE_DOWNWARD
> >>#ifdef  FE_UPWARD
> >>#define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
> >>#else
> >>#define ROUND_MASK  UNSUPPORTED rounding
> >>#endif
> >>#else
> >>#ifdef  FE_UPWARD
> >>#define ROUND_MASK  UNSUPPORTED rounding
> >>#else
> >>#define ROUND_MASK  ((unsigned int) (FETOWARDZERO))
> >>#endif
> >>#endif
> >>#endif
> >>#endif
> >
> >I don't see why you have "UNSUPPORTED rounding" cases here. Any
> >combination should be valid.
> 
> My deliberate assumption was that the only legal combinations were
> 
> 	round to nearest only
> or
> 	round to nearest and truncate only
> or
> 	round to nearest, truncate, and round to either +/- INFINITY
> 
> Whether that assumption is valid is another thing but I am happy to
> be proven wrong.  If somebody defined an architecture where that
> assumption was not valid, I wanted the compiler to complain at me.

I don't want to be encoding assumptions that we're not actually using,
and that aren't useful, into the code. That makes the code more
complex and fragile. (If someone adds an arch where it's not true,
then this breaks and they have to figure out why before proceeding,
instead of just having it work.)

> >Since these are not bitmasks, probably the arch should just have
> >to define the rounding mode mask. For example if the values were
> >0b000, 0b001, 0b100, and 0b101, but the 0 in the middle was
> >significant, it should probably be treated as part of the mask.
> 
> That was not the way I read it but again, I am happy to be proven wrong.
> If the net OR'ing of the various types of rounding cannot be treated
> as a mask, then every *BSD libc implementations that I know will
> need to be fixed because they do treat it that way. I did run tests
> and none of the
> ones I tested from BSD and MUSL had that structure. Maybe my test code
> was wrong. They all seems to be 2 bits where TO_NEAREST was all bits
> zero and one of the others was the some of the other two.

OK then it's probably not an issue and generating it from | of the
individual ones should be fine.

> >If you really want to generate the mask, something like
> >
> >#ifdef FE_TONEAREST
> >#define FE_TONEAREST_MASK FE_TONEAREST
> >#else
> >#define FE_TONEAREST_MASK 0
> >#endif
> >...
> >#define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...)
> >
> >should be simpler and comprehensive.
> 
> I will run a test but I think they produce the same result for all
> the architecture for which I have definition files.

Yes, it produces the same result but it's fully general and doesn't
depend on special-casing all the combinations you expect to be
possible.

> >>int
> >>feclearexcept(excepts)
> >>{
> >>    FE_VALIDATE_EXCEPT(excepts);
> >>    FE_QUALIFY_CLEAR_EXCEPT(excepts);
> >>    setsr(getsr() & ~((unsigned int) excepts));
> >>    return(0);
> >>}
> >
> >No function-like parens around return value, no gratuitous casts.
> 
> No problems. Sorry, gratuitous casting is a habit to stop various
> linters complaining. I went looking for a MUSL style guide but could
> not find one.

There's something of a style guide on the wiki but it's not very
complete:

	https://wiki.musl-libc.org/coding-style.html

This discussion here suggests some things we could add.

Otherwise, just inferring style from existing code is what people
generally do.

> >>static inline int
> >>__raisearithmetically(int excepts)
> >>{
> >>    /*
> >>     * assume single OP is faster than double OP
> >>     */
> >>    const float one = (float) 1;
> >>    const float zero = (float) 0;
> >>    const float tiny = (float) 0x1.0p-126;
> >>    const float huge = (float) 0x1.0p+126;
> >>    volatile float x;
> >>
> >>    /*
> >>     * if it is just a simple exception, arithmetic expressions are optimal
> >>     */
> >>    switch(excepts)
> >>    {
> >>    case FE_INVALID:
> >>        x = zero, x /= x;
> >>        break;
> >>    case FE_DIVBYZERO:
> >>        x = zero, x = one / x;
> >>        break;
> >>    case FE_INEXACT:
> >>        x = tiny, x += one;
> >>        break;
> >>    case (FE_OVERFLOW | FE_INEXACT):

One more style thing: no need for parens here.

> >>        x = huge, x *= x;
> >>        break;
> >>    case (FE_UNDERFLOW | FE_INEXACT):
> >>        x = tiny, x *= x;
> >>        break;
> >>    default: /* if more than one exception exists, a sledgehammer is viable */
> >>        setsr(getsr() | ((unsigned int) excepts));
> >>        break;
> >>    }
> >>    return(0);
> >>}
> >
> >This is probably ok (aside from style things mentioned above), but
> >simply recording the flags in software without writing them to the
> >status register at all may be preferable. In that case we would
> >not even need a set_sr primitive, only get_sr and clear_sr.
> 
> Yes. I like the idea. But I was deliberately not trying to change
> anything external to 'fenv.c'.

Indeed, I think if we make that change it would be nice to factor it
as a separate change later. I do think I want to make it, though,
since it will enable us to add fenv support for soft-float archs
(which will be fully working if we hook up the FP_HANDLE_EXCEPTIONS
macro in libgcc soft-fp.h to call feraiseexcept!)

> Also, is the handling of
> __pthread_self() protected
> into the future?

I would put it as a macro in the internal fenv header so it could be
changed easily if needed, something like:

#define SOFT_FPSR (__pthread_self()->soft_fpsr)

> >Doing this would also simplify the ppc special-casing for FE_INVALID I
> >think..
> 
> I think only for feraiseexcept as you show.  It does not simplify
> feclearexcept as far as I can see.

feclearexcept would do:

	soft_fpsr = (soft_fpsr | get_sr()) & ~except;
	clear_sr();

At that point, FE_INVALID is clearable independent of the
specific-reason flags and doesn't seem to need any special treatment.

Note that just using clear_sr() rather than set_sr() here is a huge
optimization on i386 too -- it avoids the need to read-modify-write
the full fenv.

> >>int
> >>feraiseexcept(int excepts)
> >>{
> >>    FE_VALIDATE_EXCEPT(excepts);
> >>    FE_QUALIFY_RAISE_EXCEPT(excepts);
> >>    return __raisearithmetically(excepts);
> >>}
> >
> >Then this could just be __pthread_self()->fpsr |= excepts;
> 
> Very clean and simple. Is it portable outside of Linux? Not that this is
> relevant for MUSL which is designed for Linux-based devices but I do work
> on non-Linux (and no C/C++) systems so I am curious.

It's portable to anywhere you have thread-local storage, which should
be anywhere nowadays. You'd just define it as a tls var instead of
using the __pthread_self() access method (which is mildly more
efficient but mainly avoids depending on tls tooling and some issues
with early init of the dynamic linker). Something like:

_Thread_local unsigned __soft_fpsr;
#define SOFT_FPSR __soft_fpsr

> >>int
> >>fesetenv(fenv_t *envp)
> >>{
> >>    fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;
> >>
> >>    return ((sete(envp)), 0);
> >>}
> >
> >It might be preferable to use a static const object for default env;
> >I'm not sure. Not a big deal either way though.
> 
> I thought of that. I used to always do that until 5 years ago until
> I found that an auto variable would often produce more optimal code.
> It
> needs to be checked but it is irrelevant to the fundament design issues.

I think you're probably right on that.

> Ongoing:
> 
> I will incorporate your suggestions except for the __pthread_self()->fpsr
> tweaks which I like but cannot test across multiple O/S's and then we can
> review this.  Should I repost to make sure I got everything and we have a
> reference point?
> 
> I will post some inline assembler for your review and entertainment
> but in the spirit of a generic template, they should be reviewed in
> isolation.
> 
> Also, I notice that on the mips, the single unsigned in an 'fenv_t'
> is encapsulated in a struct. Is this historical, mandatory,
> stylistic, a way to fix a compiler bug, compatability or something
> else? I also notice that there is a subtle difference in the
> 'fegetenv' assembler between

You should consider arch/*/bits/fenv.h as essentially immutable; the
types and macros there are ABI or API conventions. The arch-specific
definitions for fenv stuff should go in arch/$(ARCH)/fenv_arch.h.

> 	diff mips/*.S mips64/*.S
> 
> 	64c64
> 	< 	addiu	$5, $4, 1
> 	---
> 	> 	daddiu	$5, $4, 1
> 
> I guess this is just struct alignment issues? Sorry, it is 20 years
> since I worked with the assembler on a MIPS so my knowledge is
> really rusty.

The code there is comparing the pointer in the argument (register $4)
with (void*)-1 by adding 1 and checking whether the result is 0. For
64-bit arch, it has to do a 64-bit add; otherwise the pointer would be
truncated to 32-bit. This kind of nastiness in the asm source files is
the whole reason I want the high-level logic moved to C.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.