musl - Re: Considering x86-64 fenv.s to C

Message-ID: <alpine.LRH.2.02.2001171401570.27694@key0.esi.com.au>
Date: Fri, 17 Jan 2020 14:36:20 +1100 (AEDT)
From: Damian McGuckin <damianm@....com.au>
To: musl@...ts.openwall.com
Subject: Re: Considering x86-64 fenv.s to C


Feedback/Discussion please, especially in terms of what extra comments I 
need to make?  I hope I have not missed anything.

General Comments
****************

Except where noted, the approach taken to invalid input is to mask out the 
invalid data, use what data is left, and never inform the calling program 
of invalid data.
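
As a minimal sketch of that masking approach (the helper name
sanitize_excepts is hypothetical and is not taken from the MUSL source),
the argument is simply ANDed with FE_ALL_EXCEPT before it is used, and
no error is ever reported to the caller:

```c
#include <fenv.h>

/* Hypothetical helper, not a MUSL function: keep only the bits that
 * name a supported exception and silently drop everything else. */
static int sanitize_excepts(int excepts)
{
	return excepts & FE_ALL_EXCEPT;
}
```

A routine written this way would operate on sanitize_excepts(mask)
rather than on the raw mask, so stray bits never reach the hardware
registers.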

The i386 (sometimes), X32 and X86-64 implementations generally need to 
account for having both the X87 FPU and SSE.  Are there scenarios where 
this will not be the case, or do we need to plan for future scenarios 
where this will not be the case?

Do we need to consider what is in the latest IEEE 754-2019 standard to see 
what enhancements are needed, or just wait for C2X?

Other Architectures
*******************

Should we look at what is needed for Sparc and Power9 to ensure that the 
(eventually) chosen abstraction will work with these? Are there any other 
chips which need to be considered? More recent chip designs have all 
been able to leverage the experience of working with IEEE 754 exceptions 
and rounding, and they follow the same style of using an exception status 
and rounding control register. So I think catering for the current crop, 
plus the 2 mentioned above, should be adequate. But am I wrong?

Is Power9 the same as PowerPC64?  I have never seen one. I know I do not 
know enough about this chip, as the 128-bit floating point discussion talks 
about a Rounding-To-Odd mode. I have tried to read the 1358 pages of the ISA 
3.0 architecture manual but I have a long way to go before I know even 10% 
of what is in there. Are the newer beefy ARMs likely to change what they 
do now in the context of the 'fenv' routines?

Also, and I could be wrong, MUSL currently assumes that there is an 
integral type for every floating type.  On some architectures, I believe 
this is not always the case for 128-bit floating point numbers. On some 
Sparcs, I am not sure it was even the case for 64-bit numbers, but that was 
a long time ago. I do not think that this restriction will influence 
anything here.  How it affects MUSL in general is another question, 
irrelevant to this discussion.

Summary
*******

aarch64 (arm)

*	All assembler

arm (bare)

*	Empty

i386

*	All assembler

*	The fldenv instruction to update the status registers has a serious
 	overhead which cannot be avoided in 'feraiseexcept'. No attempt is
 	made to optimize any unnecessary usage (as occurs in feclearexcept).
 	Note that fldenv also makes the 'feclearexcept' routine unavoidably
 	complex.

*	What is the best way to query '__hwcap' from an inline __asm__
 	statement, specifically to check whether SSE instructions have to be
 	supported?

m68k

*	In C.

*	Very clear

*	feclearexcept and feraiseexcept

 		if (exception_mask & ~FE_ALL_EXCEPT) return (-1);

 	This differs from the way the others handle invalid input. Is this
 	behaviour cast in stone based on standard documentation?
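
For comparison, a minimal sketch of the rejecting style quoted above.
The helper name check_excepts is hypothetical; only the test against
~FE_ALL_EXCEPT is taken from the m68k code. Any bit outside
FE_ALL_EXCEPT makes the call fail instead of being silently dropped:

```c
#include <fenv.h>

/* Hypothetical helper, not a MUSL function: report failure (-1) when
 * the mask carries any bit that is not a supported exception flag. */
static int check_excepts(int excepts)
{
	if (excepts & ~FE_ALL_EXCEPT)
		return -1;
	return 0;
}
```

The trade-off is that a caller with a stray bit sees an error rather
than a partial result, which is the difference in behaviour being asked
about here.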

mips/mips64/mipsn32

*	All assembler

*	Not overly complex.

powerpc

*	All assembler

*	I think that this architecture has more exception bits than IEEE 754
 	specifies. It has lots of specific cases of FE_INVALID. This needs
 	to be considered when dealing with FE_INVALID.

*	My knowledge of this assembler is poor. Please expand these comments!!

powerpc64

*	In C.

*	Very clear

*	Note that this architecture has more exception bits than IEEE 754
 	specifies. It has lots of specific cases of FE_INVALID. This needs
 	to be considered when dealing with FE_INVALID.

*	This is the first time I have seen this style of coding to cast a
 	double to a union and then extract the data as a long.

 		return (union {double f; long i;}) {get_fpscr_f()}.i;

 	Is this style of coding universally accepted within MUSL? From my
 	reading of other routines, it is normally done as

 		union {double f; long i;} f = { get_fpscr_f() };

 		return f.i;

 	Just curious.
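
A small sketch of the two styles side by side, to show that they read
the same bits. The function names are hypothetical, and uint64_t is
used in place of long so that the integer member is 64 bits wide on
every target (an assumption made for this example; on powerpc64 a long
is already 64 bits):

```c
#include <stdint.h>

/* Style 1: C99 compound literal of an anonymous union type,
 * initialised through its first member and read through the second. */
static uint64_t bits_compound(double x)
{
	return (union {double f; uint64_t i;}){x}.i;
}

/* Style 2: a named union object, then a separate member access. */
static uint64_t bits_named(double x)
{
	union {double f; uint64_t i;} u = { x };

	return u.i;
}
```

Both forms are C99 and should compile to the same thing; the compound
literal just avoids naming a temporary.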

riscv64

*	All assembler.

*	Very clear.

*	The architecture has obviously been designed after a review of lots
 	of experience with the IEEE 754 standard.

s390x

*	In C.

*	Very clear.

*	Why is __fesetround(int) 'hidden'? Where is fesetround()?

sh (SuperH??)

*	In assembler

*	I know zero about this assembler

*	There is some peculiarity about updating the environment. I have no
 	idea what is going on here. Anybody care to elaborate?

x32

*	In assembler

*	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?

*	Is this really much the same as x86-64 (or am I wrong)?

x86_64

*	In assembler

*	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?

*** FINISH