Message-ID: <alpine.LRH.2.02.2001240755500.28245@key0.esi.com.au>
Date: Fri, 24 Jan 2020 11:42:02 +1100 (AEDT)
From: Damian McGuckin <damianm@....com.au>
To: musl@...ts.openwall.com
Subject: Re: Considering x86-64 fenv.s to C


The following is an attempt at a 'discussion paper' to address this issue.

The following are assumptions. Whether the assumptions are valid enough
is debatable and open for discussion by virtue of this initial post. I
will present the generic details first and, once there is some agreement
on those, the details can be attacked, including nomenclature.  The names
are not set in stone. They are just what is being used to illustrate the
design. Comments relating to accepted practice in MUSL are welcome.

These assumptions do not hold for i386, x32, and x86_64, largely because
of their historical origins which, at the time, were quite ground
breaking. The 3 architectures need their own special handling, although
background research suggests that all three can be made generic with
respect to each other.  But that is another story.

I have written this at a level that was meant to be read by those with a
low level of knowledge. That may be a mistake but it meant that I could
get some level of initial peer review before posting this to the list!

Ignoring Intel FPUs for the moment at least, it is assumed that all FPUs 
expose their state to, and allow control of themselves by, user programs 
through

a) a status register which contains bits which are set when a floating
    point exception occurs, the exception bits, and
b) a control register which contains a bit field in which is stored the
    currently active rounding mode.

These registers may be one and the same.  It is assumed that they can be
stored in, or loaded from an unsigned int (of 32-bits).  This appears to
be the case for all considered architectures (see the list below).

The interrogation and modification of both the status register and the
control register can be done via the respective pair of 'routines'

 	getsr/setsr
and
 	getcr/setcr

which respectively copy the register's contents into a local variable,
an unsigned int, and set the register's contents from an unsigned int
argument.

These are simple #defines, being mapped to

a)	if the status and control bits are in the same register, the pair

 	   static inline unsigned int get_csr() { .... }
 	   static inline void set_csr(unsigned int csr) { .... }

 	where get_csr is mapped to both 'getsr' and 'getcr', and set_csr
 	to both 'setsr' and 'setcr', and

b)	if the registers are in fact distinct, the four routines

 	    static inline unsigned int get_sr() { .... }
 	    static inline void set_sr(unsigned int sr) { .... }
 	    static inline unsigned int get_cr() { .... }
 	    static inline void set_cr(unsigned int cr) { .... }
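
As a rough sketch of case (a), the mapping can be exercised without any
assembler by letting a plain variable stand in for the hardware register.
Everything prefixed 'demo_' or 'DEMO_' below, including the bit layout, is
an assumption made purely for illustration; a real port would replace the
bodies of get_csr/set_csr with the architecture's one line of embedded
assembler.

```c
/* Hypothetical stand-in for a combined FPU control/status register. */
static unsigned int demo_fpu_csr;

/* Assumed bit layout, for illustration only. */
#define DEMO_EXCEPT_BITS 0x01fu  /* status: five exception flags */
#define DEMO_ROUND_BITS  0x300u  /* control: rounding mode field */

static inline unsigned int get_csr(void) { return demo_fpu_csr; }
static inline void set_csr(unsigned int csr) { demo_fpu_csr = csr; }

/* One register serves both roles, so both pairs map to get_csr/set_csr. */
#define getsr get_csr
#define setsr set_csr
#define getcr get_csr
#define setcr set_csr

/* Clear exception flags through the status view; control bits survive. */
void demo_clear_flags(unsigned int excepts)
{
    setsr(getsr() & ~(excepts & DEMO_EXCEPT_BITS));
}

/* Read the rounding field through the control view of the same word. */
unsigned int demo_rounding(void)
{
    return getcr() & DEMO_ROUND_BITS;
}
```

Because both views alias one word, every setsr must be a read-modify-write,
otherwise the rounding mode would be lost when flags are cleared.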

There exists a 'const' default floating point environment which can be
defined as a local variable as:

      const fenv_t fe_dfl_env = FE_DFL_ENV_DATA;

where FE_DFL_ENV_DATA is a #define which is created automatically if the
variable 'fe_dfl_env' is an unsigned int.

There also exists a pair of macros, 'gete' and 'sete', and an assumed
memory object '*e' which are defined such that

 	#define gete(e)	... // extracts the FPU's environment into *e

 	#define sete(e)	... // replaces the FPU's environment with *e

 	fenv_t *e;
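
For the simplest case, where the whole environment fits in one unsigned int,
'gete' and 'sete' might look like the sketch below. The register is again
simulated by a variable, and 'demo_fenv_t' and 'demo_swap_env' are
hypothetical names used only here; an architecture with a larger fenv_t
would need its own definitions.

```c
/* Hypothetical stand-in for a combined FPU control/status register. */
static unsigned int demo_fpu_csr;

static inline unsigned int get_csr(void) { return demo_fpu_csr; }
static inline void set_csr(unsigned int csr) { demo_fpu_csr = csr; }

/* Assumption: the whole environment fits in one unsigned int. */
typedef unsigned int demo_fenv_t;

#define gete(e) (*(e) = get_csr())   /* FPU environment -> *e */
#define sete(e) (set_csr(*(e)))      /* *e -> FPU environment */

/* Save the current environment, then install a caller-supplied one. */
int demo_swap_env(demo_fenv_t *saved, const demo_fenv_t *replacement)
{
    gete(saved);
    sete(replacement);
    return 0;
}
```

Parenthesising the macro bodies and the argument lets them be used inside
a comma expression such as 'return (gete(envp), 0);'.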

Some examples of these architecture dependent routines and the associated 
definitions were crafted for m68k, powerpc64, and s390x, by copying the 
appropriate bits from how 'musl' currently does it.  Approximate examples 
for sparc64, mips64 and sh are also provided, but remember that these are 
approximations only.  The implementation for these cases is likely to be 
less than optimal at best, if not wrong.  Please amend as appropriate.

Also, at least one architecture has an expanded idea of the concept of an
INVALID exception. For such scenarios, 'feclearexcept' and 'feraiseexcept'
need macros to be defined which massage the exception bits 'excepts' that
come from a user's program, respectively

     #define FE_QUALIFY_CLEAR_EXCEPT(excepts)	....
and
     #define FE_QUALIFY_RAISE_EXCEPT(excepts)	....

For all other scenarios, these macros never need to be defined explicitly;
the code will define them as no-ops by itself.
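
To make the idea concrete, here is one possible shape for the two macros,
assuming an FPU in which the generic INVALID bit is a summary of several
cause bits. All 'DEMO_' constants and 'demo_' functions are hypothetical
and do not describe any particular architecture.

```c
/* All constants are hypothetical; they do not describe a real FPU. */
#define DEMO_FE_INVALID           0x010u  /* generic "invalid" summary bit */
#define DEMO_FE_INVALID_SOFTWARE  0x100u  /* cause bit software may set    */
#define DEMO_FE_INVALID_HARDWARE  0x200u  /* cause bit set by hardware     */
#define DEMO_FE_ALL_INVALID \
    (DEMO_FE_INVALID_SOFTWARE | DEMO_FE_INVALID_HARDWARE)

/* Clearing INVALID must also clear every cause bit behind it. */
#define FE_QUALIFY_CLEAR_EXCEPT(excepts) \
    do { if ((excepts) & DEMO_FE_INVALID) \
             (excepts) |= DEMO_FE_ALL_INVALID; } while (0)

/* Raising INVALID must name a cause bit that software may set. */
#define FE_QUALIFY_RAISE_EXCEPT(excepts) \
    do { if ((excepts) & DEMO_FE_INVALID) \
             (excepts) |= DEMO_FE_INVALID_SOFTWARE; } while (0)

/* Bits that would actually be cleared in the status register. */
unsigned int demo_bits_to_clear(unsigned int excepts)
{
    FE_QUALIFY_CLEAR_EXCEPT(excepts);
    return excepts;
}

/* Bits that would actually be set in the status register. */
unsigned int demo_bits_to_raise(unsigned int excepts)
{
    FE_QUALIFY_RAISE_EXCEPT(excepts);
    return excepts;
}
```

Masks that do not contain the INVALID bit pass through both macros
unchanged.
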
All routines validate the exception bits 'excepts' coming from a user's
program. This is achieved with a macro, the default of which is

     #define FE_VALIDATE_EXCEPT(except)	except &= FE_ALL_EXCEPT

This simply discards any bits which are not a valid exception bit. It is
created automatically and does not need to be defined. However, where such
an approach to illegal exceptions is deemed unsatisfactory and, for
example, routine failure with a non-zero return is deemed a better
approach, that macro can be redefined explicitly as

    #define FE_VALIDATE_EXCEPT(except) if ((except) & ~FE_ALL_EXCEPT) return(-1)

Historically it seems only one architecture took this approach, 'm68k',
and then only on Linux.  On every other architecture, and on *BSD, the
earlier, default approach is used.
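
The difference between the two policies can be seen in the following
sketch, which runs against a simulated status register. 'DEMO_ALL_EXCEPT',
'demo_sr' and the two functions are hypothetical names, and the strict
macro is wrapped in do/while here purely as a matter of macro hygiene.

```c
#define DEMO_ALL_EXCEPT 0x1f      /* hypothetical set of valid flag bits */

/* Default policy: silently drop bits that are not valid exceptions. */
#define VALIDATE_DISCARD(e) ((e) &= DEMO_ALL_EXCEPT)

/* Strict policy: fail the calling routine if any invalid bit is present. */
#define VALIDATE_STRICT(e) \
    do { if ((e) & ~DEMO_ALL_EXCEPT) return -1; } while (0)

/* Hypothetical stand-in for the FPU status register. */
static unsigned int demo_sr;

int demo_clear_discard(int excepts)
{
    VALIDATE_DISCARD(excepts);
    demo_sr &= ~(unsigned int) excepts;
    return 0;
}

int demo_clear_strict(int excepts)
{
    VALIDATE_STRICT(excepts);
    demo_sr &= ~(unsigned int) excepts;
    return 0;
}
```

With a mask of 0x21, the first routine clears bit 0x01 and returns 0, while
the second returns -1 and leaves the register untouched.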

An attempt at an implementation is shown below, WITHOUT the architecture 
dependent macro definitions and 'static inline' routines which largely 
just comprise one line of embedded assembler.

Note that raising exceptions is done with an arithmetic expression for the
trivial cases and by directly modifying the floating point status register
for the non-trivial case.  The approach is open to question. While it
allows testing in isolation, it does not use any of the latest barrier
function approaches or routines like 'math_uflow' and colleagues.  It might
be advantageous to focus on the overall design first and discuss how to
raise exceptions later, so as not to detract from the discussion of the
overall design of the generic interface.

Please consider:

/*
  * default inspection and modification of FPU status and control registers
  */
#ifndef getcr
#define getcr   get_csr
#endif
#ifndef setcr
#define setcr   set_csr
#endif
#ifndef getsr
#define getsr   get_csr
#endif
#ifndef setsr
#define setsr   set_csr
#endif
#ifndef FE_DFL_ENV_DATA
#define FE_DFL_ENV_DATA 0	/* assume this is an unsigned int - cast?? */
#endif
#ifndef gete
#define gete(e) (*(e) = get_csr())	/* assume this is a basic type */
#endif
#ifndef sete
#define sete(e) (set_csr(*(e)))	/* assume this is a basic type */
#endif
#ifndef FE_ALL_EXCEPT
#define FE_VALID_EXCEPT (FE_INEXACT|FE_DIVBYZERO|FE_UNDERFLOW|FE_OVERFLOW)
#define FE_ALL_EXCEPT   (FE_INVALID|FE_VALID_EXCEPT)
#endif
/*
  * The usual approach to validating an exception mask is to throw out
  * anything which is illegal. Some systems will NOT do this, choosing
  * instead to return a non-zero result on encountering an illegal mask
  */ 
#ifndef FE_VALIDATE_EXCEPT
#define FE_VALIDATE_EXCEPT(e)   (e) &= FE_ALL_EXCEPT
#endif
/*
  * There is usually no need to mess with the generic valid exception bits
  * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
  * macro for such a scenario - is this the MUSL accepted style????
  */
#ifndef FE_QUALIFY_CLEAR_EXCEPT
#define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
#endif
#ifndef FE_QUALIFY_RAISE_EXCEPT
#define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
#endif
/*
  * Compute the rounding mask with some rigid logic
  */
#ifndef FE_TONEAREST
#define FE_TONEAREST    0
#define ROUND_MASK  ((unsigned int) 0)
#else
#ifndef FE_TOWARDZERO
#define ROUND_MASK  ((unsigned int) 0)
#else
#ifdef  FE_DOWNWARD
#ifdef  FE_UPWARD
#define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
#else
#error "UNSUPPORTED rounding"
#endif
#else
#ifdef  FE_UPWARD
#error "UNSUPPORTED rounding"
#else
#define ROUND_MASK  ((unsigned int) (FE_TOWARDZERO))
#endif
#endif
#endif
#endif

int
feclearexcept(int excepts)
{
     FE_VALIDATE_EXCEPT(excepts);
     FE_QUALIFY_CLEAR_EXCEPT(excepts);
     setsr(getsr() & ~((unsigned int) excepts));
     return(0);
}

static inline int
__raisearithmetically(int excepts)
{
     /*
      * assume single OP is faster than double OP
      */
     const float one = (float) 1;
     const float zero = (float) 0;
     const float tiny = (float) 0x1.0p-126;
     const float huge = (float) 0x1.0p+126;
     volatile float x;

     /*
      * if it is just a simple exception, arithmetic expressions are optimal
      */
     switch(excepts)
     {
     case FE_INVALID:
         x = zero, x /= x;
         break;
     case FE_DIVBYZERO:
         x = zero, x = one / x;
         break;
     case FE_INEXACT:
         x = tiny, x += one;
         break;
     case (FE_OVERFLOW | FE_INEXACT):
         x = huge, x *= x;
         break;
     case (FE_UNDERFLOW | FE_INEXACT):
         x = tiny, x *= x;
         break;
     default: /* if more than one exception exists, a sledgehammer is viable */
         setsr(getsr() | ((unsigned int) excepts));
         break;
     }
     return(0);
}

int
feraiseexcept(int excepts)
{
     FE_VALIDATE_EXCEPT(excepts);
     FE_QUALIFY_RAISE_EXCEPT(excepts);
     return __raisearithmetically(excepts);
}

int
fetestexcept(int excepts)
{
     FE_VALIDATE_EXCEPT(excepts);

     return (int) (getsr() & ((unsigned int) excepts));
}

int
fegetround(void)
{
     return (int) (getcr() & ROUND_MASK);
}

int
fesetround(int rounding_mode)
{
     if ((rounding_mode & ~ROUND_MASK) == 0)
     {
         unsigned int mode = ((unsigned int) rounding_mode);

         return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0);
     }
     return(-1);
}

int
fegetenv(fenv_t *envp)
{
     return ((gete(envp)), 0);
}

int
fesetenv(const fenv_t *envp)
{
     const fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;

     return ((sete(e)), 0);
}
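
As a way of testing the rounding logic in isolation, the mask-based check
in 'fesetround' can be run against a simulated control register. The
encodings below are assumed for illustration only, and every 'demo_' or
'DEMO_' name is hypothetical.

```c
/* Hypothetical stand-in for the FPU control register. */
static unsigned int demo_fpu_cr;

/* Assumed rounding encodings in a two-bit field; not a real layout. */
#define DEMO_TONEAREST  0x000
#define DEMO_TOWARDZERO 0x100
#define DEMO_UPWARD     0x200
#define DEMO_DOWNWARD   0x300
#define DEMO_ROUND_MASK \
    ((unsigned int) (DEMO_DOWNWARD | DEMO_UPWARD | DEMO_TOWARDZERO))

int demo_fegetround(void)
{
    return (int) (demo_fpu_cr & DEMO_ROUND_MASK);
}

int demo_fesetround(int rounding_mode)
{
    unsigned int mode = (unsigned int) rounding_mode;

    /* Reject any value with bits outside the rounding field. */
    if (mode & ~DEMO_ROUND_MASK)
        return -1;
    /* Replace only the rounding field; leave other control bits alone. */
    demo_fpu_cr = (demo_fpu_cr & ~DEMO_ROUND_MASK) | mode;
    return 0;
}
```

Note that a pure mask test accepts every bit pattern inside the field, so
it is only sufficient when all encodings of the field are valid modes.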

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
