|
Message-ID: <alpine.LRH.2.02.2001241227030.6298@key0.esi.com.au> Date: Fri, 24 Jan 2020 15:13:32 +1100 (AEDT) From: Damian McGuckin <damianm@....com.au> To: musl@...ts.openwall.com Subject: Re: Considering x86-64 fenv.s to C Let me know if I missed something in the following: On Thu, 23 Jan 2020, Rich Felker wrote: >> #ifndef FE_ALL_EXCEPT >> #define FE_VALID_EXCEPT (FE_INEXACT|FE_DIV_BY_ZERO|FE_UNDERFLOW|FE_OVERFLOW) >> #define FE_ALL_EXCEPT (FE_INVALID|FE_VALID_EXCEPT) >> #endif > > I don't understand what this is for. FE_ALL_EXCEPT is defined in the > public fenv.h and FE_VALID_EXCEPT is not used. For the first question, it was done purely to avoid the need to define FE_ALL_EXCEPT in all of the various 'fenv.h' files. I am happy to throw it away. For the second question, at one stage, I considered having the concept of what was anything except an INVALID exception. I will remove those 4 lines. >> /* >> * The usual approach to validating an exception mask is to throw out >> * anything which is illegal. Some systems will NOT do this, choosing >> * instead to return a non-zero result on encountering an illegal mask >> */ > > The latter is not desirable behavior and not something you need to be > trying to reproduce. It's a bug or at least an unspecified case we > don't care about in existing code. Wonderful. I was purely trying to preserve historical performance. I am more than happy to NOT do this. That makes the next problem easy to fix. >> #ifndef FE_VALIDATE_EXCEPT >> #define FE_VALIDATE_EXCEPT(e) e &= FE_ALL_EXCEPT >> #endif > > Write things like this directly in code; don't hide them behind > macros. Doing that just makes the code hard to read because you can't > tell what it's doing without searching for the macro definition. (Also > the macro is not protected by parens as written but that's easily > fixable.) Because we can ignore that non-zero return case, we no longer need an architecture dependent macro for this so it can go directly in the code. So, again, an easy problem to fix. >> /* >> * There is usually no need to mess the generic valid exceptions bits >> * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing' >> * macro for such a scenario - is this the MUSL accepted style???? >> */ >> #ifndef FE_QUALIFY_CLEAR_EXCEPT >> #define FE_QUALIFY_CLEAR_EXCEPT(excepts) (0) >> #endif >> #ifndef FE_QUALIFY_RAISE_EXCEPT >> #define FE_QUALIFY_RAISE_EXCEPT(excepts) (0) >> #endif > > Can you clarify how these would be defined for powerpc? Could the logic > just be embedded in get_sr/set_sr if needed? Address the second question first, my original aim was to keep any logic out of where embedded assembler was used, mainly because of my own very irregular experience with it. I could also not convince myself that get_sr and set_sr may need to know from where they are called. In the case of the powerpc64, with the SR and CR being the same, that adds to the complexity, so I gave up. Answering your first question, for the powerpc64. #define QUALIFY_CLEAR_EXCEPT(e) if (e & FE_INVALID) e |= FE_ALL_INVALID #define QUALIFY_RAISE_EXCEPT(e) if (e & FE_INVALID) e |= FE_INVALID_SOFTWARE An alternative is to avoid these macros and then #ifndef FE_ALL_INVALID #define FE_ALL_INVALID 0 #endif #ifndef FE_INVALID_SOFTWARE #define FE_INVALID_SOFTWARE 0 #endif and have respectively the following code in feclear.. and feraise.. if (excepts & FE_INVALID) excepts |= FE_ALL_INVALID; and if (excepts & FE_INVALID) excepts |= FE_INVALID_SOFTWARE; An optimize should see the OR with zero for most architectures and then ignore that whole 'if' statement because the action was a NO-OP. > >> /* >> * Compute the rounding mask with some rigid logic >> */ >> #ifndef FE_TO_NEAREST >> #define FE_TO_NEAREST 0 >> #define ROUND_MASK ((unsigned int) 0) >> #else >> #ifndef FE_TOWARDS_ZERO >> #define ROUND_MASK ((unsigned int) 0) >> #else >> #ifdef FE_DOWNWARD >> #ifdef FE_UPWARD >> #define ROUND_MASK ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO)) >> #else >> #define ROUND_MASK UNSUPPORTED rounding >> #endif >> #else >> #ifdef FE_UPWARD >> #define ROUND_MASK UNSUPPORTED rounding >> #else >> #define ROUND_MASK ((unsigned int) (FETOWARDZERO)) >> #endif >> #endif >> #endif >> #endif > > I don't see why you have "UNSUPPORTED rounding" cases here. Any > combination should be valid. My deliberate assumption was that the only legal combinations were round to nearest only or round to nearest and truncate only or round to nearest, truncate, and round to either +/- INFINITY Whether that assumption is valid is another thing but I am happy to be proven wrong. If somebody defined an architecture where that assumption was not valid, I wanted the compiler to complain at me. > Since these are not bitmasks, probably the arch should just have to > define the rounding mode mask. For example if the values were 0b000, > 0b001, 0b100, and 0b101, but the 0 in the middle was significant, it > should probably be treated as part of the mask. That was not the way I read it but again, I am happy to be proven wrong. If the net OR'ing of the various types of rounding cannot be treated as a mask, then every *BSD libc implementations that I know will need to be fixed because they do treat it that way. I did run tests and none of the ones I tested from BSD and MUSL had that structure. Maybe my test code was wrong. They all seems to be 2 bits where TO_NEAREST was all bits zero and one of the others was the some of the other two. > If you really want to generate the mask, something like > > #ifdef FE_TONEAREST > #define FE_TONEAREST_MASK FE_TONEAREST > #else > #define FE_TONEAREST_MASK 0 > #endif > ... > #define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...) > > should be simpler and comprehensive. I will run a test but I think they produce the same result for all the architecture for which I have definition files. >> int >> feclearexcept(excepts) >> { >> FE_VALIDATE_EXCEPT(excepts); >> FE_QUALIFY_CLEAR_EXCEPT(excepts); >> setsr(getsr() & ~((unsigned int) excepts)); >> return(0); >> } > > No function-like parens around return value, no gratuitous casts. No problems. Sorry, gratuitous casting is a habit to stop various linters complaining. I went looking for a MUSL style guide but could not find one. >> static inline int >> __raisearithmetically(int excepts) >> { >> /* >> * assume single OP is faster than double OP >> */ >> const float one = (float) 1; >> const float zero = (float) 0; >> const float tiny = (float) 0x1.0p-126; >> const float huge = (float) 0x1.0p+126; >> volatile float x; >> >> /* >> * if it is just a simple exception, arithmetic expressions are optimal >> */ >> switch(excepts) >> { >> case FE_INVALID: >> x = zero, x /= x; >> break; >> case FE_DIVBYZERO: >> x = zero, x = one / x; >> break; >> case FE_INEXACT: >> x = tiny, x += one; >> break; >> case (FE_OVERFLOW | FE_INEXACT): >> x = huge, x *= x; >> break; >> case (FE_UNDERFLOW | FE_INEXACT): >> x = tiny, x *= x; >> break; >> default: /* if more than one exception exists, a sledgehammer is viable */ >> setsr(getsr() | ((unsigned int) excepts)); >> break; >> } >> return(0); >> } > > This is probably ok (aside from style things mentioned above), but > simply recording the flags in software without writing them to the > status register at all may be preferable. In that case we would not even > need a set_sr primitive, only get_sr and clear_sr. Yes. I like the idea. But I was deliberately not trying to change anything external to 'fenv.c'. Also, is the handling of __pthread_self() protected into the future? > Doing this would also simplify the ppc special-casing for FE_INVALID I > think.. I think only for feraiseexcept as you show. It does not simplify feclearexcept as far as I can see. >> int >> feraiseexcept(int excepts) >> { >> FE_VALIDATE_EXCEPT(excepts); >> FE_QUALIFY_RAISE_EXCEPT(excepts); >> return __raisearithmetically(excepts); >> } > > Then this could just be __pthread_self()->fpsr |= excepts; Very clean and simple. Is it portable outside of Linux? Not that this is relevant for MUSL which is designed for Linux-based devices but I do work on non-Linux (and no C/C++) systems so I am curious. >> int >> fegetround(void) >> { >> return (int) (getcr() & ROUND_MASK); >> } >> >> int >> fesetround(int rounding_mode) >> { >> if ((rounding_mode & ~ROUND_MASK) == 0) >> { >> unsigned int mode = ((unsigned int) rounding_mode); >> >> return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0); >> } >> return(-1); >> } > > I would put the error test at the beginning rather than putting the > whole body inside {}: > > if (rounding_mode & ~ROUND_MASK) return -1; Happy to. A while ago, I found that the alternative produces more optimal code when dealing with floating point dominant routines. But that is huge generalization and I have not tested that generalization in this case. >> int >> fegetenv(fenv_t *envp) >> { >> return ((gete(envp)), 0); >> } > > Is there a reason for using comma operator here? I used it to let the compiler prove that my #defines for gete() and sete() worked as if they were a single statement. That gave me some rigour on the design of that macro, which in the case of the m68k (and eventually the i386), does demand a comma operator. But for the production code, it will disappear. >> int >> fesetenv(fenv_t *envp) >> { >> fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp; >> >> return ((sete(envp)), 0); >> } > > It might be preferable to use a static const object for default env; > I'm not sure. Not a big deal either way though. I thought of that. I used to always do that until 5 years ago until I found that an auto variable would often produce more optimal code. It needs to be checked but it is irrelevant to the fundament design issues. Ongoing: I will incorporate your suggestions except for the __pthread_self()->fpsr tweaks which I like but cannot test across multiple O/S's and then we can review this. Should I repost to make sure I got everything and we have a reference point? I will post some inline assembler for your review and entertainment but in the spirit of a generic template, they should be reviewed in isolation. Also, I notice that on the mips, the single unsigned in an 'fenv_t' is encapsulated in a struct. Is this historical, mandatory, stylistic, a way to fix a compiler bug, compatability or something else? I also notice that there is a subtle difference in the 'fegetenv' assembler between diff mips/*.S mips64/*.S 64c64 < addiu $5, $4, 1 --- > daddiu $5, $4, 1 I guess this is just struct alignment issues? Sorry, it is 20 years since I worked with the assembler on a MIPS so my knowledge is really rusty. Regards - Damian Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037 Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here Views & opinions here are mine and not those of any past or present employer
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.