Message-ID: <20211226204238.GA1949@voyager>
Date: Sun, 26 Dec 2021 21:42:38 +0100
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: ASM-to-C conversion for i386

Hi all,

merry Christmas, everyone. I hope you survived the various family visitations in good health and are slowly coming out of the food coma, or whatever your annual rituals are.

Anyway, I found myself with a bit of time on my hands and chose to be productive for once. Rich made some noise however long ago that he wanted to move from assembly source code files to C source code files with inline assembly. So I looked at what I could contribute to that cause.

This is hindered somewhat by the fact that my knowledge of assembler is restricted to x86, PowerPC, and Microblaze. And for Microblaze, it has been a while since I've used it. For ARM and most of the others, I can get the gist, but there may be subtleties I am not grasping, and that is precisely what we cannot use for such a conversion.

So I decided to start with the architecture I am most familiar with: i386. And now I am finished with the largest part of it, the maths code. That is, finished with the first pass. You can follow the progress here:

https://github.com/nullplan/musl/tree/asm2c

So I've converted __set_thread_area(). That was pretty straightforward once I found SYSCALL_NO_TLS. The assembly generated by clang 6.0.0 hits the same notes as the handwritten code, so I'm willing to count that as a win.

For the maths code, I've added the likely() and unlikely() macros to libm.h. Not sure if they belong there, but they do make the generated assembly more similar to the handwritten code. Most of that code was straightforward, but some of the more complex functions I am not sure about.

What is up with __exp2l()? I can see that expl() is calling it, but I'm not sure why. But its existence forced me to employ a technique not used elsewhere in the code (that I could find): A hidden alias.
I vaguely recall that such hackery was rejected before (on grounds of old binutils reacting badly to such magic), but I don't really know what else I could have done. Or was the correct way to make __exp2l() a hidden function with the actual implementation and exp2l() (without the underscores) a weak alias?

Anyway, the maths code suffers from massive code duplication on both assembler and C levels. Not sure what to do about it, though. In many cases, the three versions of a function differ only in the fine details, but clang being as inline-happy as it is means that many techniques to reduce code duplication in C cause bloated object files in assembler. For example, all functions of the floor, ceil, and trunc families have been implemented in floor.c, in terms of a new static function I called "rndint()", containing the heart of what used to be at label 1 in floor.s. Unfortunately, after compiling, clang has inlined rndint() every time, so that floor.o contains all nine functions, and all functions are substantially copies of rndint(). The only solution I can see would be to rename "rndint()" to something with a double underscore at the start, make it hidden and extern, and move all the functions into their own files, thus preventing inlining and making the object files more modular. Not sure how you'd like it.

Also, the generated assembly tends to use more memory. It appears that clang is hesitant to overwrite memory allocated to a variable, even if that variable is currently parked in a register. Or maybe my clang version is just weird. That also explains why it sometimes emits "fld" instructions in the wrong order and then fixes the mistake with "fxch". Not a huge deal, just weird. Nothing forces the wrong order. And the order is often correct in the smaller precision versions of the same function.

Many of the maths functions test whether their argument is subnormal, and raise an underflow exception if so and the argument is not zero.
For the single-precision case, the idiom used was to square the input, which I have recreated with FORCE_EVAL(). For the double-precision case, however, it was to store the variable as single precision.

Finally, I have also converted fenv.s today. I was hesitant to do that at first, since a general C framework for fenv is under development, but it has been quite a while since I've heard a peep from that project. In any case, since their code should overwrite all of the existing fenv code, a merge would now just lead to trivial path conflicts that are easily resolved.

I believe that in doing the conversion I found a bug in feclearexcept(). The original code said in the non-SSE version (context: EAX contains the status word, ECX contains the function argument, and "1b" is a function return)

|	test %eax,%ecx
|	jz 1b
|	not %ecx
|	and %ecx,%eax
|	test $0x3f,%eax
|	jz 1f
|	fnclex
|	jmp 1b
|1:	sub $32,%esp
|	fnstenv (%esp)
|	mov %al,4(%esp)
|	fldenv (%esp)
|	add $32,%esp
|	xor %eax,%eax
|	ret

That second "jz" confuses me. The intent seems to be to test if any exceptions remain, and use "fnclex" if not. That would make sense, since "fnclex" clears all exceptions. But since the second "jz" is a "jz" and not a "jnz", the "fnclex" path is used only if exceptions remain, and the slower "fldenv" path is used if none remain. Or am I reading this wrong? Anyway, I implemented the logic that made sense to me in the C version.

What remains to be done? Well, looking at the list of assembler files, the only targets for a C conversion that remain (in i386) are the string functions. After that, it is time to clean up and submit patches. Speaking of, how would you like those? One patch for everything, one patch per directory (i.e. one for thread, one for math, one for fenv, one for string), or one per function group (the three precisions of each function), or one per function? I don't want to overwhelm you.

Ciao,
Markus
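
To make the feclearexcept() question above concrete, the two possible branch choices can be modelled in portable C. This is only a sketch of the control flow and not the code from the asm2c branch: the names path_as_written(), path_as_intended(), check_readings(), enum clear_path and X87_EXCEPT_MASK are hypothetical, and the model assumes the argument is used as a mask exactly as passed in.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these exist in musl. */
enum clear_path {
    PATH_NOTHING, /* no requested flag is set, return immediately */
    PATH_FNCLEX,  /* fnclex: clears ALL x87 exception flags       */
    PATH_FLDENV   /* fnstenv/fldenv: writes back a patched status */
};

/* The x87 exception flags occupy the low six bits of the status word. */
#define X87_EXCEPT_MASK 0x3fu

/* Control flow of the quoted assembly, taken literally.
 * sw plays the role of EAX (status word), mask the role of ECX. */
enum clear_path path_as_written(unsigned sw, unsigned mask)
{
    if ((sw & mask) == 0)              /* test %eax,%ecx ; jz 1b     */
        return PATH_NOTHING;
    sw &= ~mask;                       /* not %ecx ; and %ecx,%eax   */
    if ((sw & X87_EXCEPT_MASK) == 0)   /* test $0x3f,%eax ; jz 1f    */
        return PATH_FLDENV;            /* nothing left, slow path    */
    return PATH_FNCLEX;                /* flags left, yet clear all  */
}

/* Same flow with the second branch inverted (jnz instead of jz). */
enum clear_path path_as_intended(unsigned sw, unsigned mask)
{
    if ((sw & mask) == 0)
        return PATH_NOTHING;
    sw &= ~mask;
    if ((sw & X87_EXCEPT_MASK) != 0)
        return PATH_FLDENV;            /* other flags must survive   */
    return PATH_FNCLEX;                /* nothing left, fnclex is ok */
}

/* The two readings disagree whenever a requested flag is actually set. */
void check_readings(void)
{
    /* Invalid (0x01) and divide-by-zero (0x04) raised, clear invalid only. */
    assert(path_as_written(0x05u, 0x01u) == PATH_FNCLEX);
    assert(path_as_intended(0x05u, 0x01u) == PATH_FLDENV);
    /* Only invalid raised, clear invalid. */
    assert(path_as_written(0x01u, 0x01u) == PATH_FLDENV);
    assert(path_as_intended(0x01u, 0x01u) == PATH_FNCLEX);
}
```

In the first case of check_readings() the literal reading picks "fnclex" although divide-by-zero is still pending and was not requested, which is the behaviour questioned above.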