|
Message-ID: <20150624063923.GR1173@brightrain.aerifal.cx> Date: Wed, 24 Jun 2015 02:39:23 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: [PATCH 2/5] dynlink.c: compute modulus via magic multiplication On Wed, Jun 24, 2015 at 09:08:15AM +0300, Alexander Monakov wrote: > > > How can I do the last step, 'x-v*div' without it? > > > > Ah yes. Would it be preferable to have that in struct udiv then, > > though? Then the caller never has to care about the divisor, and in > > the case where umod doesn't get inlined, the call will be more > > efficient (fewer args). > > I doubt that; the caller needs that value ('nbuckets') anyway. OK. > [...] > > Using the post-mul add rather than the saturated increment would make > > this work for 0xffffffff too. > > post-mul add is problematic on 32-bit Maybe. It's not clear to me which is cheaper, a saturating multiply (well-predicted branch) or an adc. IMO the only place the latter is clearly more costly is on archs with no proper carry mechanism. > > Another obvious solution is not using the +32 offset so that the right > > shift can just be 31, but that pessimizes the code on 32-bit archs > > quite a bit. > > I agree that a special-case for a power-of-two divisor will fire too rarely, > so figuring a replacement out would be nice. > > If you (or anyone) want to play with ideas, I'm attaching my test driver for > magic division that tests 2^32-1 inputs for a divisor given on the command > line (the main loop looks odd, but my goal was to have gcc vectorize it). Nice. > > > p->s1 check is for this reason: shift are relatively costly, and s1 is rarely > > > non-zero, so try to skip the shift if possible; in the rare case the pre-shift > > > is non-zero, the check allows to skip the saturating increment operation. > > > > Shifts are costly? Are we talking about P4-era junk? ;-) > > I primarily had modern Intel cores in mind where as far as I can see shifts by > %ecx have latency 2. But even ignoring that, there's the second part of my > argument. With a latency of 2, I can't imagine there being any benefit to attempting to avoid it with a conditional branch. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.