I am trying to write inline x86-64 assembly for GCC to efficiently use the MULQ instruction. MULQ multiplies the 64-bit register RAX with another 64-bit value. The other value can be any 64-bit register (even RAX) or a value in memory. MULQ puts the high 64 bits of the product into RDX and the low 64 bits into RAX.
Now, it's easy enough to express a correct mulq as inline assembly:
#include
static inline void mulq(uint64_t *high, uint64_t *low, uint64_t x, uint64_t y)
{
asm ("mulq %[y]"
: "=d" (*high), "=a" (*low)
: "a" (x), [y] "rm" (y)
);
}
This code is correct, but not optimal. MULQ is commutative, so if y
happened to be in RAX already, then it would be correct to leave y
where it is and do the multiply. But GCC doesn't know that, so it will emit extra instructions to move the operands into their pre-defined places. I want to tell GCC that it can put either input in either location, as long as one ends up in RAX and the MULQ references the other location. GCC has a syntax for this, called "multiple alternative constraints". Notice the commas (but the overall asm() is broken; see below):
asm ("mulq %[y]"
: "=d,d" (*high), "=a,a" (*low)
: "a,rm" (x), [y] "rm,a" (y)
);
Unfortunately, this is wrong. If GCC chooses the second alternative constraint, it will emit "mulq %rax". To be clear, consider this function:
uint64_t f()
{
uint64_t high, low;
uint64_t rax;
asm("or %0,%0": "=a" (rax));
mulq(&high, &low, 7, rax);
return high;
}
Compiled with gcc -O3 -c -fkeep-inline-functions mulq.c
, GCC emits this assembly:
0000000000000010 :
10: or %rax,%rax
13: mov $0x7,%edx
18: mul %rax
1b: mov %rdx,%rax
1e: retq
The "mul %rax" should be "mul %rdx".
How can this inline asm be rewritten so it generates the correct output in every case?
No comments:
Post a Comment