What is the "smart way" to implement LMUL?
For example, say that I would like to calculate
signed_int128 * signed_int128 -> signed_int128
or a 128-bit MAC
signed_int64 * signed_int64 + signed_int128 -> signed_int128 as fast as possible.
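One possible way to get the 128-bit MAC, sketched in portable C rather than XC so it can be checked on a host, works in three steps. First, build the unsigned 64x64 -> 128 product from 32-bit limbs with an lmul-style step. Second, fix up the upper half for the signs. Third, add the accumulator. `lmul32` is a hypothetical stand-in that assumes XC's lmul(a, b, c, d) returns a*b + c + d as {hi, lo}. `mac128` and the little-endian limb layout are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for XC's lmul(a, b, c, d): {hi, lo} = a*b + c + d.
   The sum is at most 2^64 - 1, so it always fits. */
static void lmul32(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                   uint32_t *hi, uint32_t *lo)
{
    uint64_t r = (uint64_t)a * b + c + d;
    *hi = (uint32_t)(r >> 32);
    *lo = (uint32_t)r;
}

/* y = a * b + acc (mod 2^128), two's complement,
   little-endian 32-bit limbs (index 0 = least significant). */
static void mac128(int64_t a, int64_t b, const uint32_t acc[4], uint32_t y[4])
{
    uint32_t a32[2] = { (uint32_t)a, (uint32_t)((uint64_t)a >> 32) };
    uint32_t b32[2] = { (uint32_t)b, (uint32_t)((uint64_t)b >> 32) };
    uint32_t p[4] = { 0, 0, 0, 0 };

    /* Unsigned 64x64 -> 128 schoolbook product, one lmul per limb pair. */
    for (int i = 0; i < 2; i++) {
        uint32_t carry = 0;
        for (int j = 0; j < 2; j++)
            lmul32(a32[i], b32[j], p[i + j], carry, &carry, &p[i + j]);
        p[i + 2] = carry;
    }

    /* Sign fix-up: a = a_u - 2^64*[a<0], so the signed product equals the
       unsigned one minus b_u (if a<0) and minus a_u (if b<0) in the top half. */
    uint64_t hi = ((uint64_t)p[3] << 32) | p[2];
    if (a < 0) hi -= (uint64_t)b;
    if (b < 0) hi -= (uint64_t)a;
    p[2] = (uint32_t)hi;
    p[3] = (uint32_t)(hi >> 32);

    /* Add the 128-bit accumulator limb by limb, propagating the carry. */
    uint64_t c = 0;
    for (int k = 0; k < 4; k++) {
        uint64_t s = (uint64_t)p[k] + acc[k] + c;
        y[k] = (uint32_t)s;
        c = s >> 32;
    }
}
```

For example, mac128(-1, -1, zero, y) with an all-zero accumulator leaves y = {1, 0, 0, 0}, and mac128(3, -5, ...) with acc = {20, 0, 0, 0} gives 5.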
Using lmul in XC, how can I write a fast loop that multiplies integers of any length stored in some kind of array or struct?
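Here is a sketch of such a loop, again in portable C so it can be tested off-target. It assumes little-endian 32-bit limbs and a hypothetical `lmul32` helper with the a*b + c + d semantics of XC's lmul. Because it keeps only the low n limbs, the same loop serves both unsigned values and two's-complement signed ones. With n = 4 it gives signed_int128 * signed_int128 -> signed_int128 directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XC's lmul(a, b, c, d): {hi, lo} = a*b + c + d.
   The sum is at most 2^64 - 1, so it always fits. */
static void lmul32(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                   uint32_t *hi, uint32_t *lo)
{
    uint64_t r = (uint64_t)a * b + c + d;
    *hi = (uint32_t)(r >> 32);
    *lo = (uint32_t)r;
}

/* y = a * b mod 2^(32*n), little-endian 32-bit limbs (index 0 = least
   significant). y must not overlap a or b. */
static void mul_trunc(const uint32_t *a, const uint32_t *b,
                      uint32_t *y, size_t n)
{
    for (size_t k = 0; k < n; k++)
        y[k] = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t carry = 0;
        /* Only products that land below limb n matter: i + j < n. */
        for (size_t j = 0; i + j < n; j++)
            lmul32(a[i], b[j], y[i + j], carry, &carry, &y[i + j]);
        /* The last carry would go to limb n, which is truncated away. */
    }
}
```

The inner loop needs one lmul per limb pair. The existing partial sum and the running carry are its two addends, and because a*b + c + d never exceeds 64 bits, no separate add-with-carry is required.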
I checked the simulator output in debugger mode.
long long A,B,C;
A*B
A*B+C
It starts with LMUL and LADD, but then it uses MUL, ADD, MUL, ADD to calculate the most significant part. Could it use MAC or LMUL instead, or is there some magic with the sign?
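The disassembly above is consistent with the usual way of splitting a 64x64 -> 64 product into 32-bit halves. The portable C sketch below illustrates it. The function name is made up for illustration, and the instruction names in the comments are an assumed mapping, not something taken from the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a * b from 32-bit halves. Comments give the assumed
   instruction for each step; the function name is illustrative only. */
static uint64_t mul64_lo(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    uint64_t low = (uint64_t)al * bl;           /* LMUL: full 32x32 -> 64 */
    uint32_t hi = (uint32_t)(low >> 32);
    hi += (uint32_t)((uint64_t)ah * bl);        /* MUL, ADD: low word only */
    hi += (uint32_t)((uint64_t)al * bh);        /* MUL, ADD: low word only */
    /* ah * bh is scaled by 2^64, so it never reaches the result. */
    return ((uint64_t)hi << 32) | (uint32_t)low;
}
```

Only the low word of the two cross products survives, and a 32-bit MUL produces the same low word whether its operands are read as signed or unsigned. A widening LMUL or MAC in those positions would compute high halves that are then thrown away, which is presumably why the compiler uses MUL and ADD there.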
Somehow it handles -1*1 = -1 and -1*-1 = 1, which I find very nice. You can watch the magic happen in the registers when stepping through instructions in the simulator.
I understand the circular (modular) arithmetic that lets the same ADD instruction work for both signed and unsigned values, but I haven't understood the math behind signed versus unsigned multiplication.
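The math can be written out using two's complement. A signed 32-bit a equals its unsigned bit pattern a_u minus 2^32 when a < 0. Expanding the product shows that the low word never depends on the signs, which is why MUL and ADD work unchanged, while the high word needs a correction. Below is a small C check of that identity (the function name is hypothetical).

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32x32 -> 64 product built from an unsigned widening multiply.
   With a = a_u - 2^32*[a<0] and b = b_u - 2^32*[b<0]:
     a*b = a_u*b_u - 2^32*(b_u*[a<0] + a_u*[b<0]) + 2^64*[a<0]*[b<0]
   Modulo 2^64 the last term vanishes, the low word is untouched, and the
   high word only needs two conditional subtractions. */
static int64_t smul64_from_unsigned(int32_t a, int32_t b)
{
    uint32_t au = (uint32_t)a, bu = (uint32_t)b;
    uint64_t p = (uint64_t)au * bu;        /* unsigned widening product */
    uint32_t lo = (uint32_t)p;             /* same for signed and unsigned */
    uint32_t hi = (uint32_t)(p >> 32);
    if (a < 0) hi -= bu;
    if (b < 0) hi -= au;
    /* Reinterpret the 64-bit pattern as two's complement. */
    return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

XC already provides signed and unsigned multiply-accumulate (macs and mac, both used in this thread), so this kind of fix-up is mainly needed where an unsigned low limb meets a signed operand, as with Al * Bh in the 96-bit MAC.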
LMUL and signed multiplication
-
- XCore Expert
- Posts: 956
- Joined: Fri Dec 11, 2009 3:53 am
- Location: Sweden, Eskilstuna
Probably not the most confused programmer anymore on the XCORE forum.
Making a signed 96-bit MAC:
int64(int Ah, uint Al) * int32(int Bh) + int96(int Ch, uint Cm, uint Cl) => int96(int yh, uint ym, uint yl)
Something like this below?? It uses the macs instruction at the end, handling the sign of A and B.
{ym,yl}=lmul(Al,Bh,0,Cl);
asm("ladd %0, %1, %2, %3, %4 " : "=r"(carry), "=r"(ym) : "r"(Cm), "r"(ym), "r"(carry));
if(Bh<0){          // lmul read Bh as unsigned, adding Al*2^32 too much
  carry-=(Al>ym);  // borrow out of the middle word
  ym-=Al;
}
{yh,ym}=macs(Ah,Bh,carry,ym);
yh+=Ch;
And with an example:
#include <xs1.h>
#include <print.h>
long Ah=0x40000001,Bh=0x40000001,Ch=0x10000001,yh;
unsigned long yl,ym,Al=0,Cm=2,Cl=3;
int carry=0;
int main(){
{ym,yl}=lmul(Al,Bh,0,Cl);
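asm("ladd %0, %1, %2, %3, %4 " : "=r"(carry), "=r"(ym) : "r"(Cm), "r"(ym), "r"(carry));
if(Bh<0){          // lmul read Bh as unsigned, adding Al*2^32 too much
  carry-=(Al>ym);  // borrow out of the middle word
  ym-=Al;
}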
{yh,ym}=macs(Ah,Bh,carry,ym);
yh+=Ch;
printhexln(yh);
printhexln(ym);
printhexln(yl);
return 0;
}
giving the console output
20000001 (yh)
80000003 (ym)
3 (yl)
A*B + C = 10000000 80000001 00000000 + 10000001 00000002 00000003
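A host-side C model of the same sequence can be used to check the console output above and to try negative operands. `lmul32` and `macs32` are hypothetical stand-ins that assume XC's lmul returns a*b + c + d and that macs returns a*b + {c, d} (signed). `mac96` follows the lmul, ladd, macs, add order of the XC code. The subtraction of Al when B < 0 is needed because lmul reads Bh as unsigned, which adds Al * 2^32 too much. The example above has Al = 0, so it does not exercise this case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for XC's lmul(a, b, c, d): {hi, lo} = a*b + c + d. */
static void lmul32(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                   uint32_t *hi, uint32_t *lo)
{
    uint64_t r = (uint64_t)a * b + c + d;
    *hi = (uint32_t)(r >> 32);
    *lo = (uint32_t)r;
}

/* Hypothetical stand-in for XC's macs(a, b, c, d): {hi, lo} = a*b + {c, d},
   signed, wrapping modulo 2^64 (two's-complement conversions assumed). */
static void macs32(int32_t a, int32_t b, int32_t c, uint32_t d,
                   int32_t *hi, uint32_t *lo)
{
    uint64_t acc = ((uint64_t)(uint32_t)c << 32) | d;
    uint64_t r = (uint64_t)((int64_t)a * b) + acc;
    *hi = (int32_t)(uint32_t)(r >> 32);
    *lo = (uint32_t)r;
}

/* {yh, ym, yl} = {Ah, Al} * B + {Ch, Cm, Cl}, modulo 2^96. */
static void mac96(int32_t Ah, uint32_t Al, int32_t B,
                  int32_t Ch, uint32_t Cm, uint32_t Cl,
                  int32_t *yh, uint32_t *ym, uint32_t *yl)
{
    uint32_t m0, m;
    int32_t h;

    /* Low limb: Al * B with B read as unsigned, plus Cl (exact). */
    lmul32(Al, (uint32_t)B, 0, Cl, &m0, yl);

    /* Middle accumulator, modulo 2^64: add Cm (the ladd step) and undo the
       unsigned reading of B, since Al * B_u = Al * B + Al * 2^32 when B < 0. */
    uint64_t mid = (uint64_t)m0 + Cm - (B < 0 ? (uint64_t)Al : 0);

    /* Ah * B is a true signed product, accumulated as with macs. */
    macs32(Ah, B, (int32_t)(uint32_t)(mid >> 32), (uint32_t)mid, &h, &m);

    *ym = m;
    *yh = (int32_t)((uint32_t)h + (uint32_t)Ch);
}
```

With A = 1 and B = -1 (Ah = 0, Al = 1, Bh = -1), the model returns -1 only because of the correction. Without it, the result would be 00000000 00000000 FFFFFFFF.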
This must be better for "MAC96". Can I reduce it further by skipping the if-else? Can abs() be done more efficiently?
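unsigned int ym=Cm;
unsigned int yl=Cl;
int yh=Ch;
// Caution: when x<0 the else branch adds Al*|x| although Al*x = -Al*|x|,
// and mac keeps only 64 bits, so a carry out of {ym,yl} never reaches yh.
if(x>=0)
{ym,yl}=mac(Al,x,ym,yl);   // {ym,yl} += Al*x, unsigned
else
{ym,yl}=mac(Al,-x,ym,yl);  // {ym,yl} += Al*|x|
{yh,ym}=macs(Ah,x,Ch,ym);  // {yh,ym} = Ah*x + {Ch,ym}, signed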
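One branch-free option is sketched below in C, with hypothetical helper names. Instead of taking abs(x) and choosing between adding and subtracting, it multiplies by x's bit pattern read as unsigned and subtracts the low factor from the upper word only when x is negative, using an all-ones mask. The same mask also gives a branch-free abs.

```c
#include <assert.h>
#include <stdint.h>

/* All ones when x < 0, zero otherwise, without a branch. */
static uint32_t sign_mask(int32_t x)
{
    return -(uint32_t)(x < 0);
}

/* Branch-free |x| as an unsigned value (also correct for INT32_MIN). */
static uint32_t abs_u32(int32_t x)
{
    uint32_t m = sign_mask(x);
    return ((uint32_t)x ^ m) - m;
}

/* Exact unsigned-times-signed product (like Al * x) with no if-else. */
static int64_t mul_u32_s32(uint32_t a, int32_t x)
{
    uint64_t p = (uint64_t)a * (uint32_t)x;      /* x read as unsigned */
    p -= (uint64_t)(a & sign_mask(x)) << 32;     /* remove extra a*2^32 if x < 0 */
    return (int64_t)p;                           /* two's-complement reinterpretation */
}
```

Applied to the MAC96 code, this means a single unsigned lmul on Al and x followed by subtracting `Al & mask` from the middle accumulator, with no if-else and no abs. On compilers where signed right shift is arithmetic, `x >> 31` produces the same mask.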