Special floating points formats ???

Technical questions regarding the XTC tools and programming with XMOS.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Well the confused programmer is not used to the C notation of funny signs like ^ & && and |
Checking the programmers guide I realize some funny error of my masking tech.
The reason why it worked with multiplication was that the normalizations step never happend.

I have to use a sign bit instead

Code: Select all

  unsigned int A=1111;
  unsigned int B=1111;
  unsigned int A_e;
  unsigned int B_e;
  unsigned int P_r,P_e;
  
  A_e=clz(A);
  B_e=clz(B);
  {P_r,void}=mac(A<<A_e,B<<B_e,0,0);
  P_e=A_e+B_e-32;
  
  printf("%u*%u =? %u*2^-%u=%u\n",A,B,P_r,P_e,P_r>>P_e);
Console:
1111*1111 =? 1263944704*2^-10=1234321


Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

I do not handle the sign yet, but I hope this one works better.
Add could be done more compact if it didn't overflow without one extra >>1 :idea:

Code: Select all

int main() {
  //Calculates A*B+C;
  unsigned int A=1111;
  unsigned int B=1111;
  unsigned int C=123456789;
  
  unsigned int A_e=128,B_e=128,C_e=128;
  unsigned int A_r,B_r,C_r;
  unsigned int P_r,P_e;
  unsigned int S_r,S_e;
  char temp;
  printf("%u*%u+%u =?\n",A,B,C);
  
  A_e-=clz(A);A_r=A<<(128-A_e);
  B_e-=clz(B);B_r=B<<(128-B_e);
  C_e-=clz(C);C_r=C<<(128-C_e);
  printf("%u*2^%d * %u*2^%d + %u*2^%d =?\n",A_r,A_e-128,B_r,B_e-128,C_r,C_e-128);
  //P=A*B;
  {P_r,void}=mac(A_r,B_r,0,0);
  temp=clz(P_r);
  P_r<<=temp;
  P_e=A_e+B_e-temp-96;
  printf("%u*2^%d + %u*2^%d =?\n",P_r,P_e-128,C_r,C_e-128);
  //S=P+C=A*B+C
  if(P_e>C_e)
  {
	  S_r=(P_r>>1) + (C_r>>(P_e-C_e+1));//reduce  
	  S_e=P_e-1;
	  printf("Product has the largest exponent\n");
  }
  else if(P_e<C_e)
  {
	  S_r=(C_r>>1)+(P_r>>(C_e-P_e+1));
	  S_e=C_e-1;
	  printf("Sum has the largest exponent\n");
  }
  else
  {
	  S_r=(C_r>>1)+(P_r>>1);
	  S_e=C_e+1; 
	  printf("Both has the same exponent\n");
  }
	  
  printf("%u*2^%d =?\n",S_r,S_e-128); 	
  temp=clz(S_r);
  S_r<<=temp;
  S_e+=temp;  	  

 printf("%u*2^%d = %u =? **%u**",S_r,S_e-128,S_r>>(128-S_e),A*B+C);
  return 0;   
}
Console:

Code: Select all

1111*1111+123456789 =?
2329935872*2^-21 * 2329935872*2^-21 + 3950617248*2^-5 =?
2527889408*2^-11 + 3950617248*2^-5 =?
Sum has the largest exponent
1995057760*2^-6 =?
3990115520*2^-5 = 124691110 =? **124691110**
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Uses 32 bit fraction lengt + a sign bit. Based on MAC and LADD it cannot overflow with the given fraction length.
It seems to work, but the ASM code is a bit heavy for the moment ckecking in XTA.

#include <xs1.h>
#include <xclib.h>
#include "filtertoolbox.h"

int main()
{
int B=1111,Z0=1234321;
int y,x=-1111;
struct single xf,Bf,Pf,Z0f;
xf=int2float(x);
Bf=int2float(B);
Z0f=int2float(Z0);
Pf=f_mul(Bf,xf);
Pf=f_add(Pf,Z0f);
y=float2int(Pf);

//printf("(-1)^%d*%u*2^%d =? %d, svar=%d",Pf.sign,Pf.fraction,Pf.exponent,y,B*x+Z0);
return(0);
}
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Will take a look at this, after searching Analogs homepage
http://www.analog.com/static/imported-f ... .08.07.pdf
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

I think I have something working now that never has a mantissa fraction-length shorter than 61-bits.
By reducing it to min 61 bit - I never get overflow, carryouts or sign problems *1 - which fastens up the code regarding determenistic processing.

Time to try it out with realtime signals instead of different numbertests with printf();



*1 I Hope
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Well it's working now with a 64 bit mantissa, and I am trying to optimze the code in XTA to make it run as fast as possible.

It uses 143 thread cycles for the moment including the streamed channel communication, but it seems to still have room for improvments.
I do not know much of the pipeline, but I guess it's also possible to reduce the amounts of FNOP, by moving some instructions in the program.
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Ohh no, it creates more distortion for some filters compared to a filter with a fixed64 :? :cry: .

A floatingpoint representation with 64bit mantissa can represent all number that fixed64 can represent exactly.
I guess I have some type of roundoff bug. I'm writing a little connection bewteen XDE and MATLAB. That way I can run MATLAB signals throught the XMOS chip and back to MATLAB to check the result. The debugger is great, but I do not want to ckeck roundoff errors by hand :mrgreen:
Last edited by lilltroll on Wed Feb 10, 2010 8:50 pm, edited 1 time in total.
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

The MATLAB connection is running. Maybe not the fastest connection in the world, but it is working. I will maybe try with an virtual COM port as well for a link between MATLAB and XMOS.

...
x[26000]=3222632 y=-1492608 hmax=1613721 11 leading zeros
x[27000]=-5081837 y=-1632072 hmax=1798833 11 leading zeros
x[28000]=-8302174 y=-241970 hmax=1798833 11 leading zeros
x[29000]=-2832817 y=1469404 hmax=1798833 11 leading zeros
x[30000]=5763678 y=1623172 hmax=1827548 11 leading zeros
x[31000]=7993240 y=-202036 hmax=1827548 11 leading zeros
...


The result is stored to a file on the host. When the whole infile(input signal) is processed, it can be analysed in MATLAB with help of the Variable precision arithmetic tools.
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

First out, one of the fastest implementation: Fixed with a int32 filter-coeff, int32 signal and int64 accumulator.
The inputsignal is scaled from fixed 24 bits (Simulating a 24 Audio CODEC) to 30 bits which is the maximum possible amplification without creating overflow in the accumulator. (Overflow checking is done after each MAC)

The calculation is done on the real XS1-L1A chip, so no simulations of the chip is used to create the result.

The input signal is a windowed log chirp starting at 5 Hz ending at 24 kHz after 10 s @ 48kHz.

The filter is a second order (1 Hp+1 Lp) BP-filter [20 40]Hz @ 48 kHz. You could think of it as the lowest band of a 10 band octave equaliser. (It's common to use up to 3 bands / octave in Audio)

Finally MATLAB compares the result with computing the filter-respone with double float and calculates the error beween the double and the fixed implementation.
fixed_32-32-64.png
Below, the PSD of the calculated error.
PSD_fixed_32-32-64.png
To relate this to some dynamic, we need to compare it to the PSD of the inputsignal to the system.
Here the quotient of the cross power spectral density (Pyx) of the chirp x and the filter output y and the power spectral density (Pxx) of x is calculated. Txy=(Pxy)/(Pxx)
Txy_fixed_32-32-64.png
The dynamics in dB of the filter is worse than a 3$ 24-bit CODEC => No "HiFi" filter!


General XC implemetation of this filter form. (Not the fastest implementation)
x is the inputsignal to the filter and y is the output. B and A holds the filter coeff.

Code: Select all

	X[0]=x<<6;
	h=0;l=0;
	for (j=0; j<3; j++){ 
	{h, l} = macs(B[j], X[j], h, l);
	{h, l} = macs(A[0], Y[0]<<1, h, l);
	{h, l} = macs(A[1], Y[1], h, l);
	y=h<<1;
	X[2]=X[1];
	X[1]=X[0];
	Y[1]=Y[0];
	Y[0]=y;
	
	return (y>>6);
You do not have the required permissions to view the files attached to this post.
Probably not the most confused programmer anymore on the XCORE forum.
User avatar
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Lets run it with XMOS-doubles over the night!

Code: Select all

#include <xs1.h>
#include <math.h>
#include <stdio.h>
int m=0;
double B[]={1.30728645169890e-003,0,-1.30728645169890e-003};
double A[]={-1.99737173724053e+000,9.97385427096602e-001};
double Z[]={0,0};

int DigSig2(int x_int){

double x,y;

x=(double)x_int;
y= x*B[0] + Z[0];
Z[0]=x*B[1] - y*A[0] + Z[1];
Z[1]=x*B[2] - y*A[1];
if((m%500)==0)
	printf("x[%d]=%f\t\ty=%f\n",m,x,y);
m++;
return ((int) round(y));
}
Probably not the most confused programmer anymore on the XCORE forum.