XCore Exchange

Posted: **Wed May 09, 2018 1:38 pm**

The problem with using Q31 is that the accumulator can overflow when you multiply-accumulate into a 64-bit register. Here is some explanatory text from ARM's documentation for their similar function (arm_fir_q31), which uses 32-bit input data, a 64-bit accumulator, and a 32-bit output:

Scaling and Overflow Behavior:

The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around rather than clip. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits. After all multiply-accumulates are performed, the 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

So it appears you must scale your input down by log2(num_taps) bits to be sure the accumulator can't overflow. With 128 taps, for example, you would need to shift all your data right by log2(128) = 7 bits. That's why I said @CousinItt's suggestion of using Q7.24 to avoid overflow is so useful: it leaves exactly those 7 bits of headroom.
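To make the prescaling concrete, here is a minimal C sketch of one FIR output sample with the input shifted down before the multiply-accumulate. This is not the ARM or XMOS library code; the names `headroom_bits`, `sat_q31` and `fir_q31_prescaled` are made up for illustration, and it assumes the compiler implements `>>` on negative values as an arithmetic shift (true for the usual toolchains, but implementation-defined in C).

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest s with (1 << s) >= num_taps, i.e. ceil(log2(num_taps)). */
unsigned headroom_bits(size_t num_taps)
{
    unsigned s = 0;
    while (((size_t)1 << s) < num_taps) {
        s++;
    }
    return s;
}

/* Saturate a 64-bit value to the 32-bit Q1.31 range. */
int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One output sample: x[] holds the num_taps most recent inputs (Q1.31),
 * h[] holds the coefficients (Q1.31). Each input is shifted right by
 * ceil(log2(num_taps)) bits first, so the sum of num_taps worst-case
 * products cannot exceed 2^62 and the int64_t accumulator cannot wrap.
 * Assumption: >> on a negative value is an arithmetic shift. */
int32_t fir_q31_prescaled(const int32_t *x, const int32_t *h, size_t num_taps)
{
    unsigned s = headroom_bits(num_taps);
    int64_t acc = 0;

    for (size_t k = 0; k < num_taps; k++) {
        int32_t xs = x[k] >> s;        /* prescale the input sample */
        acc += (int64_t)xs * h[k];     /* 32 x 32 -> 64-bit multiply-accumulate */
    }
    /* Drop 31 fractional bits and clip to 1.31. */
    return sat_q31(acc >> 31);
}
```

The price is that the output is also scaled down by 2^s and the input loses s bits of precision, which is the trade-off behind choosing a format like Q7.24 up front.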

Posted: **Mon May 28, 2018 8:29 am**

akp wrote: The problem with using Q31 is that you could get overflow when you multiply accumulate into a 64 bit register. Here is some explanatory text from ARM for their similar function (arm_fir_q31) that uses 32 bit input data and a 64 bit accumulator, with a 32 bit output:

Scaling and Overflow Behavior:

The function is implemented using an internal 64-bit accumulator. The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit. Thus, if the accumulator result overflows it wraps around rather than clip. In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits. After all multiply-accumulates are performed, the 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.

So it appears you must scale down your input by log2(num_taps) to ensure you don't get overflow. So if you had 128 taps you would need to shift all your data right by log2(128) = 7 to ensure you don't overflow. Hence that's why I said @CousinItt's suggestion of using Q7.24 to avoid overflow is so useful.

Thank you! But why log2(num_taps)? I can't figure it out.

Posted: **Wed Aug 01, 2018 10:49 am**

Sorry, I've been absent for a while. If it's not too late: yes, you're correct. The more taps you have, the larger the accumulated result could be. So it makes sense to use a fixed-point format with fewer fractional bits (and more headroom) when you have more taps, to avoid the risk of overflow.
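As a rough illustration of trading fractional bits for headroom, the sketch below picks a 32-bit format from the tap count and converts a value into it. The helper names `frac_bits_for_taps` and `to_fixed` are hypothetical, it assumes num_taps is at least 1 and no larger than 2^31, and it assumes the value passed to `to_fixed` fits the chosen format.

```c
#include <stdint.h>

/* Fractional bits left in a 32-bit word after reserving
 * ceil(log2(num_taps)) headroom bits and one sign bit.
 * Assumption: 1 <= num_taps <= 2^31. */
unsigned frac_bits_for_taps(unsigned num_taps)
{
    unsigned guard = 0;
    while ((1u << guard) < num_taps) {
        guard++;
    }
    return 31u - guard;
}

/* Convert a real value to fixed point with the given number of
 * fractional bits, rounding to nearest (ties away from zero).
 * Assumption: v is within the range of the target format. */
int32_t to_fixed(double v, unsigned frac_bits)
{
    double scaled = v * (double)((int64_t)1 << frac_bits);
    return (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}
```

With 128 taps this gives 24 fractional bits, which lines up with the Q7.24 suggestion earlier in the thread.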

Posted: **Wed Aug 01, 2018 2:26 pm**

I think cjf1699's problem may be that he/she doesn't really understand fixed point numbers, otherwise he/she could figure out why overflow is possible.

It's pretty obvious intuitively that if you multiply two n-bit numbers together you can get a 2n-bit result. A FIR filter then adds num_taps of those products together, and every time you double the number of worst-case terms in the sum, the result can grow by one more bit. So the accumulated result can need up to 2n + log2(num_taps) bits. Prescaling the input down by log2(num_taps) bits is (essentially) the same as dividing by 2^(log2(num_taps)) = num_taps, which shrinks every product by the same factor and cancels that growth. So the sum of num_taps products is back to about 2n bits and fits the accumulator (or close enough for intuition anyway).
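A quick way to check how the accumulated sum actually grows with the number of taps is to add up worst-case products and count the bits. This is only a sketch with unsigned magnitudes to keep the arithmetic simple; `bit_length` and `worst_case_sum_bits` are illustrative names, and it assumes n is at most 16 so that the running sum fits in 64 bits for any 32-bit tap count.

```c
#include <stdint.h>

/* Number of bits needed to represent v (0 for v == 0). */
unsigned bit_length(uint64_t v)
{
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Add up num_taps products of two maximal n-bit magnitudes and
 * report how many bits the sum occupies.
 * Assumption: 1 <= n <= 16, so the sum cannot overflow uint64_t. */
unsigned worst_case_sum_bits(unsigned n, unsigned num_taps)
{
    uint64_t max_mag = ((uint64_t)1 << n) - 1;   /* largest n-bit magnitude */
    uint64_t sum = 0;

    for (unsigned k = 0; k < num_taps; k++) {
        sum += max_mag * max_mag;                /* one worst-case product */
    }
    return bit_length(sum);
}
```

For n = 16 a single product needs 32 bits, 2 taps need 33, and 128 taps need 39, i.e. 2n + log2(num_taps) rather than anything proportional to num_taps.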


Problem when realizing DSP algorithm:LMS on xCORE-200 MC Audio
