## "High-End Audio" IIR filters

XCore Project reviews, ideas, videos and proposals.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am
lilltroll wrote:The thing is that multiplication works exactly the same for
positive and negative numbers as well, as long as you sign extend the numbers.
Signed and unsigned multiply are different, and mixed multiply
is different still. XS1 does not have mixed multiply. And you need
it for multiprecision signed multiplies.

Sign extending in your case makes your numbers take three limbs
(words) instead of two (that is the "extending" part). Biasing the
numbers (to always be unsigned) does not, so you do not need
to calculate with that extension (multiply, mask, whatever). It
also gives you more freedom for the last steps (you can play many
more stupid little tricks with adds than you can with muladds).
The question is thus: when is it cheaper to add an offset and subtract it again on the XMOS ISA, compared to just multiplying directly?
On XS1, like on most machines without mixed multiply(-add), it is
almost always a win.
Assume that both a and b are time-dependent, i.e. a(n) and b(n),
Do you mean you have no reuse of either at all? That is a different
problem :-)
is it better to use the bias method? Or is it only when one coefficient stays static over a longer time window that the total instruction count is reduced?
Well let's count :-)

For doing a single multiply, we'll lose four instructions (two ashr,
two muladds with those); we need two insns to do the biasing, and
two more for the final correction (unless you can combine it with
a shift you have elsewhere, then only one -- but let's ignore that).
So it's a wash then.

Say one of the numbers is fixed (for some period); then the biasing
takes only one insn, and the correction still only two (if doing multiple
accumulations, all the fixed corrections can be combined -- they are
just additions after all!) So one or two insns saved.

If both numbers are fixed, nothing is saved since either sequence
takes zero cycles :-P
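In portable C, the biasing trick looks roughly like this (a sketch, not the XS1 assembly sequence the counts above refer to; the function name is made up). With B = 2^31 we have (a+B)(b+B) = a*b + B*(a+b) + B^2, so the signed product falls out of one unsigned widening multiply plus the two correction terms:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the biasing trick: shift both signed operands into unsigned
 * range by adding B = 2^31, multiply unsigned, then subtract the
 * correction terms.  Since a + b = (ua + ub) - 2B, solving
 * (a+B)(b+B) = a*b + B*(a+b) + B^2 for a*b gives
 *   a*b = ua*ub - B*(ua + ub) + B^2.
 * All arithmetic below wraps modulo 2^64, which is exactly what we want. */
static int64_t mul32s_via_unsigned(int32_t a, int32_t b)
{
    uint64_t ua = (uint32_t)a + 0x80000000u;  /* a + 2^31, now in [0, 2^32) */
    uint64_t ub = (uint32_t)b + 0x80000000u;
    uint64_t p  = ua * ub;                    /* unsigned 32x32->64 multiply */
    p -= (ua + ub) << 31;                     /* subtract B*(ua + ub) */
    p += (uint64_t)1 << 62;                   /* add B^2 */
    return (int64_t)p;
}
```

Note that the corrections are plain adds and a shift, which is where the freedom to play "stupid little tricks" comes from.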

lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna
segher wrote:
lilltroll wrote:The thing is that multiplication works exactly the same for
positive and negative numbers as well, as long as you sign extend the numbers.
Signed and unsigned multiply are different, and mixed multiply
is different still. XS1 does not have mixed multiply. And you need
it for multiprecision signed multiplies.
In which way is it different? The XMOS ISA has only one mul instruction (producing a 32-bit product), independent of whether the numbers are signed or unsigned. The compiler will not separate the cases.
As long as the result doesn't overflow, it will be correct for positive numbers, negative numbers, and mixed cases alike.
If it overflows, it doesn't matter.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am
lilltroll wrote:
segher wrote:
lilltroll wrote:The thing is that multiplication works exactly the same for
positive and negative numbers as well, as long as you sign extend the numbers.
Signed and unsigned multiply are different, and mixed multiply
is different still. XS1 does not have mixed multiply. And you need
it for multiprecision signed multiplies.
In which way is it different? The XMOS ISA has only one mul instruction (producing a 32-bit product), independent of whether the numbers are signed or unsigned. The compiler will not separate the cases.
As long as the result doesn't overflow, it will be correct for positive numbers, negative numbers, and mixed cases alike.
If it overflows, it doesn't matter.
The MUL instruction does not do a signed multiply, and neither does
it do an unsigned multiply: it does a multiply in the ring of integers
modulo 2**32. Yes, of course this is the low 32 bits of both the signed
and the unsigned result :-)

But we were (well, I was, anyway) talking about the widening multiplies,
32x32->64, i.e. MACCU, MACCS, and LMUL. LMUL never "overflows"
(it's the best instruction ever, a wonder of modern technology!); and for
the other two, if you make sure they do not overflow (like all the 64x64->96
sequences in this thread do, except for overflow "at the top"), you get
the unsigned resp. signed result. But we need a "mixed" result (for a2*b0
and a0*b2, in the a2:a1:a0 * b2:b1:b0, where a2 and b2 are sign extensions).
So some work is needed -- either actually compute those sign extensions,
and then yes any multiply insn will work because you only need the low 32
bits of the result; or do something else, like my proposed biasing to force
all numbers to be in a range that is easier to deal with.

Ah, so you were saying that non-widening multiplies are the same in
signed and in unsigned. Yes, that is true; but computing a widening
multiply by first widening the arguments and then doing the multiplication
is not the only game in town.
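To make the distinction concrete, here is a small C sketch (function names made up; these mirror what MACCU- and MACCS-style widening multiplies compute, without the accumulate): for the same 32-bit bit patterns, the low words of the unsigned and signed 64-bit products always agree, which is all a non-widening MUL keeps, but the high words differ as soon as an operand is negative.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned widening 32x32->64 multiply of two bit patterns (MACCU-style,
 * minus the accumulate). */
static uint64_t widen_u(int32_t a, int32_t b)
{
    return (uint64_t)(uint32_t)a * (uint32_t)b;
}

/* Signed widening 32x32->64 multiply (MACCS-style, minus the accumulate). */
static int64_t widen_s(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

For example widen_u(-2, 3) has high word 2, while widen_s(-2, 3) = -6 has high word 0xFFFFFFFF; both have low word 0xFFFFFFFA.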
asibbald
Member
Posts: 11
Joined: Thu Jan 07, 2010 1:56 am
Guys,
This discussion is very interesting and a lot of good work is being done on high quality Biquads here.
However, there is an additional approach that is worth taking into account. The definitive work on this alternative/additional approach was done in 1982 by Jon Dattorro:

J. Dattorro, “The Implementation of Recursive Digital Filters for High Fidelity Audio,” J. Audio Eng. Soc., vol 36, pp 851-878 (1988 Nov)

A very useful summary of the situation was presented by Marc Allie at the comp.dsp conference in 2004. His presentation is available here:

http://www.danvillesignal.com/files/com ... Filter.ppt

In essence, Dattorro presents a computationally efficient means of "error correcting" the roundoffs and obtaining an SNR equivalent to that which would be obtained by a "brute force" method using a far higher number of bits.

Whilst Dattorro's paper, being from 1982, deals with 16-bit computation, the algorithm he presents is general and can in principle be applied to any number of bits.

Well worth reading. I'd suggest the Allie presentation first as a good overview.
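The core of the error-feedback idea can be sketched roughly like this (a hypothetical fixed-point quantizer with 28 guard bits; the shift count, names and widths are illustrative, not taken from the paper): instead of discarding the low bits when the accumulator is quantized down to the state/output width, keep the truncation error and add it back in on the next sample.

```c
#include <assert.h>
#include <stdint.h>

#define GUARD_BITS 28  /* illustrative: low bits dropped at quantization */

static int64_t err_state;  /* truncation error carried to the next sample */

/* First-order error feedback: quantize the wide accumulator to the output
 * word, remember what was thrown away, and fold it into the next sample's
 * accumulator so the error does not recirculate uncorrected. */
static int32_t quantize_with_feedback(int64_t acc)
{
    acc += err_state;                              /* feed back last error  */
    int32_t y = (int32_t)(acc >> GUARD_BITS);      /* truncate to output    */
    err_state = acc - ((int64_t)y << GUARD_BITS);  /* remember what we lost */
    return y;
}
```

Two successive half-LSB inputs then sum to a full LSB on the second sample instead of vanishing, which is exactly the behaviour that pushes the quantization noise out of the passband.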
asibbald
Member
Posts: 11
Joined: Thu Jan 07, 2010 1:56 am
Sorry - spot the deliberate mistake. Dattorro's paper was 1988, not 1982.
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna
Hmm, it should be easy to compute the filter-response for the lo-part, and add it to the accumulator for the next sample.
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna
asibbald wrote:Guys,
This discussion is very interesting and a lot of good work is being done on high quality Biquads here.
However, there is an additional approach that is worth taking into account. The definitive work on this alternative/additional approach was done in 1982 by Jon Dattorro:

J. Dattorro, “The Implementation of Recursive Digital Filters for High Fidelity Audio,” J. Audio Eng. Soc., vol 36, pp 851-878 (1988 Nov)

A very useful summary of the situation was presented by Marc Allie at the comp.dsp conference in 2004. His presentation is available here:

http://www.danvillesignal.com/files/com ... Filter.ppt

In essence, Dattorro presents a computationally efficient means of "error correcting" the roundoffs and obtaining an SNR equivalent to that which would be obtained by a "brute force" method using a far higher number of bits.

Whilst Dattorro's paper, being from 1982, deals with 16-bit computation, the algorithm he presents is general and can in principle be applied to any number of bits.

Well worth reading. I'd suggest the Allie presentation first as a good overview.
The optimal noise shaping, or 'Wilson's rule', is just a way to split the word into a hi-part and a lo-part, and that is exactly the same thing as the "brute force" computation: split everything into 32-bit parts.

In the end, it is just a question of how to multiply an int32 * int64 or an int34 * int62 as fast as possible.

Or do you think the trivial noise shaping would give a good time/performance gain?
(I do not have access to the old JAES paper.) Or do I miss something here!?
bearcat
Respected Member
Posts: 283
Joined: Fri Mar 19, 2010 4:49 am
Noise shaping is one of a few options for a BiQuad. I performed several trials with different implementations of the BiQuad on the XMOS platform to determine the optimal configuration for audio signals (in my opinion), for both quality and speed.

Using my results shown prior, the 32-bit filter has my optimal configuration and the performance is quite good. Noise is down below -150 dB with 32-bit filters after tens of IIRs.

But performance is even better with a higher-resolution filter, as my earlier SNR and THD results show. If you are really interested in 32-bit audio resolution, you will need more than 32-bit filters to achieve results approaching that resolution.
bearcat
Respected Member
Posts: 283
Joined: Fri Mar 19, 2010 4:49 am
Also wanted to mention that lilltroll has highlighted the worst-case scenario for this thread: a low-pass filter down at subwoofer frequencies.

I have not actually tested these filters down there yet, so I do not have answers for this case. I am expecting this will require higher resolutions.
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna
Indeed, poles very close to the unit circle, including the point (1 + 0j) in the z-plane, create a large feedback; thus it will take a very long time for the signal to decline*, and an error will circulate for a long time through many truncations.

*equivalent to saying that the impulse response of the filter is long in time