Then you probably have problems with *limit cycles* (even if you didn't know it):

http://en.wikipedia.org/wiki/Limit_cycle

Nonlinear!? Well, the rounding done for each sample introduces a non-linearity that is fed back into future samples, creating chaotic behaviour, at least for some sets of filter coefficients. That makes the filter "sing": *self-sustained oscillations* with a limited amplitude in the background.
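To make the effect concrete, here is a minimal sketch of a first-order recursive filter with no input and an integer state that is rounded after every multiply. The coefficient (-29/32), the starting value and the function name are arbitrary choices for illustration and are not taken from the filter discussed in this thread.

```python
def run_first_order(y0, a_q=-29, shift=5, n=200):
    """Zero-input response of y[n] = round(a * y[n-1]) with an integer state.

    The coefficient is a = a_q / 2**shift (here -29/32 = -0.90625).
    All values are hypothetical and chosen only for illustration.
    """
    half = 1 << (shift - 1)
    y = y0
    out = []
    for _ in range(n):
        # Multiply, then round back to an integer (ties go toward +infinity).
        y = (a_q * y + half) >> shift
        out.append(y)
    return out


samples = run_first_order(100)
print(samples[-4:])  # the last few outputs, long after an ideal filter has died out
```

An ideal filter with |a| < 1 would decay to zero. Here the output instead gets stuck alternating between +5 and -5 forever, because rounding 0.90625 * 5 gives 5 again. That is a limit cycle at half the sample rate.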

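Typically, bass management at fs >= 96 kHz creates a lot of chaotic energy, decreasing the DNR (Dynamic Noise Ratio) of the filter. A low cutoff frequency relative to the sample rate puts the poles very close to the unit circle, which makes the rounding errors decay slowly.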

The way to solve this is to use higher precision (high enough to make the error less than the quantization noise) in the calculation.
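As a rough sketch of why extra precision helps, the toy filter below keeps its state with additional fractional ("guard") bits and only rounds to the output word length at the very end. The 16 guard bits, the coefficient -29/32 and the function name are assumptions made for this example; they are not the 64/96-bit layout of the implementation discussed in this thread.

```python
def run_with_guard_bits(y0, a_q=-29, shift=5, guard=16, n=400):
    """Zero-input first-order filter whose state carries `guard` extra fractional bits.

    The state is still rounded after each multiply, but the rounding error
    is now 2**guard times smaller than one output LSB.
    All parameters are hypothetical and chosen only for illustration.
    """
    half = 1 << (shift - 1)
    out_half = 1 << (guard - 1)
    s = y0 << guard  # high-precision internal state
    out = []
    for _ in range(n):
        s = (a_q * s + half) >> shift          # rounding happens far below the output LSB
        out.append((s + out_half) >> guard)    # round the state to the output word length
    return out


samples = run_with_guard_bits(100)
print(samples[-4:])  # the tail of the output once the state has settled
```

The state itself still ends up in a small limit cycle, but its amplitude is a few units of the state LSB, far below one output LSB, so the rounded output settles at zero.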

For example, a double on x86 gives 53 significand bits for the internal states of the filter, and on the x87 FPU the multiply and add are calculated in extended precision (80 bits in total) before rounding (http://en.wikipedia.org/wiki/X87), which is most often "good enough".

This filter-implementation uses 96 bits in the accumulator and 64 bits for the internal states.

In what I have tested so far with realistic audio filters, **the error out of the filter has the same magnitude as the quantization noise**.

Compared to a software implementation with double, this implementation runs much faster.

You can **run sample rates > 4.6 MHz/filter_order using one thread on a C5 device**.

Check out:

https://www.xcore.com/projects/high-end ... ir-filters

PS. Using floating point doesn't make the problem with limit cycles disappear; you must have enough bits in the significand (also called the coefficient or mantissa). DS.