I'm having trouble understanding the DSP performance of the XCORE-200.
I have an XCORE that (I think) is running at 500 MHz. It's based on the XCORE-200 multichannel reference design, so it's an XU216.
I have installed my DSP code into the ADC pathway using a new core, and I need to run DSP on 32 channels of audio at 48 kHz. Each channel needs to be processed with a 10-tap FIR filter. That's it.
That's 10 taps/channel * 32 channels/frame * 48000 frames/s = 15,360,000 taps/second. Should that be achievable?
I'm getting really slow performance: it's taking 23.5 µs to process 1 frame, which is equivalent to 320 taps / 23.5 µs ≈ 13,617,000 taps/second, short of the 15,360,000 taps/second I need.
Shouldn't the DSP performance of the XCORE-200 be much better than this? More like 65 MACs/second? It is 1 MAC per tap, correct?
In order to be as fast and explicit as possible, I didn't use a loop, so here's the DSP code:
#define N_COEFFS 10
#define QFMTN 28
#pragma unsafe arrays
static inline void processSamples(int samplesIn[], int samplesOut[])
{
samplesOut[0 ] = dsp_filters_fir(samplesIn[0 ], delay_coeffs_3 , filter_states_0 , N_COEFFS, QFMTN);
samplesOut[1 ] = dsp_filters_fir(samplesIn[1 ], delay_coeffs_3 , filter_states_1 , N_COEFFS, QFMTN);
samplesOut[2 ] = dsp_filters_fir(samplesIn[2 ], delay_coeffs_2 , filter_states_2 , N_COEFFS, QFMTN);
samplesOut[3 ] = dsp_filters_fir(samplesIn[3 ], delay_coeffs_2 , filter_states_3 , N_COEFFS, QFMTN);
samplesOut[4 ] = dsp_filters_fir(samplesIn[4 ], delay_coeffs_1 , filter_states_4 , N_COEFFS, QFMTN);
samplesOut[5 ] = dsp_filters_fir(samplesIn[5 ], delay_coeffs_1 , filter_states_5 , N_COEFFS, QFMTN);
samplesOut[6 ] = dsp_filters_fir(samplesIn[6 ], delay_coeffs_0 , filter_states_6 , N_COEFFS, QFMTN);
samplesOut[7 ] = dsp_filters_fir(samplesIn[7 ], delay_coeffs_0 , filter_states_7 , N_COEFFS, QFMTN);
samplesOut[8 ] = dsp_filters_fir(samplesIn[8 ], delay_coeffs_3 , filter_states_8 , N_COEFFS, QFMTN);
samplesOut[9 ] = dsp_filters_fir(samplesIn[9 ], delay_coeffs_3 , filter_states_9 , N_COEFFS, QFMTN);
samplesOut[10 ] = dsp_filters_fir(samplesIn[10 ], delay_coeffs_2 , filter_states_10, N_COEFFS, QFMTN);
samplesOut[11 ] = dsp_filters_fir(samplesIn[11 ], delay_coeffs_2 , filter_states_11, N_COEFFS, QFMTN);
samplesOut[12 ] = dsp_filters_fir(samplesIn[12 ], delay_coeffs_1 , filter_states_12, N_COEFFS, QFMTN);
samplesOut[13 ] = dsp_filters_fir(samplesIn[13 ], delay_coeffs_1 , filter_states_13, N_COEFFS, QFMTN);
samplesOut[14 ] = dsp_filters_fir(samplesIn[14 ], delay_coeffs_0 , filter_states_14, N_COEFFS, QFMTN);
samplesOut[15 ] = dsp_filters_fir(samplesIn[15 ], delay_coeffs_0 , filter_states_15, N_COEFFS, QFMTN);
samplesOut[16 ] = dsp_filters_fir(samplesIn[16 ], delay_coeffs_2 , filter_states_16, N_COEFFS, QFMTN);
samplesOut[17 ] = dsp_filters_fir(samplesIn[17 ], delay_coeffs_2 , filter_states_17, N_COEFFS, QFMTN);
samplesOut[18 ] = dsp_filters_fir(samplesIn[18 ], delay_coeffs_1 , filter_states_18, N_COEFFS, QFMTN);
samplesOut[19 ] = dsp_filters_fir(samplesIn[19 ], delay_coeffs_1 , filter_states_19, N_COEFFS, QFMTN);
samplesOut[20 ] = dsp_filters_fir(samplesIn[20 ], delay_coeffs_2 , filter_states_20, N_COEFFS, QFMTN);
samplesOut[21 ] = dsp_filters_fir(samplesIn[21 ], delay_coeffs_2 , filter_states_21, N_COEFFS, QFMTN);
samplesOut[22 ] = dsp_filters_fir(samplesIn[22 ], delay_coeffs_1 , filter_states_22, N_COEFFS, QFMTN);
samplesOut[23 ] = dsp_filters_fir(samplesIn[23 ], delay_coeffs_1 , filter_states_23, N_COEFFS, QFMTN);
samplesOut[24 ] = dsp_filters_fir(samplesIn[24 ], delay_coeffs_0 , filter_states_24, N_COEFFS, QFMTN);
samplesOut[25 ] = dsp_filters_fir(samplesIn[25 ], delay_coeffs_0 , filter_states_25, N_COEFFS, QFMTN);
samplesOut[26 ] = dsp_filters_fir(samplesIn[26 ], delay_coeffs_2 , filter_states_26, N_COEFFS, QFMTN);
samplesOut[27 ] = dsp_filters_fir(samplesIn[27 ], delay_coeffs_2 , filter_states_27, N_COEFFS, QFMTN);
samplesOut[28 ] = dsp_filters_fir(samplesIn[28 ], delay_coeffs_1 , filter_states_28, N_COEFFS, QFMTN);
samplesOut[29 ] = dsp_filters_fir(samplesIn[29 ], delay_coeffs_1 , filter_states_29, N_COEFFS, QFMTN);
samplesOut[30 ] = dsp_filters_fir(samplesIn[30 ], delay_coeffs_1 , filter_states_30, N_COEFFS, QFMTN);
samplesOut[31 ] = dsp_filters_fir(samplesIn[31 ], delay_coeffs_1 , filter_states_31, N_COEFFS, QFMTN);
samplesOut[32 ] = samplesIn[32];
samplesOut[33 ] = samplesIn[33];
}
What can I do to improve performance?
Thanks,
-Caleb