interp_x2 uses the VPU and is written in assembly. The code below is the infinite loop of the task I run from main. I was inspired by the repo https://github.com/xmos/xmath_walkthrough/, where the par statement is used in a similar fashion. The interp_x2 function is not long (~25 instruction bundles). Could the overhead of spawning the two threads for every sample be so large that it cancels out the benefit of going multicore?
Code:
#define R 2
#define NCOEFF 48

while (1) {
    rx_frame(datain, FRAME_SIZE, c_audio);
    xscope_start(PROBE1);
    for (int i = 0; i < FRAME_SIZE; i++) {
        pos = update_state(state, pos, NCOEFF, datain[i]);
        unsafe {
            par {
                interp_x2(&pstate[pos1], &pcoeff[0], &pout[i*R]);        // FIR phase 1
                interp_x2(&pstate[pos1], &pcoeff[NCOEFF], &pout[i*R+1]); // FIR phase 2
            }
        }
    }
    xscope_stop(PROBE1);
    tx_frame(c_audio, out, R*FRAME_SIZE);
}
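
If the per-sample fork/join really is the cost, one restructuring worth trying is to pay it once per frame instead of FRAME_SIZE times: keep NCOEFF-1 samples of history in front of the incoming frame, and let each worker walk the whole frame for one phase. The plain C sketch below only illustrates that data layout and is not the actual implementation. Several names are assumptions: interp_x2_ref is a hypothetical scalar stand-in for the ASM/VPU routine, phase_worker and process_frame are made-up helpers, FRAME_SIZE is set to 16 arbitrarily, and the tap ordering (oldest sample first) may not match what interp_x2 expects.

```c
#include <stdint.h>
#include <string.h>

#define R 2
#define NCOEFF 48      /* taps per phase, as in the post */
#define FRAME_SIZE 16  /* assumed value; not given in the post */

/* Hypothetical scalar stand-in for interp_x2: one FIR phase, one output. */
static void interp_x2_ref(const int32_t *state, const int32_t *coeff,
                          int32_t *out)
{
    int64_t acc = 0;
    for (int k = 0; k < NCOEFF; k++)
        acc += (int64_t)state[k] * coeff[k];
    *out = (int32_t)acc; /* no scaling or saturation in this sketch */
}

/* One worker computes a single phase for every sample of the frame.
 * hist holds NCOEFF-1 old samples followed by FRAME_SIZE new ones,
 * so the window for sample i starts at hist[i]. */
static void phase_worker(const int32_t *hist, const int32_t *coeff,
                         int32_t *out, int phase)
{
    for (int i = 0; i < FRAME_SIZE; i++)
        interp_x2_ref(&hist[i], coeff, &out[i * R + phase]);
}

/* hist must have room for NCOEFF - 1 + FRAME_SIZE samples. */
static void process_frame(int32_t *hist, const int32_t *datain,
                          const int32_t *coeff, int32_t *out)
{
    /* Append the new frame after the stored history. */
    memcpy(&hist[NCOEFF - 1], datain, FRAME_SIZE * sizeof(int32_t));

    /* In XC these two calls would be the two branches of a single par,
     * executed once per frame rather than once per sample. */
    phase_worker(hist, &coeff[0], out, 0);      /* FIR phase 1 */
    phase_worker(hist, &coeff[NCOEFF], out, 1); /* FIR phase 2 */

    /* Keep the newest NCOEFF-1 samples as history for the next frame. */
    memmove(hist, &hist[FRAME_SIZE], (NCOEFF - 1) * sizeof(int32_t));
}
```

The two workers read the same history buffer and write to disjoint output indices, so they do not need to synchronise with each other inside the frame. The trade-off is the extra copy of NCOEFF-1 samples per frame; whether that is cheaper than FRAME_SIZE separate par statements would have to be measured with the same xscope probe.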