Multicore interpolation slower than single-core

Post by **satov** » Tue Sep 09, 2025 8:04 pm

Hello! I made a simple x2 interpolator using a polyphase FIR. I wanted to see the benefit of going multicore by computing the two phases of the FIR in parallel. To my surprise, calling interp_x2 twice sequentially on the same core takes much less time than running the two calls inside a par statement.

interp_x2 uses the VPU and is written in assembly. The code below is the infinite loop of the task I run from main. I was inspired by the repo https://github.com/xmos/xmath_walkthrough/ where the par statement is used in a similar fashion. The interp_x2 function is not that long (~25 instruction bundles). Could it be that the overhead of spawning the two threads is so large that it cancels out the benefit of going multicore?


#define R 2
#define NCOEFF 48
while (1) {
    rx_frame(datain, FRAME_SIZE, c_audio);

    xscope_start(PROBE1);
    for(int i=0; i<FRAME_SIZE; i++) {
        pos = update_state(state, pos, NCOEFF, datain[i]);
        unsafe {
            par {
                interp_x2(&pstate[pos1], &pcoeff[0],      &pout[i*R]);		// FIR phase 1
                interp_x2(&pstate[pos1], &pcoeff[NCOEFF], &pout[i*R+1]);	// FIR phase 2
            }
        }
    }
    xscope_stop(PROBE1);

    tx_frame(c_audio, out, R*FRAME_SIZE);
}
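
For reference, here is a plain-C model of what the two interp_x2 calls are meant to compute for each input sample. This is only a sketch, not the actual routine: the names fir_phase_ref and interp_x2_ref are made up for illustration, it uses double instead of the fixed-point format the VPU code works with, and it assumes each phase is a dot product of the same state window with its own block of ntaps coefficients (phase 1 at coeff[0], phase 2 at coeff[ntaps]), which is how the calls above are laid out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for one interp_x2 call: a single FIR phase,
 * i.e. the dot product of ntaps state samples with ntaps coefficients. */
static void fir_phase_ref(const double *state, const double *coeff,
                          size_t ntaps, double *out)
{
    assert(ntaps > 0);
    double acc = 0.0;
    for (size_t k = 0; k < ntaps; k++) {
        acc += state[k] * coeff[k];
    }
    *out = acc;
}

/* Both phases computed back to back on one core. Each input sample yields
 * two output samples: out_pair[0] from phase 1, out_pair[1] from phase 2.
 * Assumed layout: coeff holds 2*ntaps values, phase 1 first. */
static void interp_x2_ref(const double *state, const double *coeff,
                          size_t ntaps, double *out_pair)
{
    fir_phase_ref(state, &coeff[0],     ntaps, &out_pair[0]);
    fir_phase_ref(state, &coeff[ntaps], ntaps, &out_pair[1]);
}
```

Written this way, the work per input sample is just two short dot products, so any fixed cost paid on every loop iteration to start and join the parallel tasks is being compared against a very small amount of computation.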