Optimization by xC on for loops

Posted: Fri May 26, 2017 10:23 am
by tonetechnician
Hey everyone,

I've been running into an interesting issue with xC compiler optimization. I'm implementing a matrix mixer on an xCORE-200 processor, and the mixer uses a for loop to mix the signals. The loop doesn't behave as expected: if I write a plain for loop with "#pragma loop unroll", it doesn't seem to iterate at all. Adding a counter and a printintln at the end of the loop seems to fix this; the printintln appears to force the compiler to iterate through the loop as expected at runtime.

This works --->

Code: Select all

char n = 0; // counter for number of channels active
/* Do Mixing */
#pragma loop unroll
for (int i = 0; i < MAX_SPEAKERS; i++){
    for (int j = 0; j < MAX_CHANNELS; j++){
        n++;
        /* mix samples */
        samples_mixed[i][j] = sample_out_buf[j]*fifoMixVals[i][j];
        samples_mixed[i][j] = samples_mixed[i][j] >> 8;     // Shift right to prevent overflow
        samples_out[i] += samples_mixed[i][j];
    }
    printintln(n);
}
printintln(n); // Force compiler to go through each iteration
This doesn't ---->

Code: Select all

/* Do Mixing */
#pragma loop unroll
for (int i = 0; i < MAX_SPEAKERS; i++){
    for (int j = 0; j < MAX_CHANNELS; j++){
        /* mix samples */
        samples_mixed[i][j] = sample_out_buf[j]*fifoMixVals[i][j];
        samples_mixed[i][j] = samples_mixed[i][j] >> 8;     // Shift right to prevent overflow
        samples_out[i] += samples_mixed[i][j];
    }
}
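
For anyone who wants to check the arithmetic of this loop away from the hardware, here is a minimal host-side sketch in plain C (not xC). It is an illustration under assumptions, not the poster's code: the sizes of MAX_SPEAKERS and MAX_CHANNELS, the 32-bit sample type, the reading of the gains as fixed-point values where 256 means unity, and the names mix and mix_self_check are all hypothetical. It widens to 64 bits before multiplying, because shifting right after a 32-bit multiply does not protect the multiply itself from overflowing, and it clears the accumulator on every call.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SPEAKERS 2  /* assumption: the real size is not given in the post */
#define MAX_CHANNELS 4  /* assumption: the real size is not given in the post */

/* Mix MAX_CHANNELS inputs into MAX_SPEAKERS outputs.
 * Assumption: gains are fixed-point with 8 fractional bits (256 == 1.0). */
void mix(const int32_t in[MAX_CHANNELS],
         const int32_t gains[MAX_SPEAKERS][MAX_CHANNELS],
         int32_t out[MAX_SPEAKERS])
{
    for (int i = 0; i < MAX_SPEAKERS; i++) {
        int64_t acc = 0;  /* cleared on every call, so stale output is ignored */
        for (int j = 0; j < MAX_CHANNELS; j++) {
            /* Widen before multiplying so the product fits. */
            int64_t product = (int64_t)in[j] * (int64_t)gains[i][j];
            /* Right-shifting a negative value is implementation-defined in C;
             * the self-check below only uses non-negative values. */
            acc += product >> 8;
        }
        out[i] = (int32_t)acc;
    }
}

/* Returns 1 if the mixer produces the hand-computed results. */
int mix_self_check(void)
{
    const int32_t in[MAX_CHANNELS] = {1000, 2000, 3000, 4000};
    const int32_t gains[MAX_SPEAKERS][MAX_CHANNELS] = {
        {256, 256, 256, 256},  /* unity gain on every channel */
        {128,   0, 128,   0},  /* half gain on channels 0 and 2 */
    };
    int32_t out[MAX_SPEAKERS] = {12345, -6789};  /* stale values on purpose */

    mix(in, gains, out);
    assert(out[0] == 10000);  /* 1000 + 2000 + 3000 + 4000 */
    assert(out[1] == 2000);   /* 500 + 1500 */
    return 1;
}
```

Calling mix_self_check() from a small main on a PC gives a known-good reference to compare against what the target produces at different optimization levels.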
Now, when trying a similar thing on another for loop in my code, I don't get the expected behaviour at all, even after adding the printintln fix.

Code: Select all

#pragma loop unroll
char count = 0;
for (int j = 0; j < 4; j++){
    count++;
    ambi_dec_mix[0][j] = sample_out_buf[j]*decoderVals[0][j];
    ambi_dec_mix[1][j] = sample_out_buf[j]*decoderVals[1][j];
    samples_out[0] += (ambi_dec_mix[0][j]);
    samples_out[1] += (ambi_dec_mix[1][j]);
}
printintln(count);
So I have to resort to this, which is definitely not a flexible solution:

Code: Select all

samples_out[0] = (sample_out_buf[0]*decoderVals[0][0]) +
                 (sample_out_buf[1]*decoderVals[0][1]) +
                 (sample_out_buf[2]*decoderVals[0][2]) +
                 (sample_out_buf[3]*decoderVals[0][3]);

samples_out[1] = (sample_out_buf[0]*decoderVals[1][0]) +
                 (sample_out_buf[1]*decoderVals[1][1]) +
                 (sample_out_buf[2]*decoderVals[1][2]) +
                 (sample_out_buf[3]*decoderVals[1][3]);

Does anyone have any idea why this keeps happening, or how I could make it handle a for loop as expected?
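
One difference is visible between the loop and the hand-unrolled version above: the loop accumulates into samples_out with +=, while the unrolled code assigns with =. The two only agree if samples_out is cleared before the loop runs, and the post does not show whether that happens elsewhere. Here is a small host-side sketch in plain C (not xC) of that difference; the function names, array sizes and test values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_OUT 2  /* two decoded outputs, as in the post */
#define NUM_IN  4  /* four input channels, as in the post */

/* Hand-unrolled form: assigns, so previous contents of out do not matter. */
void decode_unrolled(const int32_t in[NUM_IN],
                     const int32_t dec[NUM_OUT][NUM_IN],
                     int32_t out[NUM_OUT])
{
    out[0] = in[0]*dec[0][0] + in[1]*dec[0][1] + in[2]*dec[0][2] + in[3]*dec[0][3];
    out[1] = in[0]*dec[1][0] + in[1]*dec[1][1] + in[2]*dec[1][2] + in[3]*dec[1][3];
}

/* Loop form as posted: accumulates, so the caller must clear out first. */
void decode_loop(const int32_t in[NUM_IN],
                 const int32_t dec[NUM_OUT][NUM_IN],
                 int32_t out[NUM_OUT])
{
    for (int j = 0; j < NUM_IN; j++) {
        out[0] += in[j] * dec[0][j];
        out[1] += in[j] * dec[1][j];
    }
}

/* Returns 1 after checking both the cleared and the stale case. */
int decode_self_check(void)
{
    const int32_t in[NUM_IN] = {1, 2, 3, 4};
    const int32_t dec[NUM_OUT][NUM_IN] = {
        {1,  1, 1,  1},
        {1, -1, 1, -1},
    };
    int32_t expected[NUM_OUT];
    decode_unrolled(in, dec, expected);
    assert(expected[0] == 10 && expected[1] == -2);

    /* Cleared accumulators: the loop matches the unrolled result. */
    int32_t cleared[NUM_OUT] = {0, 0};
    decode_loop(in, dec, cleared);
    assert(cleared[0] == expected[0] && cleared[1] == expected[1]);

    /* Stale accumulators: the loop result is off by the stale value. */
    int32_t stale[NUM_OUT] = {7, 7};
    decode_loop(in, dec, stale);
    assert(stale[0] == expected[0] + 7 && stale[1] == expected[1] + 7);
    return 1;
}
```

If the loop and the unrolled code disagree on the target, clearing the outputs immediately before the loop is a cheap thing to rule out before blaming the optimizer.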

Re: Optimization by xC on for loops

Posted: Fri May 26, 2017 11:44 am
by mon2
Very difficult to read the posted code.

https://www.xmos.com/support/tools/docu ... nent=14787

Try the following:

1) Change the compiler optimization flags, recompile, and review the behaviour again.

2) Here is how you can alter the optimization flags:

Image

Post your results from this testing for future readers. Try flags like -O0, -O2 and/or -O3.

Re: Optimization by xC on for loops

Posted: Fri May 26, 2017 12:15 pm
by tonetechnician
Thanks so much for the reply.

I've tried to increase the size of these images but I'm not sure why they are so small. Will try to fix them again.

I have tried the different compiler flags, -O0 in particular, and it doesn't seem to help the problem :/

My flags are

XCC_FLAGS = -O0 -save-temps -g -report -fxscope -DRGMII=1 -lquadflash

I see there is also an XCC_FLAGS_main.xc which is set as:

XCC_FLAGS_main.xc = $(XCC_FLAGS) -falways-inline -O3

I have changed the XCC_FLAGS_main.xc to

XCC_FLAGS_main.xc = $(XCC_FLAGS) -falways-inline -O0

I'm recompiling now, and will let you know.

Re: Optimization by xC on for loops

Posted: Fri May 26, 2017 12:42 pm
by tonetechnician
Alright!

It seems the -O3 in XCC_FLAGS_main.xc was causing my issue.

Code: Select all

XCC_FLAGS = -O0 -save-temps -g -report -fxscope -DRGMII=1 -lquadflash
XCC_FLAGS_main.xc = $(XCC_FLAGS) -falways-inline -O0 
seems to make the for loop work. Unfortunately, this makes the code larger, but this is fine for my application as it stands.

Hope this helps any future readers.

Re: Optimization by xC on for loops

Posted: Fri May 26, 2017 12:44 pm
by mon2
Excellent! Perhaps someone else will have better advice, but for now it moves you forward :)