xC: different timing on different call sequences (or whatever)

Post by **aclassifier** » Mon Jul 01, 2024 7:09 pm

Hi!

I wanted to test which is "the best" place to index an array inside a struct, the outer level or the inner level:

Code:

typedef struct {
    unsigned array[TEST_ARRAY_IN_STRUCT_LEN]; 
} array_type_t;
//
typedef struct {
    array_type_t array_type;
    unsigned     array[TEST_ARRAY_IN_STRUCT_LEN];
} arrays_in_struct_t;

So I wrote some functions to test this. I was not too surprised to learn that handling around 1000 elements took the same amount of time either way. Maybe indexing is indexing, and it doesn't matter where in the tree it is done. Kind of.

That was the case when the test functions were called one way, for example each function twice. I have four test functions.

But then I learned that if I changed the call sequence so that each function was called once, the time usage seemed to increase with each call.

I don't know if this has to do with what I am testing, or the call sequence or some other strange side effect. The full code is attached.
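A minimal sketch of the kind of test described above, written as plain C so that it also builds on a host PC. The names `sum_outer`, `sum_inner`, `fill_arrays` and `run_sequence`, the length of 1000, and the use of `clock()` are assumptions for illustration and are not taken from the attached code. On the xCORE an xC `timer` would be the natural replacement for `clock()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define TEST_ARRAY_IN_STRUCT_LEN 1000 /* assumed length, "like a 1000 elements" */

typedef struct {
    unsigned array[TEST_ARRAY_IN_STRUCT_LEN];
} array_type_t;

typedef struct {
    array_type_t array_type;
    unsigned     array[TEST_ARRAY_IN_STRUCT_LEN];
} arrays_in_struct_t;

/* Hypothetical signature shared by all test functions. */
typedef unsigned (*test_fn_t)(const arrays_in_struct_t *);

/* Fill both arrays with 0, 1, 2, ... so that both sums are known. */
void fill_arrays(arrays_in_struct_t *s) {
    for (unsigned i = 0; i < TEST_ARRAY_IN_STRUCT_LEN; i++) {
        s->array[i] = i;
        s->array_type.array[i] = i;
    }
}

/* Index the array that sits directly in the outer struct. */
unsigned sum_outer(const arrays_in_struct_t *s) {
    unsigned sum = 0;
    for (unsigned i = 0; i < TEST_ARRAY_IN_STRUCT_LEN; i++) {
        sum += s->array[i];
    }
    return sum;
}

/* Index the array that sits one level down, in the nested struct. */
unsigned sum_inner(const arrays_in_struct_t *s) {
    unsigned sum = 0;
    for (unsigned i = 0; i < TEST_ARRAY_IN_STRUCT_LEN; i++) {
        sum += s->array_type.array[i];
    }
    return sum;
}

/* Time every call in a given call sequence and print the elapsed ticks. */
void run_sequence(const test_fn_t *seq, size_t n, const arrays_in_struct_t *s) {
    assert(seq != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        clock_t start = clock();
        unsigned sum = seq[i](s); /* result is printed so the call is not dropped */
        clock_t elapsed = clock() - start;
        printf("call %zu: sum=%u ticks=%ld\n", i, sum, (long)elapsed);
    }
}
```

Calling `run_sequence` first with `{sum_outer, sum_outer, sum_inner, sum_inner}` and then with `{sum_outer, sum_inner}` gives the two call patterns compared in this post. On a host, `clock()` is probably too coarse to resolve a single 1000-element loop, so the per-call numbers only become meaningful with a finer timer.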

xCore-200 board. xTIMEcomposer 14.4.1 xC code.

If I run this code together with the full code of a real project, I get the same kind of strange timing. But the two builds don't necessarily show the same strange timing for the same type of run.

Anybody recognise this?

Post by **McCrea** » Mon Jul 08, 2024 5:43 pm

Hi Øyvind,
It's hard to say without seeing the compiler invocation, but my suspicion is that this is a result of inlining by the compiler.
It's likely that test_arrays_in_struct_outer and test_arrays_in_struct_inner will be inlined into test_arrays_in_struct_io and similar.
Once the function inlining has taken place, the compiler can make all sorts of transformations - so you might get slightly different instructions emitted depending on where each function gets inlined. Even if the instruction sequences are the same, the instructions might be aligned differently - this can potentially lead to slight differences in performance due to fetch no-ops (cycles where the core can't make progress because it needs to fetch the next series of instructions to execute).
When inlining takes place, executing the inlined sequence is usually faster than making an equivalent function call (it is, after all, an optimisation).
You can prevent the compiler inlining a function into its caller by defining the function and caller in separate source files.
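As a single-file illustration of the same idea, here is a sketch that uses the GCC/Clang `noinline` function attribute instead of separate source files. This is a swapped-in technique, not the one suggested above, and whether the xTIMEcomposer toolchain honours the attribute for xC functions is an assumption that would need checking. The names `sum_inlinable`, `sum_not_inlined` and `check_same_result` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* GCC/Clang-specific attribute; assumed (not verified) to work with xcc. */
#define NOINLINE __attribute__((noinline))

/* The compiler is free to inline this into each caller, so the emitted
   instructions and their alignment may differ from call site to call site. */
static inline unsigned sum_inlinable(const unsigned *a, unsigned n) {
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* The compiler is asked to keep this as a real call, so every call site
   runs the same copy of the loop. */
NOINLINE unsigned sum_not_inlined(const unsigned *a, unsigned n) {
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Both variants must compute the same value; only the timing may differ. */
void check_same_result(const unsigned *a, unsigned n) {
    unsigned inlined = sum_inlinable(a, n);
    unsigned called  = sum_not_inlined(a, n);
    assert(inlined == called);
    printf("inlinable=%u not_inlined=%u\n", inlined, called);
}
```

Timing both variants from several call sites should show whether the differences follow the inlining decisions or come from somewhere else.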

Post by **aclassifier** » Tue Jul 09, 2024 9:25 am

Thank you! Welcome to the forum!

I will try putting the functions in question in a different file and test again.

If this is interesting to others, I could zip the whole project, binaries and all, and attach it here.

I assume this means that xta would see the difference, so if I set a timing limit between the shorter and the longer time, one build might fail and the other pass?
