I'm seeing some interesting timing results when using a streaming channel between two cores on different tiles, routed through two xSwitches (XUF224, tiles 0 and 3).

I know going through multiple xSwitches increases the channel buffering and latency.

I can't understand why the sending thread would have massive pauses if the receiving thread is draining the channel as fast as possible; see the results below.

Also, why does the first iteration of the sending loop sometimes take much longer?

I did add synchronization between the threads to make sure they started at the same time, but this made no difference to the results.
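(The synchronization was just a simple handshake over the channel before the timed loop; a minimal sketch of what I mean, with illustrative variable names:)

```xc
// Sketch of the start-of-run handshake (illustrative, not the exact code):

// Sender side, just before "t :> start_time":
unsigned ack;
c <: 0u;        // signal that the sender is ready
c :> ack;       // block until the receiver replies

// Receiver side, just before its while(1) loop:
unsigned ready;
c :> ready;     // wait for the sender's signal
c <: 1u;        // reply, so both sides start at (nearly) the same time
```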

I used the following code to generate the results, compiled at the -O3 optimization level:

```xc
#include <platform.h>
#include <print.h>

#define CONSECUTIVE_INTS 24

int main()
{
    streaming chan c;

    par
    {
        on tile[0]:
            par
            {
                // Send task
                {
                    timer t;
                    unsigned start_time, end_time;
                    while (1)
                    {
                        t :> start_time;
#pragma loop unroll
                        for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                        {
                            c <: i;
                        }
                        t :> end_time;
                        printuintln(end_time - start_time);
                    }
                }
            }

        on tile[3]:
            par
            {
                // Receive task
                {
                    unsigned temp;
                    while (1)
                    {
#pragma loop unroll
                        for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                        {
                            c :> temp;
                        }
                    }
                }
            }
    }

    return 0;
}
```

The results below show the printuintln values from the code above, which should give the loop duration in 100 MHz reference-clock timer ticks (10 ns each, so e.g. 142 ticks ≈ 1.42 µs). In some cases two values are given in the subsequent-iterations column because the measurements were not consistent from run to run.

| CONSECUTIVE_INTS | First iteration | Subsequent iterations |
|---|---|---|
| 6 | 6 | 6 |
| 7 | 7 | 7 |
| 8 | 8 | 8 |
| 9 | 17 | 9 |
| 10 | 26 | 13 |
| 11 | 36 | 23 or 24 |
| 12 | 46 | 33 |
| 13 | 55 | 42 or 43 |
| 14 | 65 | 52 or 53 |
| 15 | 74 | 61 |
| 16 | 84 | 71 or 72 |
| 17 | 94 | 81 |
| 18 | 103 | 90 or 91 |
| 19 | 113 | 100 or 101 |
| 20 | 122 | 109 or 110 |
| 21 | 132 | 118 or 120 |
| 22 | 142 | 129 |

Thanks for any help,

Mike.