Several questions about XMOS

Non-technical related questions should go here.
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

I'm pretty sure I understand why this is happening (scheduling as davidnorman suggested), but I'd rather make sure before I post an explanation. If an XMOS-ite beats me to it, then no problem, but I'll try to get the details on here tonight (UK time).


Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

jonathan and davidnorman,

This gets weirder. If I move all my threads to stdcore[1], I get completely different results:

Code: Select all

Threads    Clocks
8          2
7          2
6          2/1
5          2
4          1
3          1
2          1
1          1
Notice how it is now the even-numbered 6-thread case that jitters, instead of the odd-numbered 5- and 7-thread cases as on stdcore[0].

Someone has a lot of explaining to do:)
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

jonathan and davidnorman,

And weirder still:

Now I move all the threads to stdcore[2]. The results change again:

Code: Select all

Threads    Clocks
8          2
7          1
6          2/1
5          1
4          1
3          1
2          1
1          1
Surprisingly, we see the 7-thread case running in one clock tick!!

So not only is determinism not quite present within the threads of an xcore, but each xcore is different:)

Should I move on to stdcore[3] ?
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

jonathan and davidnorman,

Well of course I do. With all threads on stdcore[3] the results change again:

Code: Select all

Threads    Clocks
8          2
7          2
6          2/1
5          1
4          1
3          1
2          1
1          1
Note how the 7 thread case is no longer a winner???

I think that's enough for today.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

I see nothing strange here. Your timer is the refclk, not the core clock. In the default
configuration, the core clock runs four times as fast as the refclk.

If you have 4 (or fewer) threads running, you will see the refclk tick once for one core
insn on your thread; if you have 8 threads running, you will see it tick twice. In both
these cases, execution of your thread stays perfectly aligned with the refclk.

If you have 5 to 7 threads running, it depends on how your thread is aligned with
the refclk. Here's the deal (top is refclk; bottom is X where your thread is scheduled):

Code: Select all

4 threads, case 1:
0000111122223333444455556666777788889999aaaabbbb
X...X...X...X...X...X...X...X...X...X...X...X...
 "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"

4 threads, case 2:
0000111122223333444455556666777788889999aaaabbbb
.X...X...X...X...X...X...X...X...X...X...X...X..
  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"

4 threads, case 3:
0000111122223333444455556666777788889999aaaabbbb
..X...X...X...X...X...X...X...X...X...X...X...X.
   "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"

4 threads, case 4:
0000111122223333444455556666777788889999aaaabbbb
...X...X...X...X...X...X...X...X...X...X...X...X
    "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"

Code: Select all

5 threads:
0000111122223333444455556666777788889999aaaabbbb
X....X....X....X....X....X....X....X....X....X..
 "1"  "1"  "1"  "2"  "1"  "1"  "1"  "2"  "1"

Code: Select all

6 threads, case 1:
0000111122223333444455556666777788889999aaaabbbb
X.....X.....X.....X.....X.....X.....X.....X.....
 "1"   "2"   "1"   "2"   "1"   "2"   "1"

6 threads, case 2:
0000111122223333444455556666777788889999aaaabbbb
.X.....X.....X.....X.....X.....X.....X.....X....
  "1"   "2"   "1"   "2"   "1"   "2"   "1"

Code: Select all

7 threads:
0000111122223333444455556666777788889999aaaabbbb
X......X......X......X......X......X......X.....
 "1"    "2"    "2"    "2"    "1"    "2"

Code: Select all

8 threads (4 cases, you can figure it out):
0000111122223333444455556666777788889999aaaabbbb
X.......X.......X.......X.......X.......X.......
 "2"     "2"     "2"     "2"     "2"     "2"
Since your test code only takes a single timing on each run, and does I/O between
those, it will likely start at the same spot every time (because it synced with a
port clock).
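
The diagrams above can be reproduced with a few lines of arithmetic. The following Python sketch is only a model, not XMOS code. It assumes that the refclk value equals the core cycle count divided by 4 (the default ratio mentioned above), that a thread issues one instruction every max(4, n) core cycles when n threads are running, and that `phase` (a hypothetical parameter) is the core cycle at which the thread's first instruction issues.

```python
def tick_deltas(n_threads, phase=0, samples=9, ratio=4):
    """Refclk ticks seen between consecutive instructions of one thread.

    Model assumptions (not taken from XMOS documentation):
      - the refclk advances once every `ratio` core cycles;
      - with n threads, the thread issues every max(ratio, n) core cycles;
      - `phase` is the core cycle of the thread's first instruction.
    """
    period = max(ratio, n_threads)
    # Core cycles at which this thread's instructions issue.
    cycles = [phase + i * period for i in range(samples + 1)]
    # Refclk value that each of those instructions would read.
    refclk = [c // ratio for c in cycles]
    return [b - a for a, b in zip(refclk, refclk[1:])]


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 8):
        print(n, tick_deltas(n, samples=6))
```

Under these assumptions, 4 and 8 threads give a constant 1 and 2 for every phase, while 5 to 7 threads give a repeating mix of 1s and 2s whose starting point depends on `phase`.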
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

segher,

segher wrote: I see nothing strange here.
You might be the only one:) Let's see if we can understand this.

segher wrote: Your timer is the refclk, not the core clock. In the default configuration, the core clock runs four times as fast as the refclk.
It's not immediately obvious why that makes any difference. Would I not get results more like 4 or 8, instead of 1 or 2 timer ticks, if I used the four times faster core clock?

How do I change the program to use the core clock for timing?

Given that the timer I have there might be used to pace the timing of an output (timerafter), I might expect that jitter to turn up on the port as well.

segher wrote: If you have 4 (or fewer) threads running, you will see the refclk tick once for one core insn on your thread;
OK, Yep.

segher wrote: if you have 8 threads running, you will see it tick twice. In both these cases, execution of your thread stays perfectly aligned with the refclk.
Hmm.. OK.

segher wrote: If you have 5 to 7 threads running, it depends on how your thread is aligned with the refclk.
Ah, so these clocks are not necessarily in a fixed phase relationship (yes/no?).

Now I notice, if I compile the test to assembler, that the timer start and end values are read with only two consecutive instructions:

Code: Select all

          in        r1, res[r5]       # Gets start time
          in        r0, res[r5]       # Gets end time
I think what I was looking for is to see that the two instructions are separated by the following number of core clock ticks:

4 ticks for 1 to 4 threads
5 ticks for 5 threads
6 ticks for 6 threads
7 ticks for 7 threads
8 ticks for 8 threads.

So again how can I get at the core clock to show this?

Am I right in my statement above that using normal timers, as I do, will result in this jitter on my port outputs?
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

The easiest way to get timing measured in core clocks for your experiment
is to run the refclk and the core clock at the same frequency, say, both at 100MHz.
I think you can set that in your XN file?

Another thing you could do is not measure one instruction but a whole bunch,
e.g. 100, and see how many refclk ticks that takes.
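
As a rough illustration of why timing a whole bunch of instructions helps, here is a minimal Python sketch. It is a model under stated assumptions, not a measurement: the refclk is taken to be the core cycle count divided by 4, a thread is assumed to issue once every max(4, n) core cycles with n threads running, and `refclk_ticks` and `phase` are hypothetical names introduced for this example.

```python
def refclk_ticks(n_instructions, n_threads, phase=0, ratio=4):
    """Refclk ticks elapsed while one thread issues n_instructions.

    Assumes (model only) one issue every max(ratio, n_threads) core cycles
    and a refclk that advances once every `ratio` core cycles.
    """
    period = max(ratio, n_threads)
    start = phase
    end = phase + n_instructions * period
    return end // ratio - start // ratio


if __name__ == "__main__":
    for n in range(1, 9):
        ticks = refclk_ticks(100, n)
        # Average core cycles per instruction recovered from the refclk count.
        print(n, ticks, ticks * 4 / 100)
```

In this model the alignment with the refclk changes the total by at most one tick, so over 100 instructions the average core cycles per instruction comes out within a few hundredths of max(4, n).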
davelacey
Experienced Member
Posts: 104
Joined: Fri Dec 11, 2009 8:29 pm

Post by davelacey »

I guess I'm a bit late but segher is right about what you are seeing here. With 5 threads, the instructions for each thread will be 5 core cycles apart. The reference clock ticks every 4 core cycles. So between two instructions 5 apart there may be 1 or 2 ticks.

It is useful to know about the -t option to the simulator. This dumps an instruction trace out to standard output.
If I do this on your program you can see the 1 tick case:

Code: Select all

stdcore[0]@0- -A-a-p-p-p-.----000100c4 (timed_thread        + 18) : in      r1(0xb17), res[r5(0x1)] @11358
stdcore[0]@1- -p-A-a-p-p-.----.000100e0 (__main_xm_1         +  0) : bu      -0x1 @11359
stdcore[0]@2- -p-p-A-a-p-.----..000100dc (__main_xm_2         +  0) : bu      -0x1 @11360
stdcore[0]@3- -p-p-p-A-a-.----...000100d8 (__main_xm_3         +  0) : bu      -0x1 @11361
stdcore[0]@4- -a-p-p-p-A-.----....000100d4 (__main_xm_4         +  0) : bu      -0x1 @11362
stdcore[0]@0- -A-a-p-p-p-.----000100c6 (timed_thread        + 1a) : in      r0(0xb18), res[r5(0x1)] @11363
and the 2 tick case:

Code: Select all

stdcore[0]@0- -A-a-p-p-p-.----000100c4 (timed_thread        + 18) : in      r1(0x12b1), res[r5(0x1)] @19143
stdcore[0]@1- -p-A-a-p-p-.----.000100e0 (__main_xm_1         +  0) : bu      -0x1 @19144
stdcore[0]@2- -p-p-A-a-p-.----..000100dc (__main_xm_2         +  0) : bu      -0x1 @19145
stdcore[0]@3- -p-p-p-A-a-.----...000100d8 (__main_xm_3         +  0) : bu      -0x1 @19146
stdcore[0]@4- -a-p-p-p-A-.----....000100d4 (__main_xm_4         +  0) : bu      -0x1 @19147
stdcore[0]@0- -A-a-p-p-p-.----000100c6 (timed_thread        + 1a) : in      r0(0x12b3), res[r5(0x1)] @19148
In both cases the same number of core cycles occur, so it is deterministic. However, the difference is in where they start relative to the reference clock. In the first case we start at 11358 = 2 (mod 4) and in the second case at 19143 = 3 (mod 4).
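
The mod 4 argument can be checked against the numbers in the two traces. This Python sketch assumes that the `@` value at the end of each trace line is the core cycle count and that the timer reads the core cycle count divided by 4; both are assumptions, although the register values in the traces (0xb17, 0xb18, 0x12b1, 0x12b3) are consistent with them. `timer_value` is a hypothetical helper.

```python
RATIO = 4  # core cycles per reference clock tick (default configuration)


def timer_value(core_cycle, ratio=RATIO):
    """Timer reading at a given core cycle, assuming timer = cycle // ratio."""
    return core_cycle // ratio


if __name__ == "__main__":
    # (cycle of first `in`, cycle of second `in`) taken from the two traces.
    for start, end in ((11358, 11363), (19143, 19148)):
        delta = timer_value(end) - timer_value(start)
        print(start % RATIO, hex(timer_value(start)), hex(timer_value(end)), delta)
```

Both pairs are 5 core cycles apart. The first pair straddles one multiple of 4 and the second pair straddles two, which gives the 1 tick and 2 tick results.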

Dave
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

segher wrote: The easiest way to get timing measured in core clocks for your experiment is to run the refclk and the core clock at the same frequency, say, both at 100MHz. I think you can set that in your XN file?
OK, does anyone know how to do that?

segher wrote: Another thing you could do is not measure one instruction but a whole bunch, e.g. 100, and see how many refclk ticks that takes.
That code I posted is a cut-down version of my integer FFT implementation, which takes about 4ms to run as one thread out of 4. That is where I first noticed this phenomenon.

davelacey,

I think I grasp, and accept, the core cycles vs reference clock cycles thing now.
Thank you for the simulator trace. That makes it very clear.

Now what about the issue of IO pin jitter?
If my port output is ultimately timed by the core clock (which it must be, as that's what drives the instructions), but my code is waiting for "timerafter", which comes from the ref clock, then is there scope for jitter or not?
davelacey
Experienced Member
Posts: 104
Joined: Fri Dec 11, 2009 8:29 pm

Post by davelacey »

Heater wrote: Now what about the issue of IO pin jitter?
If my port output is ultimately timed by the core clock (which it must be, as that's what drives the instructions), but my code is waiting for "timerafter", which comes from the ref clock, then is there scope for jitter or not?
Yes. If you have code that goes something like:

Code: Select all

    t when timerafter(next_port_time) :> void;
    p <: value;
Then there could be a variable amount of time between the reference clock tick at next_port_time and the port output, even if the port is clocked off the reference clock as well. This is the same situation as in your example: there may be an extra tick in between. If this variability is unacceptable (in some cases it will not matter), you can use timed outputs, e.g.

Code: Select all

    p @ next_port_time <: value;
This will reduce the jitter, but note that the output is now timed off the 16-bit port counter and no longer off the central reference clock. The port counter may be in sync with the reference clock if it is configured that way, but it is 16 bits wide as opposed to 32, and its starting point may be different.
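
The practical consequence of the 16-bit width and the different starting point is that a 32-bit timer value cannot be used directly as a port time. The Python sketch below illustrates the wraparound arithmetic only. It assumes the port counter advances at the same rate as the reference clock and that both counters were sampled at the same moment; `to_port_time`, `port_ticks_until`, `ref_sample` and `port_sample` are hypothetical names, not part of any XMOS API.

```python
PORT_MASK = 0xFFFF  # the port counter is 16 bits wide


def to_port_time(ref_time, ref_sample, port_sample):
    """Map a 32-bit reference time onto the 16-bit port counter.

    ref_sample and port_sample are readings of the two counters taken at
    the same moment (an assumption of this sketch); their difference is
    the "starting point" offset between them.
    """
    return (port_sample + (ref_time - ref_sample)) & PORT_MASK


def port_ticks_until(now, target):
    """Ticks from `now` to `target` on the wrapping 16-bit counter."""
    return (target - now) & PORT_MASK


if __name__ == "__main__":
    # Port counter is about to wrap; the target is 5 ticks in the future.
    port_time = to_port_time(0x00010005, ref_sample=0x00010000, port_sample=0xFFFE)
    print(hex(port_time), port_ticks_until(0xFFFE, port_time))
```

Because the counter wraps every 65536 ticks, a mapping like this is only unambiguous for targets less than one full wrap in the future.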

Dave