Several questions about XMOS
-
- Respected Member
- Posts: 377
- Joined: Thu Dec 10, 2009 6:07 pm
I'm pretty sure I understand why this is happening (scheduling as davidnorman suggested), but I'd rather make sure before I post an explanation. If an XMOS-ite beats me to it, then no problem, but I'll try to get the details on here tonight (UK time).
-
- Respected Member
- Posts: 296
- Joined: Thu Dec 10, 2009 10:33 pm
jonathan and davidnorman,
This gets weirder. If I move all my threads to stdcore[1] I get completely different results:
Code: Select all
Threads  Clocks
8        2
7        2
6        2/1
5        2
4        1
3        1
2        1
1        1
Notice how it is now the even-numbered 6 thread case that jitters instead of the odd-numbered 5 and 7 threads as on stdcore[0].
Someone has a lot of explaining to do:)
-
- Respected Member
- Posts: 296
- Joined: Thu Dec 10, 2009 10:33 pm
jonathan and davidnorman,
And weirder still:
Now I move all the threads to stdcore[2]. The results change again:
Code: Select all
Threads  Clocks
8        2
7        1
6        2/1
5        1
4        1
3        1
2        1
1        1
Surprisingly we see the 7 thread case running in one clock tick!!
So not only is determinism not quite present within the threads of an xcore, each xcore is different:)
Should I move on to stdcore[3]?
-
- Respected Member
- Posts: 296
- Joined: Thu Dec 10, 2009 10:33 pm
jonathan and davidnorman,
Well of course I do. With all threads on stdcore[3] the results change again:
Code: Select all
Threads  Clocks
8        2
7        2
6        2/1
5        1
4        1
3        1
2        1
1        1
Note how the 7 thread case is no longer a winner???
I think that's enough for today.
-
- XCore Expert
- Posts: 844
- Joined: Sun Jul 11, 2010 1:31 am
I see nothing strange here. Your timer is the refclk, not the core clock. In the default
configuration, the core clock runs four times as fast as the refclk.
If you have 4 (or fewer) threads running, you will see the refclk tick once for one core
insn on your thread; if you have 8 threads running, you will see it tick twice. In both
these cases, execution of your thread stays perfectly aligned with the refclk.
If you have 5 to 7 threads running, it depends on how your thread is aligned with
the refclk. Here's the deal (top is refclk; bottom is X where your thread is scheduled):
Code: Select all
4 threads, case 1:
0000111122223333444455556666777788889999aaaabbbb
X...X...X...X...X...X...X...X...X...X...X...X...
 "1" "1" "1" "1" "1" "1" "1" "1" "1"
4 threads, case 2:
0000111122223333444455556666777788889999aaaabbbb
.X...X...X...X...X...X...X...X...X...X...X...X..
  "1" "1" "1" "1" "1" "1" "1" "1" "1"
4 threads, case 3:
0000111122223333444455556666777788889999aaaabbbb
..X...X...X...X...X...X...X...X...X...X...X...X.
   "1" "1" "1" "1" "1" "1" "1" "1" "1"
4 threads, case 4:
0000111122223333444455556666777788889999aaaabbbb
...X...X...X...X...X...X...X...X...X...X...X...X
    "1" "1" "1" "1" "1" "1" "1" "1" "1"
Code: Select all
5 threads:
0000111122223333444455556666777788889999aaaabbbb
X....X....X....X....X....X....X....X....X....X..
 "1"  "1"  "1"  "2"  "1"  "1"  "1"  "2"  "1"
Code: Select all
6 threads, case 1:
0000111122223333444455556666777788889999aaaabbbb
X.....X.....X.....X.....X.....X.....X.....X.....
  "1"   "2"   "1"   "2"   "1"   "2"   "1"
6 threads, case 2:
0000111122223333444455556666777788889999aaaabbbb
.X.....X.....X.....X.....X.....X.....X.....X....
   "1"   "2"   "1"   "2"   "1"   "2"   "1"
Code: Select all
7 threads:
0000111122223333444455556666777788889999aaaabbbb
X......X......X......X......X......X......X.....
  "1"    "2"    "2"    "2"    "1"    "2"
Code: Select all
8 threads (4 cases, you can figure it out):
0000111122223333444455556666777788889999aaaabbbb
X.......X.......X.......X.......X.......X.......
   "2"     "2"     "2"     "2"     "2"     "2"
Since your test code only runs a single timing every time, and does I/O between
those, it will likely start at the same spot every time (because it synched with a
port clock).
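The diagrams above can be reproduced with a few lines of arithmetic. What follows is only a sketch in Python, not XMOS code: it assumes, as the post describes, that the refclk ticks once every 4 core clocks and that a thread issues one instruction every max(threads, 4) core clocks. The names `REF_DIVIDER`, `issue_spacing` and `tick_deltas` are made up for this illustration.

```python
REF_DIVIDER = 4  # assumed: core clocks per refclk tick (default configuration)


def issue_spacing(threads):
    """Core clocks between two successive instructions of one thread.

    Assumption taken from the post: with 4 or fewer threads a thread
    still only issues every 4 core clocks.
    """
    return max(threads, 4)


def tick_deltas(threads, offset=0, count=9):
    """Refclk ticks observed between consecutive instructions of one thread.

    offset is the core clock (relative to a refclk edge) at which the
    thread's first instruction is scheduled.
    """
    step = issue_spacing(threads)
    times = [offset + i * step for i in range(count + 1)]
    stamps = [t // REF_DIVIDER for t in times]  # refclk value read at each issue
    return [b - a for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for n in range(1, 9):
        for off in range(REF_DIVIDER):
            print(n, "threads, offset", off, "->", tick_deltas(n, off))
```

Under these assumptions a single measurement taken at a fixed offset always lands on the same entry of the pattern, which is consistent with each stdcore reporting a stable but different value for 5 and 7 threads.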
-
- Respected Member
- Posts: 296
- Joined: Thu Dec 10, 2009 10:33 pm
segher,
segher wrote: I see nothing strange here.
You might be the only one:) Let's see if we can understand this.
segher wrote: Your timer is the refclk, not the core clock. In the default configuration, the core clock runs four times as fast as the refclk.
It's not immediately obvious why that makes any difference. Would I not get results of more like 4 or 8 instead of 1 or 2 timer ticks if I used the four times faster core clock?
How do I change the program to use the core clock for timing?
Given that the timer I have there might be used to pace the timing of an output (timeafter) then I might expect that jitter to turn up on the port as well.
segher wrote: If you have 4 (or fewer) threads running, you will see the refclk tick once for one core insn on your thread;
OK, Yep.
segher wrote: if you have 8 threads running, you will see it tick twice. In both these cases, execution of your thread stays perfectly aligned with the refclk.
Hmm.. OK.
segher wrote: If you have 5 to 7 threads running, It depends on how your thread is aligned with the refclk.
Ah, so these clocks are not necessarily in a fixed phase relationship (yes/no?).
Now I notice, if I compile the test to assembler, that the timer start and end values are read with only two consecutive instructions:
Code: Select all
in r1, res[r5] # Gets start time
in r0, res[r5] # Gets end time
I think what I was looking for is to see that the two instructions are separated by
4 ticks for 1 to 4 threads
5 ticks for 5 threads
6 ticks for 6 threads
7 ticks for 7 threads
8 ticks for 8 threads.
So again how can I get at the core clock to show this?
Am I right in my statement above that using normal timers, as I do, will result in this jitter on my port outputs?
-
- XCore Expert
- Posts: 844
- Joined: Sun Jul 11, 2010 1:31 am
The easiest way to get timing measured in core clocks for your experiment
is to run the refclk at the same frequency as the core clock, say, both at 100MHz.
I think you can set that in your XN file?
Another thing you could do is not measure one instruction but a whole bunch,
e.g. 100, and see how many refclk ticks that takes.
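To see why timing a whole bunch of instructions helps, here is a small Python model (a sketch, not XMOS code). It assumes 4 core clocks per refclk tick and one instruction issued every max(threads, 4) core clocks, as described earlier in the thread; `ticks_over` is a hypothetical helper name.

```python
REF_DIVIDER = 4  # assumed: core clocks per refclk tick


def ticks_over(instructions, threads, offset):
    """Refclk ticks elapsed while one thread issues `instructions` instructions.

    offset is the core clock, relative to a refclk edge, of the first
    timer read. Assumes an issue slot every max(threads, 4) core clocks.
    """
    step = max(threads, 4)
    start = offset
    end = offset + instructions * step
    return end // REF_DIVIDER - start // REF_DIVIDER


if __name__ == "__main__":
    for n in range(1, 9):
        counts = {ticks_over(100, n, off) for off in range(REF_DIVIDER)}
        print(n, "threads:", sorted(counts), "refclk ticks per 100 instructions")
```

In this model 100 instructions always span a whole number of refclk ticks, so the starting alignment drops out: dividing the tick count by 25 gives the spacing in core clocks (4, 4, 4, 4, 5, 6, 7, 8 for 1 to 8 threads).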
-
- Experienced Member
- Posts: 104
- Joined: Fri Dec 11, 2009 8:29 pm
I guess I'm a bit late but segher is right about what you are seeing here. With 5 threads, the instructions for each thread will be 5 core cycles apart. The reference clock ticks every 4 core cycles. So between two instructions 5 apart there may be 1 or 2 ticks.
It is useful to know about the -t option to the simulator. This dumps an instruction trace out to standard output.
If I do this on your program you can see the 1 tick case:
Code: Select all
stdcore[0]@0- -A-a-p-p-p-.----000100c4 (timed_thread + 18) : in r1(0xb17), res[r5(0x1)] @11358
stdcore[0]@1- -p-A-a-p-p-.----.000100e0 (__main_xm_1 + 0) : bu -0x1 @11359
stdcore[0]@2- -p-p-A-a-p-.----..000100dc (__main_xm_2 + 0) : bu -0x1 @11360
stdcore[0]@3- -p-p-p-A-a-.----...000100d8 (__main_xm_3 + 0) : bu -0x1 @11361
stdcore[0]@4- -a-p-p-p-A-.----....000100d4 (__main_xm_4 + 0) : bu -0x1 @11362
stdcore[0]@0- -A-a-p-p-p-.----000100c6 (timed_thread + 1a) : in r0(0xb18), res[r5(0x1)] @11363
and the 2 tick case:
Code: Select all
stdcore[0]@0- -A-a-p-p-p-.----000100c4 (timed_thread + 18) : in r1(0x12b1), res[r5(0x1)] @19143
stdcore[0]@1- -p-A-a-p-p-.----.000100e0 (__main_xm_1 + 0) : bu -0x1 @19144
stdcore[0]@2- -p-p-A-a-p-.----..000100dc (__main_xm_2 + 0) : bu -0x1 @19145
stdcore[0]@3- -p-p-p-A-a-.----...000100d8 (__main_xm_3 + 0) : bu -0x1 @19146
stdcore[0]@4- -a-p-p-p-A-.----....000100d4 (__main_xm_4 + 0) : bu -0x1 @19147
stdcore[0]@0- -A-a-p-p-p-.----000100c6 (timed_thread + 1a) : in r0(0x12b3), res[r5(0x1)] @19148
In both cases the same number of core cycles occur, so it is deterministic. However, the difference is in where they start relative to the reference clock. In the first case we start at 11358 = 2 (mod 4) and in the second case at 19143 = 3 (mod 4).
Dave
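The numbers in the two traces can be cross-checked with a little arithmetic. The Python sketch below assumes that the `@` stamp in the trace is a core cycle count and that the timer value read by `in` equals that count divided by 4, rounded down; this relation is inferred from the values in the trace rather than taken from any documentation, and `timer_value` is a made-up name.

```python
REF_DIVIDER = 4  # assumed: core cycles per reference clock tick


def timer_value(core_cycle):
    """Timer reading at a given core cycle (relation inferred from the trace)."""
    return core_cycle // REF_DIVIDER


# (start cycle, end cycle, r1 read at start, r0 read at end), copied from the traces
TRACES = [
    (11358, 11363, 0xB17, 0xB18),    # the 1 tick case
    (19143, 19148, 0x12B1, 0x12B3),  # the 2 tick case
]

if __name__ == "__main__":
    for start, end, r1, r0 in TRACES:
        matches = timer_value(start) == r1 and timer_value(end) == r0
        print(
            f"start @{start} (mod 4 = {start % 4}): "
            f"{end - start} core cycles, {r0 - r1} tick(s), model matches: {matches}"
        )
```

Both runs take 5 core cycles; the one starting at 3 (mod 4) crosses two reference clock edges, the one starting at 2 (mod 4) crosses only one.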
-
- Respected Member
- Posts: 296
- Joined: Thu Dec 10, 2009 10:33 pm
segher:
segher wrote: The easiest way to get timing measured in core clocks for your experiment, is to run the refclk the same, say, both at 100MHz. I think you can set that in your XN file?
OK anyone know how to do that?
segher wrote: Another thing you could do is not measure one instruction but a whole bunch, e.g. 100, and see how many refclk ticks that takes.
That code I posted is a cut down version of my integer FFT implementation which takes about 4ms to run as one thread out of 4. That is where I first noticed this phenomenon.
davelacey,
I think I grasp, and accept, the core cycles vs reference clock cycles thing now.
Thank you for the simulator trace. That makes it very clear.
Now what about the issue of IO pin jitter?
If my port output is ultimately timed by the core clock, which it must be as that's what's driving instructions but my code is waiting for "timeafter" which is from the ref clock then there is scope for jitter or not?
-
- Experienced Member
- Posts: 104
- Joined: Fri Dec 11, 2009 8:29 pm
Heater wrote: Now what about the issue of IO pin jitter? If my port output is ultimately timed by the core clock, which it must be as that's what's driving instructions but my code is waiting for "timeafter" which is from the ref clock then there is scope for jitter or not?
Yes. If you have code that goes something like:
Code: Select all
t when timerafter(next_port_time) :> void;
p <: value;
Code: Select all
p @ next_port_time <: value;
Dave