Several questions about XMOS

User avatar
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Here are a couple of pictures which hopefully explain it. Basically, each active thread gets one clock before the next thread is scheduled, and that holds down to a minimum of four threads. That means a thread is never more than n clocks away from its next instruction (where n is the number of active threads, with a floor of 4).

The practical upshot is that each thread has a guaranteed worst-case MIPS - that, along with predictable instruction timing (and no interrupts needed, thanks to events), means you can build software that is 100% deterministic - deterministic enough to deliver hardware interfaces in software.
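To put numbers on it, take a 400MHz device (an XS1-G4, say - the clock speed here is just for illustration): with 4 active threads each thread is guaranteed 400/4 = 100 MIPS; with 8 active threads each is still guaranteed 400/8 = 50 MIPS.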


Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

infiniteimprobability,

That is a very nice explanation and set of diagrams.

The problem is this statement:
that along with predictable instruction timing (and no interrupts needed due to the events) means that you can build software that is 100% deterministic
which is NOT exactly true :)

Recently I have been running some timing tests like so:
1) A blob of code to be timed is wrapped in a timing loop that calculates how many timer ticks it takes and prints the result. This loop is run as a thread.

2) Another thread function is defined that is basically a "do nothing" loop.

3) The first thread is timed while running alongside from 0 to 7 instances of the "do nothing" loop (1 to 8 threads in total).

Results:

a) When running a total thread count from 1 to 4, the timing loop repeatedly reports the same time for the blob's execution.

b) When running 4 of the "do nothing" threads (5 threads in total), the blob's execution time increases by 25%, as expected.

c) When running 5 of the "do nothing" threads (6 threads in total), the blob's execution time increases by the same amount again.

d) And so on up to 8 threads total.

HOWEVER, at somewhere around 6 or 7 total threads (I forget exactly) the blob's execution time alternates between two values that differ by 1.
That is to say, there is a 10ns jitter (one tick of the 100MHz reference clock) in its execution time.

Not much, you might say. True, but it's not 100% deterministic.

I was about to post a question about this observation; I will do so once I have boiled my test down to a few lines of code to post here.

The next issue is divide and modulus. I understood from comments David May made a while back that executing these instructions could cause other threads to jitter. I have yet to demonstrate that to myself experimentally.
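For anyone who wants to try it, here is a minimal sketch of one way to test this (the function name and constants are mine, not from any XMOS example): run it in place of one of the "do nothing" threads and time the blob as before.

Code:

// Hypothetical stress thread: hammers divide and modulus so any
// effect on the other threads' timing can be observed.
void divmod_thread()
{
    unsigned x = 12345;
    while (1)
    {
        x = x / 7;   // should compile to a divide instruction
        x = x % 3;   // should compile to a modulus instruction
        x += 12345;  // keep the operands non-trivial
    }
}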
omega7
Active Member
Posts: 32
Joined: Thu Jun 03, 2010 12:16 pm

Post by omega7 »

I presume that the four different colours represent the four pipeline stages?

But I cannot match this with the information in "Programming XC on XMOS Devices" (chapter "Thread performance", page 37), which says:

"Because individual threads may be delayed on I/O, their unused processor cycles can be taken by other threads. Thus, for more than four threads, the performance of each thread is often higher than the minimum shown above."

If XMOS is deterministic, (thread) performance should not depend on threads being delayed on I/O, should it? Or am I missing something? I suspect I am confusing two things here...

Martin
User avatar
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

OK, so basically...

When a thread is "waiting" for an event to happen, it goes to sleep. By default, no instructions are issued from that thread, so it is not allocated an instruction scheduling slot (of which there are between 4 and 8).

What this means is that if you have 5 threads, but one of them is "waiting" for an event to occur (such as a specific input on a specific port), the four threads that are not "waiting" - i.e. the threads that are "ready" - will in fact run consecutively, while the fifth thread is descheduled.

This means that you can have more than 4 threads running concurrently and still get 4-threaded performance, as long as no more than 4 threads are ever "ready" at once.

It should be stated that this is the default behaviour. As discussed in a related thread, "fast mode" can be used instead, which I believe guarantees a thread its scheduling slot even while it is "waiting". This is effectively polling behaviour: it (probably) lowers the overall instruction throughput of the program, in return for saving on average a few cycles of event response time, because the thread does not have to be rescheduled into an available slot once the event arrives.
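To make "waiting" concrete, here is a minimal sketch of a thread that deschedules itself until a port event fires (the port and the pin condition are placeholders, not from any real design):

Code:

#include <xs1.h>

void wait_for_pin(in port p)
{
    int value;
    while (1)
    {
        // The select deschedules this thread: it consumes no
        // instruction slots until the pin condition is met.
        select
        {
            case p when pinseq(1) :> value:
                // Rescheduled here once the pin reads 1.
                break;
        }
    }
}

For fast mode, I believe the call is set_thread_fast_mode_on() from <xs1.h>, but check the documentation before relying on that.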

Hope this helps.
User avatar
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

I think the wording on slide 17 above is misleading and wrong.

1. "Each thread executes a minimum every 4 clock ticks - f/4 MHz"
2. "Each thread executes a minimum every 8 clock ticks - f/8 MHz"

The first should really say: "Each ready thread can execute at most once every four clock ticks, maximum of f/4 MHz".

The second should really say: "Each ready thread executes at least once every eight clock ticks, minimum of f/8 MHz and maximum of f/4 MHz".

Even those statements don't quite capture the event-driven nature of XMOS scheduling. However, I do think they should be corrected, as they are clearly confusing: the first, as written, implies threads can run faster than f/4 MHz (which they can't).
User avatar
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

Heater, would love to see your code that exhibits this "non-deterministic" behaviour.
User avatar
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

jonathan wrote: Heater, would love to see your code that exhibits this "non-deterministic" behaviour.
At present I can think of one explanation only... and it would occur only at exactly 7 threads.
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

jonathan,

OK, here is the minimal program I came up with that demonstrates a 1-clock jitter in the execution of a timing loop with various numbers of threads running. As shown, 1 thread runs the timer loop and 4 just loop idly. The results look like this:

Code:

Determinism test:
Run time = 1 timer ticks
Run time = 1 timer ticks
Run time = 2 timer ticks
Run time = 1 timer ticks
Run time = 1 timer ticks
Results for various numbers of threads:

Code:

Threads    Clocks
8          2 
7          2/1
6          1
5          2/1
4          1
3          1
2          1
1          1
As you can see, a total of 5 or 7 threads results in a one-clock jitter.

Here is the program:

Code:

#include <stdio.h>
#include <platform.h>

// Idle thread: just burns its scheduling slot.
void waste_thread()
{
    while (1)
    {
    }
}

void timed_thread()
{
    long startTime, endTime;
    timer t;

    printf("Determinism test:\n");
    while (1)
    {
        // Start benchmark timer
        t :> startTime;

        // The blob of code to be timed would go here; it is left
        // empty, so the interval measured is just two back-to-back
        // timer reads.

        // Stop benchmark timer
        t :> endTime;

        printf("Run time = %d timer ticks\n", endTime - startTime);
    }
}

int main()
{
    par
    {
        on stdcore[0]: timed_thread();
        on stdcore[0]: waste_thread();
        on stdcore[0]: waste_thread();
        on stdcore[0]: waste_thread();
        on stdcore[0]: waste_thread();
        //on stdcore[0]: waste_thread();
        //on stdcore[0]: waste_thread();
        //on stdcore[0]: waste_thread();
    }
    return 0;
}
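To reproduce the table above, uncomment the waste_thread lines one at a time - each adds one more thread, up to the 8-thread maximum (the timed thread plus 7 idle ones).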
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

Strangely enough, this jitter does not show up when the determinism test is built in debug mode - the execution times are just much longer.
User avatar
davidnorman
Junior Member
Posts: 6
Joined: Fri Mar 18, 2011 10:43 am

Post by davidnorman »

I have seen this effect too. It took a while to track down what was happening: at first we thought it was clock retiming between the L2's two cores, then unknown delays in the channels, but in the end it turned out to be thread scheduling.