Several questions about XMOS

lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

What happens when the pipeline decodes a branch?
Is the instruction buffer just reset, and up to 32 bits of new instruction fetched?


Probably not the most confused programmer anymore on the XCORE forum.
Woody
XCore Addict
Posts: 165
Joined: Wed Feb 10, 2010 2:32 pm

Post by Woody »

lilltroll wrote: What happens when the pipeline decodes a branch?
Is the instruction buffer just reset, and up to 32 bits of new instruction fetched?

For most branches the condition (if there is one) can be resolved in time to fetch the target instruction in the branch's memory access slot and inject that instruction into the pipeline in the thread's next slot. This means that most branches are zero latency. To my mind this is one of the most elegant aspects of the XCore implementation.

A cycle must be added in the following scenarios:
1. When the branch instruction itself must perform a memory access, the thread's memory access slot is not available for the fetch. The obvious example is retsp.
2. When the target of the branch is a misaligned 32-bit instruction, two fetches are required to acquire the full instruction.
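
The two rules above can be written down as a small predicate. This is only a toy model in plain C, not XMOS code: the struct, its fields and the function name are made up for illustration. The post does not say whether the penalties stack when both cases apply, so the sketch only reports whether at least one extra cycle is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a branch; not taken from any XMOS header. */
struct branch_info {
    bool needs_memory_access;  /* branch uses its own memory slot, e.g. retsp */
    bool target_is_32bit;      /* target is a 32-bit instruction */
    bool target_word_aligned;  /* target address is a multiple of 4 */
};

/* Returns true if the branch costs at least one cycle on top of the
 * zero-latency case, following the two scenarios listed in the post. */
static bool branch_needs_extra_cycle(const struct branch_info *b)
{
    assert(b != NULL);

    /* Scenario 1: the memory access slot is taken by the branch itself,
     * so the target fetch has to wait. */
    if (b->needs_memory_access)
        return true;

    /* Scenario 2: a misaligned 32-bit target needs two fetches. */
    if (b->target_is_32bit && !b->target_word_aligned)
        return true;

    return false;
}
```

A plain conditional branch to a word-aligned target returns false here, which matches the "most branches are zero latency" case.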
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

At last I found the missing key part of my determinism and xcore puzzle. See below:

@davelacey,

Good idea, let's not use "deterministic". I think I'd like to use the word "guarantee" for what I have been searching for. Which is something like this:

1) I create a functional block in XC, say an ethernet driver.
2) I publish that functional block for others to use.
3) I'd like to be able to make the guarantee to the end user that the ethernet driver will work for them.
4) I have no idea what their app. will look like which makes it hard for me to offer that guarantee.
5) What if it only works when it has full speed of the core available? What if it fails in a system with 6 threads? Etc.

Of course I could document these conditions somehow but that's a bit woolly.

The answer to the question of "how do I make the guarantee" comes from the timing analysis tool, which I have tried out for the first time today. I must say it's a superb piece of work. It's only the second time I have come across a compiler that issues a timing report after every compilation.

So I take your pin toggling code as an example. Wrap it up as a thread, add a time wasting thread and add an xta endpoint pragma to the toggle loop. Like so:

Code:

#include <platform.h>
#define PERIOD  10000           // port counter ticks between outputs
port p = PORT_UART_TX;
int x = 0;

void do_some_stuff()
{
        x = ~x;
}

void toggle_pin()
{
        int t;
        p <: 0 @ t;             // drive 0 and capture the port counter in t
        t += PERIOD;
        while (1)
        {
                #pragma xta endpoint "toggle_loop"
                p @ t <: x;     // output x when the port counter reaches t
                do_some_stuff();
                t += PERIOD;
        }
}

// Empty thread, only there to take up a hardware thread.
void waste_thread(){}

int main()
{
        par
        {
                on stdcore[0]: toggle_pin();
                on stdcore[0]: waste_thread();
                on stdcore[0]: waste_thread();
                on stdcore[0]: waste_thread();
                on stdcore[0]: waste_thread();
                //on stdcore[0]: waste_thread();
        }
        return (0);
}

Then I add a timing script. I want to guarantee a 5 MHz maximum toggle rate, so I specify a maximum loop time of 200 ns like so:

Code:

analyze loop toggle_loop
set required - 200 ns

Bingo. If my toggle loop ever exceeds 200 ns the compilation reports a failure.

As shown, that code passes the timing constraints check at 400 MHz. BUT uncomment that last waste_thread() and it fails.
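
A rough way to see why five threads pass and six fail is to count how many instructions one thread can issue inside the 200 ns window. The sketch below is plain C with made-up function names, and it rests on an assumption that is not stated in this thread: that a thread issues at most one instruction every max(4, N) core clocks when N threads are active.

```c
#include <assert.h>

/* Assumption (not from this thread): with n active threads, each thread
 * gets an issue slot once every max(4, n) core clocks. */
static int clocks_per_thread_instruction(int active_threads)
{
    assert(active_threads >= 1);
    return active_threads > 4 ? active_threads : 4;
}

/* Whole instructions a single thread can issue before the deadline. */
static long instruction_budget(long core_mhz, long deadline_ns,
                               int active_threads)
{
    /* MHz * ns / 1000 = core clocks, e.g. 400 * 200 / 1000 = 80 */
    long core_clocks = core_mhz * deadline_ns / 1000;
    return core_clocks / clocks_per_thread_instruction(active_threads);
}
```

Under that assumption, instruction_budget(400, 200, 5) gives 16 and instruction_budget(400, 200, 6) gives 13. If the model holds, the pass at five threads and the fail at six would suggest that the worst-case path through the loop needs somewhere between 14 and 16 issue slots, though the timing analyzer's report is the thing to trust, not this back-of-the-envelope count.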

So now I have the answer to my "guarantee" problem. I have to apply the appropriate xta pragmas to my code. Crucially I have to deliver a timing analysis script with my functional block.

Given that is in place the end user can then be sure my function will work if the compiler says so.

Conclusion:

1) There is no "execution" determinism in the xcore when I write my functional block in isolation from the app it will ultimately sit in, unless I demand fewer than 5 threads, no divides, etc. There cannot be such determinism, because I don't know what processing power will be available to my function in the end user's app.

2) It does not matter. With the xta and a timing script I can still make my "guarantee" of performance to the end user.

3) I hope all components are being supplied with xta scripts to make their integration into end user apps painless.

Once again, congrats to XMOS on creating this amazing tool.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

Woody wrote: This means that most branches are zero latency. This is one of the most elegant aspects of the XCore implementation to my mind.

It's nice, yes! On par with how the 6502 does taken branches (it runs two half cycles. No, really).

Woody wrote: A cycle must be added for the following scenarios:
1. When the branch instruction must perform a memory access then the thread's memory access slot is not available for the fetch. The obvious example being a retsp

This doesn't apply to RETSP 0, I hope?
Woody
XCore Addict
Posts: 165
Joined: Wed Feb 10, 2010 2:32 pm

Post by Woody »

segher wrote: This doesn't apply to RETSP 0, I hope?

I think so. If you run up a sim and have a look at the waves you could check ;)