XCore Exchange

Posted: **Thu Apr 14, 2011 4:02 am**

What happens when the pipeline decodes a branch?
Is the instruction buffer just reset, and up to 32 bits of new instruction fetched!?

Posted: **Thu Apr 14, 2011 10:50 am**

lilltroll wrote:What happens when the pipeline decodes a branch?
Is the instruction-buffer just reset, and up to 32bits of new instruction is fetched !?

For most branches it is possible for the condition (if there is one) to be resolved in time to fetch the target instruction in the branch's memory access slot and inject that instruction into the pipe in the thread's next slot. This means that most branches are zero latency. This is one of the most elegant aspects of the XCore implementation to my mind.

A cycle must be added in the following scenarios:
1. When the branch instruction itself must perform a memory access, the thread's memory access slot is not available for the fetch. The obvious example is a retsp.
2. When the target of the branch is a misaligned 32-bit instruction, two fetches are required to acquire the full instruction.
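
To make the two rules above concrete, here is a toy model in C. It is only a sketch: the function names are made up for illustration, it assumes fetches are 32 bits wide on 4-byte boundaries with 16-bit instruction granularity, and it assumes the two penalties simply add when both apply, which the post does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only. Names and the 4-byte fetch alignment are assumptions. */

/* True if a 32-bit instruction at byte address addr straddles two
   aligned 32-bit fetch words, so it needs two fetches. */
bool is_misaligned_32bit(uint32_t addr, bool is_32bit_instr)
{
    assert((addr & 1u) == 0);   /* assumed 16-bit instruction granularity */
    return is_32bit_instr && (addr & 3u) != 0;
}

/* Extra thread slots a branch costs beyond the zero-latency case. */
unsigned branch_extra_cycles(bool branch_accesses_memory,
                             uint32_t target_addr,
                             bool target_is_32bit)
{
    unsigned extra = 0;
    if (branch_accesses_memory) {
        extra += 1;             /* e.g. retsp: memory slot is already in use */
    }
    if (is_misaligned_32bit(target_addr, target_is_32bit)) {
        extra += 1;             /* second fetch needed for the other half */
    }
    return extra;
}
```

In this model a plain branch to an aligned target costs nothing extra, while a branch to a 32-bit instruction at an address such as 0x102 costs one extra slot.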

Posted: **Thu Apr 14, 2011 11:30 am**

At last I found the missing key part of my determinism and xcore puzzle. See below:

@davelacey,

Good idea, let's not use "deterministic". I think I'd like to use the word "guarantee" for what I have been searching for. Which is something like this:

1) I create a functional block in XC, say an ethernet driver.
2) I publish that functional block for others to use.
3) I'd like to be able to make the guarantee to the end user that the ethernet driver will work for them.
4) I have no idea what their app. will look like which makes it hard for me to offer that guarantee.
5) What if it only works when it has the full speed of the core available? What if it fails in a system with 6 threads? Etc.

Of course I could document these conditions somehow, but that's a bit woolly.

The answer to the question of "how do I make the guarantee" comes from the timing analysis tool, which I have tried out for the first time today. I must say it's a superb piece of work. It's only the second time I have come across a compiler that issues a timing report after every compilation.

So I take your pin toggling code as an example. Wrap it up as a thread, add a time wasting thread and add an xta endpoint pragma to the toggle loop. Like so:

Code:

#include <platform.h>
#define PERIOD  10000
port p = PORT_UART_TX;
int x = 0;

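// Stand-in for the work done on each loop iteration.
void do_some_stuff()
{
        x = ~x;
}

void toggle_pin()
{
        int t;
        p <: 0 @ t;     // output 0 and record the port time in t
        t += PERIOD;
        while (1)
        {
                #pragma xta endpoint "toggle_loop"
                p @ t <: x;     // timed output: drive x when the port time reaches t
                do_some_stuff();
                t += PERIOD;
        }
}

// Empty thread, present only to raise the thread count on the core.
void waste_thread(){}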

int main()
{
        par
        {
                on stdcore[0]: toggle_pin();
                on stdcore[0]: waste_thread();
                on stdcore[0]: waste_thread();
                on stdcore[0]: waste_thread();
                on stdcore[0]: waste_thread();
                //on stdcore[0]: waste_thread();
        }
        return (0);
}

Then I add a timing script. I want to guarantee a 5MHz max toggle rate so I specify the max loop time of 200ns like so:

Code:

analyze loop toggle_loop
set required - 200 ns

Bingo. If my toggle loop ever exceeds 200ns the compilation reports a failure.

As shown, that code passes the timing constraints check at 400 MHz. BUT uncomment that last waste_thread() and it fails.
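
The post does not spell out why the sixth thread breaks the constraint. A likely explanation, assuming the usual XS1 scheduling model in which a thread can issue at most one instruction every max(4, N) core cycles when N threads are active, is that the 200 ns budget simply holds fewer instructions. The C sketch below works through that arithmetic. The function names are hypothetical, the four-stage pipeline is an assumption, and it ignores instructions that take longer than one issue slot (divides, for example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIPELINE_STAGES 4u  /* assumption: four-stage pipeline */

/* Worst-case time between instruction issues for one thread, in ps. */
uint32_t issue_period_ps(uint32_t core_mhz, uint32_t active_threads)
{
    assert(core_mhz > 0 && active_threads > 0);
    uint32_t slots = active_threads > PIPELINE_STAGES ? active_threads
                                                      : PIPELINE_STAGES;
    return (slots * 1000000u) / core_mhz;   /* one cycle is 1e6/MHz ps */
}

/* Whole instructions one thread can issue inside budget_ns. */
uint32_t instructions_in_budget(uint32_t core_mhz, uint32_t active_threads,
                                uint32_t budget_ns)
{
    return (budget_ns * 1000u) / issue_period_ps(core_mhz, active_threads);
}

/* True if a loop of loop_instructions single-slot instructions fits. */
bool loop_meets_budget(uint32_t loop_instructions, uint32_t core_mhz,
                       uint32_t active_threads, uint32_t budget_ns)
{
    return loop_instructions
        <= instructions_in_budget(core_mhz, active_threads, budget_ns);
}
```

Under these assumptions, at 400 MHz a thread issues every 12.5 ns with five threads and every 15 ns with six, so 200 ns holds 16 instructions in the first case and only 13 in the second. A loop whose worst-case path is, say, 14 instructions (a made-up figure, not measured from the code above) would pass with five threads and fail with six.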

So now I have the answer to my "guarantee" problem. I have to apply the appropriate xta pragmas to my code. Crucially I have to deliver a timing analysis script with my functional block.

Given that is in place the end user can then be sure my function will work if the compiler says so.

Conclusion:

1) There is no "execution" determinism in the xcore when I write my functional block in isolation from the app it will ultimately sit in, unless I demand fewer than 5 threads, no divides, etc. There cannot be such determinism: I don't know what processing power my function will have available in the end user's app.

2) It does not matter. With the xta and a timing script I can still make my "guarantee" of performance to the end user.

3) I hope all components are being supplied with xta scripts to make their integration into end user apps painless.

Once again, congrats to XMOS on creating this amazing tool.

Posted: **Thu Apr 14, 2011 12:36 pm**

Woody wrote:This means that most branches are zero latency. This is one of the most elegant aspects of the XCore implementation to my mind.

It's nice yes! On par with how the 6502 does taken branches (it runs two half cycles. No, really).

A cycle must be added for the following scenarios:
1. When the branch instruction must perform a memory access then the thread's memory access slot is not available for the fetch. The obvious example being a retsp.

This doesn't apply to RETSP 0, I hope?

Posted: **Thu Apr 14, 2011 2:02 pm**

segher wrote:This doesn't appply to RETSP 0, I hope?

I think so. If you run up a sim and have a look at the waves you could check ;)

Several questions about XMOS
