Streaming channel

shyamkrish8iitm
Member
Posts: 15
Joined: Mon May 28, 2012 11:20 am

Post by shyamkrish8iitm »

Hi, I don't understand how a streaming channel works. I have read the XC programming guide but I still don't have a clear picture. I would like to know the following:

a) The hardware implications of a channel and of a streaming channel.

b) As soon as I declare an "int" variable, 32 bits of space are allocated in RAM (is this right?). If so, what happens when I declare chan and streaming chan variables?

c) In the following program, I am trying to send data from Thread X to Thread Y using a streaming channel. What's wrong with my code? It's not working properly: the desired output is not showing up.


Code: Select all

#include <xs1.h>
#include <platform.h>
#include <print.h>

void transmit(streaming chanend);
void receive(streaming chanend);

void main(){
	streaming chan c;
	par{
		{transmit(c);}  //Thread 1
		{receive(c);}   //Thread 2
	}
}

void transmit(streaming chanend c){
	int data[20]={1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10};
	int i=0;
	for(i=0;i<20;i++){
		c <: data[i];
		printstr("Thread1:");
		printuint(data[i]);

	}
}

void receive(streaming chanend c){
	int temp;
	c :> temp;  //will wait till c gets an value?!
	temp = temp+1;
	printstr("Thread2:");
	printuint(temp);
}


rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

a) I don't know too much about this, but each XCore has a certain number of channels it can use. Declaring a channel doesn't reserve any of these, and the channel is re-negotiated every time it is used. As a result, you can have as many channels as you need, but this also means data transfer is slower. A streaming channel reserves one of the channels and establishes a permanent connection, giving you the fastest possible data rates. You are limited in how many you can use (I don't remember the number).

A compromise is a transaction, which you can use to send a series of bytes with only one negotiation. Read about channels in the XC Programming Guide.

b) Again, I don't know too much about the low-level workings, but I believe you are correct: 32 bits of the available 64 KB are reserved.

c) First off, your receive thread only receives one value and terminates. If you want to receive all of the values, put that in a loop as well.

c :> temp will indeed block until it receives a value. From what I see, this program should work. However, I've had problems with printing: the output buffer is only flushed under certain conditions, and printing a newline character acts as a flush.

Replace:

Code: Select all

printstr("Thread1:");
printuint(data[i]);

With:

Code: Select all

printf("Thread1: %i\n", data[i]);

Also, add the include for stdio:

Code: Select all

#include <stdio.h>
shyamkrish8iitm
Member
Posts: 15
Joined: Mon May 28, 2012 11:20 am

Post by shyamkrish8iitm »

Thank you very much, that was helpful. I am still hoping for an answer on the hardware implications, if you find out in the future.
Bianco
XCore Expert
Posts: 754
Joined: Thu Dec 10, 2009 6:56 pm

Post by Bianco »

When you send something over a channel it will open a route over the network (for example a network of multiple processors with their switches linked together).
During transmission, any point in the route cannot be used by other channels.

When using "regular" channels this route will be closed after sending a data object, freeing the network resources to be used by other channels.

When using a streaming channel this route is kept open for future transmissions, making the network resources unavailable to other channels.

Streaming channels have the advantage of having less overhead, mainly because they do not use synchronisation tokens between the sender and receiver when transmitting a data object.

The number of streaming channels that you can have can be limited.
Between two threads on the same core you can have as many as you want (until you run out of chanend resources).

Each core has four links to the switch on the chip. This means that any core can only have 4 streaming channels to destinations outside the core.

There can be more limitations: if you have two processors connected to each other externally using one XMOS link, you can only have one streaming channel between the two processors. While this streaming channel exists there cannot be any other channels between the two processors.

The difference between regular and streaming channels does not have hardware implications: it is more a matter of using the channels in a different way. Regular channels can be seen as a packet-switched network, while streaming channels use circuit switching. In fact the regular channels also use circuit switching, but because the route is closed after each transmission it behaves much like packet switching.
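
To make the contrast concrete, here is a minimal XC sketch that runs the same short transfer once over a regular channel and once over a streaming channel. The function names and the count N are made up for illustration, and all four threads are assumed to run on the same core. Only the channel types differ in the source; the route handling described above happens underneath.

```xc
#include <xs1.h>
#include <print.h>

#define N 4   // illustrative word count, not taken from the thread

// Regular channel: every output is synchronised with its matching input,
// and the route is closed again after each data object.
void send_regular(chanend c) {
    for (int i = 0; i < N; i++)
        c <: i;
}

void recv_regular(chanend c) {
    int v;
    for (int i = 0; i < N; i++) {
        c :> v;              // blocks until the sender outputs a word
        printintln(v);
    }
}

// Streaming channel: the route stays open between transfers, and
// individual outputs are not synchronised with the inputs.
void send_stream(streaming chanend s) {
    for (int i = 0; i < N; i++)
        s <: i + 100;
}

void recv_stream(streaming chanend s) {
    int v;
    for (int i = 0; i < N; i++) {
        s :> v;              // still blocks until a word is available
        printintln(v);
    }
}

int main(void) {
    chan c;
    streaming chan s;
    par {
        send_regular(c);
        recv_regular(c);
        send_stream(s);
        recv_stream(s);
    }
    return 0;
}
```

The two receivers print from separate threads, so the order in which their lines interleave is not defined.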
shyamkrish8iitm
Member
Posts: 15
Joined: Mon May 28, 2012 11:20 am

Post by shyamkrish8iitm »

Thank you very much. Here is the complete, corrected code, written with your help.

Code: Select all

#include <xs1.h>
#include <platform.h>
#include <print.h>
#include <stdio.h>

void transmit(streaming chanend, int);
void receive(streaming chanend);

void main() {
   streaming chan c;
   int i;
   par {
      {  // Thread 1: send the 20 values one call at a time
         for (i = 0; i < 20; i++) {
            transmit(c, i);
         }
      }
      { receive(c); }   // Thread 2
   }
}

// Sends element i of the data table over the streaming channel.
void transmit(streaming chanend c, int i) {
   int data[20] = {1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10};
   c <: data[i];
   printf("Thread1: %i  ", data[i]);
}

// Receives all 20 values, adding 1 to each before printing it.
void receive(streaming chanend c) {
   int temp;
   int i = 0;
   for (i = 0; i < 20; i++) {
      c :> temp;  // blocks until a value arrives on c
      temp = temp + 1;
      printf("Thread2: %i\n", temp);
   }
}
shyamkrish8iitm
Member
Posts: 15
Joined: Mon May 28, 2012 11:20 am

Post by shyamkrish8iitm »

Transaction: A sequence of matching outputs and inputs is communicated over a channel asynchronously, with the entire transaction being synchronized at its beginning and end.

Streams: Establish a permanent route between two threads over which data can be efficiently communicated without synchronization.


a) How is a transaction different from streams, other than that streams have a permanent route?

b) In Verilog, in order to link blocks X and Y we need to synchronize them using a FIFO register or by introducing wait states. So please explain how streams communicate without synchronization. In my corrected code posted above, I feel wait states are being introduced for synchronization. Am I right?

c) Also, please explain what synchronization at the beginning and end means.

Thank you
Bianco
XCore Expert
Posts: 754
Joined: Thu Dec 10, 2009 6:56 pm
Contact:

Post by Bianco »

Consider a system with two XMOS processors linked together with two XMOS Links.
One of the links is set to a very fast transfer speed and the other is set to a very slow transfer speed.

If we send objects from XMOS processor 1 to XMOS processor 2 and directly after sending the object we close the route by sending an END token, we do not know whether the object will be transmitted over the fast or slow link.

If we transmit the first object over the slow link, followed by sending an object over the fast link, it is possible that the second object arrives first.

To guarantee the order of arrival we need synchronisation.
We do this by having the receiver explicitly tell the transmitter that it is ready to receive.
This avoids multiple objects in transmission.

From the XMOS ABI:
An output statement is compiled as follows:
  1. Output an end control token
    outct res[c], CT_END
  2. Check for an ack control token
    checkct res[c], CT_ACK
  3. Output the variable to the channel
  4. Check for an end control token
    checkct res[c], CT_END
  5. Output an end control token
    outct res[c], CT_END
An input statement is compiled as follows:
  1. Check for an end control token
    checkct res[c], CT_END
  2. Output an ack control token
    outct res[c], CT_ACK
  3. Input from the channel into the destination variable
  4. Output an end control token
    outct res[c], CT_END
  5. Check for an end control token
    checkct res[c], CT_END
As you can see, the synchronization is performed before and after transmitting each data object.
This can add significant overhead.

With transactions we can transmit a sequence of data objects with synchronization only at the start and end of the sequence.
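
For illustration, the idea of one synchronisation per sequence can be sketched in XC with a transaction. This is a minimal sketch, assuming the master/slave transaction syntax described in the XC programming guide. The function names and the count N are invented for the example, and it uses a regular chan rather than a streaming one.

```xc
#include <xs1.h>
#include <print.h>

#define N 20   // illustrative sequence length

// Master side: the transaction synchronises once on entry and once on
// exit; the N outputs in between are not individually synchronised.
void transmit(chanend c) {
    int data[N];
    for (int i = 0; i < N; i++)
        data[i] = i + 1;

    master {
        for (int i = 0; i < N; i++)
            c <: data[i];
    }
}

// Slave side: must perform the matching sequence of inputs.
void receive(chanend c) {
    int buf[N];

    slave {
        for (int i = 0; i < N; i++)
            c :> buf[i];
    }

    // Print after the transaction so it contains only the channel inputs.
    for (int i = 0; i < N; i++)
        printintln(buf[i]);
}

int main(void) {
    chan c;
    par {
        transmit(c);
        receive(c);
    }
    return 0;
}
```

Compared with a plain loop of `c <: data[i]` on a regular channel, the per-object handshake is replaced by one at the start and one at the end of the whole sequence.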
JohnWilson
Active Member
Posts: 38
Joined: Fri Oct 28, 2011 10:53 pm

Post by JohnWilson »

How fast do the various kinds of channel I/O happen within a single core? I've been sending single END tokens over channels (e.g. outct res[r11],XS1_CT_END) for thread synchronization, since it's an easy way to break out of a WAITEx instruction when another thread wants attention (just like writing a junk byte over a pipe to break out of select() on Unix).

What I really want are semaphores, and I'm a bit surprised not to see specific hardware support for those (since the XS1 has the nicest implementation of locks that I've seen, and semaphores are kind of in the same category), but I figure that's probably because they expect us to use channels since they can be exactly equivalent. But I want to make sure I'm doing that the fast way!

Anyway so this streaming vs. transaction talk has me wondering if it matters exactly what I send over my channels when it's all within a single core. I chose END just because it seems neighborly not to hog a route needlessly, and when I disassemble XC's output it appears that END is the only control token it actually uses (the END/ACK/data/END/END sequence from the ABI document apparently isn't used in real life).

So: does local routing take time? Should I skip control tokens entirely (which I guess makes it a "streaming" channel that stays open for life) and send a junk data byte instead? Or in that case will the hardware wait until it's accumulated a word before triggering the event so I have to send junk words instead?

Thanks!
Bianco
XCore Expert
Posts: 754
Joined: Thu Dec 10, 2009 6:56 pm

Post by Bianco »

On the same core the latency is 1 thread cycle and the bandwidth is as much as the ISA can handle.
You can also have as many streaming channels as you want on the same core (until you run out of chanends, of course). You don't need specific control tokens in that case, unless you want to use control tokens for a custom protocol. The only good reason for not using streaming channels when two threads reside on the same core is portability, i.e. a thread using regular channels can be moved to other cores more easily, without having to take the network resources into account.

You are right about the ACK.
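
As a rough XC-level sketch of the wake-up pattern discussed above (not the assembly-level outct approach), one thread can wait in a select on a streaming channel while another sends a word to wake it. The names waiter, signaller and wake are hypothetical, the one-second timeout assumes the default 100 MHz reference clock, and both threads are assumed to be on the same core.

```xc
#include <xs1.h>
#include <print.h>

// Waits for either a wake-up word from another thread or a timeout.
void waiter(streaming chanend wake) {
    timer t;
    unsigned deadline;
    int token;

    t :> deadline;
    deadline += 100000000;   // about 1 s, assuming a 100 MHz reference clock

    select {
        case wake :> token:
            printstrln("woken by signal");
            break;
        case t when timerafter(deadline) :> void:
            printstrln("timed out");
            break;
    }
}

// Sends one full word so that the int-sized input in waiter() can complete.
void signaller(streaming chanend wake) {
    wake <: 1;
}

int main(void) {
    streaming chan wake;
    par {
        waiter(wake);
        signaller(wake);
    }
    return 0;
}
```

Sending a whole int here sidesteps the byte-versus-word question, because the output and the input are the same size. The timeout branch is only there to show the shape of a select with more than one event source.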
bear118
Active Member
Posts: 37
Joined: Wed Jan 09, 2019 10:57 am

Post by bear118 »

How do I initialize a "streaming chanend" variable? I need to retransmit the data, so I want to clear the old data out of the channel.
Thanks.