xConnect Switch limitations

MarcAntigny
Active Member
Posts: 61
Joined: Tue Feb 13, 2018 2:50 pm

Post by MarcAntigny »

Hi all,
I'm working on the xCORE-200 (XU232) and I've run into some issues with inter-tile communication. As I understand it, there are only 4 links between the two dies of the xCORE-200, so I can only use 4 channels between the two dies. But what about inter-tile communication within the same die (between tile 0 and tile 1, for example)? Is there a limitation like the inter-node one?
The only information I found was the XU232 diagram, which seems to show only 4 links to the inter-tile switch within the same die (in red):
[Image: XU232 diagram, links to the inter-tile switch within the same die marked in red]

If anyone has any insight into these limitations, it would be much appreciated.
Thanks,
Marc
infiniteimprobability
Verified
XCore Legend
Posts: 1133
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Hi,
The limitations are exactly what you'd expect. You can have four open inter-tile channel connections per tile. The switches only have 4 links between them. So in the case above, you could support two open connections between tiles 2 and 3, two open connections between tiles 2 and 1, and two open connections between tiles 3 and 1 (or 0). At that point, any further channel connection between tile 2 or 3 and anywhere else would block (the code would wait) until one of the existing connections closes.

By default, interfaces and XC channels open and close their connection during each transfer, which makes the situation better in practice: only streaming channels (or your own channels built from primitives) can hold a connection open indefinitely.
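A toy Python model may help picture the rule described above. It assumes, as a simplification, that each open inter-tile connection occupies one of the 4 link slots on both endpoint tiles; the class and method names are hypothetical, and real switch routing and topology are ignored. On real hardware an exhausted tile makes the channel operation block rather than return False.

```python
from collections import Counter

LINKS_PER_TILE = 4  # open inter-tile connections per tile, as stated in the reply


class LinkModel:
    """Hypothetical toy model: every open connection between tiles a and b
    uses one link slot on tile a and one on tile b. Real switch routing and
    topology are not modelled."""

    def __init__(self, links_per_tile=LINKS_PER_TILE):
        self.links_per_tile = links_per_tile
        self.in_use = Counter()  # tile id -> number of slots currently held

    def try_open(self, a, b):
        """Reserve a slot on both tiles and return True if possible.
        On real hardware the thread would block here instead of failing."""
        if (self.in_use[a] >= self.links_per_tile
                or self.in_use[b] >= self.links_per_tile):
            return False
        self.in_use[a] += 1
        self.in_use[b] += 1
        return True

    def close(self, a, b):
        """Release the slots held by one connection between a and b."""
        self.in_use[a] -= 1
        self.in_use[b] -= 1


# The scenario from the reply: two connections each for 2-3, 2-1 and 3-1.
model = LinkModel()
for a, b in [(2, 3), (2, 3), (2, 1), (2, 1), (3, 1), (3, 1)]:
    assert model.try_open(a, b)

print(model.try_open(2, 0))  # False: tile 2 has no free link
model.close(2, 1)            # e.g. a normal channel finishes its transfer
print(model.try_open(2, 0))  # True: the freed slot can be reused
```

In this picture, a normal XC channel or interface would call try_open/close around every transfer, whereas a streaming channel keeps its slot for its whole lifetime, which is why streaming channels are the ones that exhaust the links.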
Engineer at XMOS

Post by MarcAntigny »

Hi infiniteimprobability,
Thank you for your clear answer.
I only use XC channels that are not streaming channels, so it should work fine. However, is there a time overhead for opening and closing a channel? I need to send a lot of samples between two tiles with few connections available, so if I open and close them a lot, could that add significant delay?
Thanks,
Marc

Post by infiniteimprobability »

There are a number of synchronisation control tokens to set up and clear down the channel each time you do a :> or <: (see the assembler output). The actual data transfer is normally just a single token. So, especially for cross-switch communication, where latency is multiplied by the round trips, streaming channels are a LOT faster. They also allow some decoupling, because the switch path provides 48 bytes (twelve 32-bit words) of buffering. However, they occupy a circuit continuously. There is a nice compromise, though:

https://www.xmos.com/published/how-use- ... r-channels

These let you open and synchronise once, send data as you would on a streaming channel, then close and synchronise.
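To see why the handshake matters, here is a rough cost model in Python. The numbers are illustrative assumptions, not measured xCORE-200 timings: one time unit per token, a fixed one-way switch latency, one handshake round trip per plain-channel transfer, and two (open and close) per transaction. The function names are hypothetical.

```python
SWITCH_BUFFER_BYTES = 48   # per direction across a switch, per the reply
WORD_BYTES = 4
BUFFER_WORDS = SWITCH_BUFFER_BYTES // WORD_BYTES  # 12 words can be in flight


def synchronous_cost(n_words, token_time, one_way_latency):
    """Plain XC channel: assume every transfer waits for a control-token
    handshake, so each word pays a full round trip on top of its token."""
    return n_words * (token_time + 2 * one_way_latency)


def streaming_cost(n_words, token_time, one_way_latency):
    """Streaming channel: no per-word handshake, so words are pipelined and
    only the last word's latency is exposed. Assumes the receiver keeps up,
    so the BUFFER_WORDS of switch buffering never fills."""
    return n_words * token_time + one_way_latency


def transaction_cost(n_words, token_time, one_way_latency):
    """Transaction: assume one handshake round trip to open and one to close,
    with streaming-style data in between (the last word's latency is assumed
    to be hidden behind the closing handshake)."""
    return 2 * (2 * one_way_latency) + n_words * token_time


for n in (1, 12):
    print(n,
          synchronous_cost(n, 1, 10),
          streaming_cost(n, 1, 10),
          transaction_cost(n, 1, 10))
```

Under these assumptions a transaction only beats a plain channel once its two synchronisations are amortised over several words; for a single word it is worse. Streaming stays cheapest, but holds a link continuously.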

Post by MarcAntigny »

I'll try streaming channels and transactions. I hope it works; I'll let you know.
Thanks infiniteimprobability.
Marc

Post by MarcAntigny »

Maybe I should have mentioned that my channel communication only uses the inuint and outuint functions (which are just wrappers around the asm instructions, right?).
I found that :> and <: are about 3 times slower than inuint and outuint (maybe because of the synchronisation with control tokens?).
So transactions (which require :> and <:) didn't help reduce the latency.
There is a similar issue with streaming channels, which, I guess, only work with :> and <:. Do you think streaming channels could be quicker than using inuint/outuint?
Thanks,
Marc

Post by infiniteimprobability »

Do you think streaming channels could be quicker than using inuint/outuint?
I'm afraid not. If you look at the ASM, you will see streaming <: boils down to an outuint. The only way you can speed things up is by using the buffering more effectively through scheduling.

For example:

outuint
outuint
outuint
inuint
inuint
inuint
is faster than:

outuint
inuint
outuint
inuint
outuint
inuint
This is because you have 48 bytes of buffering in each direction across each switch, so you can have up to twelve 32-bit tokens in flight without blocking the thread doing the outs.

This doesn't work as well on the same tile, because there you only have 8 bytes (two 32-bit tokens) of buffering, but it can still help a bit.
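The effect of this reordering can be sketched with a small Python simulation. It assumes, hypothetically, that each out or in takes one time unit, that the far end echoes every word back the moment it arrives, that the one-way latency is fixed, and that up to buffer_words words can be in flight in each direction. The function names are illustrative only.

```python
def interleaved(n_words, latency):
    """out, in, out, in ...: every in waits for the previous word's echo,
    so each word costs a full round trip."""
    t = 0
    for _ in range(n_words):
        sent = t                          # out issued
        t += 1
        echo_arrives = sent + 2 * latency
        t = max(t, echo_arrives) + 1      # in blocks until the echo arrives
    return t


def batched(n_words, latency, buffer_words=12):
    """out x k, then in x k, with k capped by the buffering so that no out
    (and no echo from the far end) ever has to wait for buffer space."""
    t = 0
    done = 0
    while done < n_words:
        chunk = min(buffer_words, n_words - done)
        echo_times = []
        for _ in range(chunk):            # outs never block: chunk <= buffer
            echo_times.append(t + 2 * latency)
            t += 1
        for arrives in echo_times:        # ins drain the echoes in order
            t = max(t, arrives) + 1
        done += chunk
    return t


print(interleaved(3, 10), batched(3, 10))                 # 63 23
print(batched(24, 10), batched(24, 10, buffer_words=2))   # 64 264
```

With buffer_words=2, standing in for the 8 bytes of same-tile buffering, the gain shrinks because each batch is only two words long. The model keeps the latency fixed, which a real same-tile channel would not.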
Post by MarcAntigny »

Hi,
Thanks for your advice, but I already use the 48-byte buffering between the two tiles. I'll try streaming channels and a different scheduling (a standard channel for synchronisation and a streaming channel for data), but I'm afraid it won't help.
Marc