xConnect Switch limitations

MarcAntigny · Fri Sep 07, 2018 1:53 pm

Hi all,
I'm working on an xCore 200 (XU232) and have run into some issues with inter-tile communication. I understand that there are only 4 links between the two dies of the xCore 200, so I can use only 4 channels between the two dies. But what about inter-tile communication within the same die (between tile 0 and tile 1, for example)? Is there a limitation like the inter-node one?
The only information I found was the drawing of the XU232, which seems to show only 4 links to the inter-tile switch within the same die (in red):

If anyone has any insight into those limitations, it would be much appreciated.
Thanks,
Marc

infiniteimprobability · Mon Sep 10, 2018 9:15 am

Hi,
The limitations are exactly what you'd expect. You can have four open inter-tile channel connections per tile. The switches only have 4 links between them. So in the above case, you could support two open connections between tiles 2 and 3, two open connections between tiles 2 and 1, and two open connections between tiles 3 and 1 (or 0). At that point, any further channel connection between tile 2 or 3 and anywhere else would block (the code would wait) until one of the existing ones closes.

By default, interfaces and XC channels open and close during each transfer, which makes the situation better in practice. Only streaming channels (or your own channels built using primitives) can hold a connection open indefinitely.
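The link budget described in this reply can be made concrete with a toy Python sketch. It is not XC and does not model the real switch routing; it assumes only what the reply states, namely that each tile can hold four open inter-tile connections at once. The `LinkBudget` class and its method names are hypothetical, invented purely for illustration.

```python
LINKS_PER_TILE = 4  # figure quoted in the reply; everything else is an assumption


class LinkBudget:
    """Toy bookkeeping of open inter-tile connections per tile."""

    def __init__(self, num_tiles=4, links_per_tile=LINKS_PER_TILE):
        self.links_per_tile = links_per_tile
        self.in_use = [0] * num_tiles

    def try_open(self, a, b):
        """Return True if both tiles have a free link, False if the open would block."""
        if a == b:
            raise ValueError("this model only covers connections between different tiles")
        if (self.in_use[a] >= self.links_per_tile
                or self.in_use[b] >= self.links_per_tile):
            return False
        self.in_use[a] += 1
        self.in_use[b] += 1
        return True

    def close(self, a, b):
        """Release one connection between tiles a and b."""
        self.in_use[a] -= 1
        self.in_use[b] -= 1


def demo():
    budget = LinkBudget()
    # The scenario from the reply: 2x (2,3), 2x (2,1), 2x (3,1).
    scenario = [(2, 3), (2, 3), (2, 1), (2, 1), (3, 1), (3, 1)]
    opened = [budget.try_open(a, b) for a, b in scenario]
    blocked = budget.try_open(2, 0)      # tile 2 already holds four connections
    budget.close(2, 3)                   # one connection finishes
    after_close = budget.try_open(2, 0)  # a link on tile 2 is free again
    return opened, blocked, after_close


if __name__ == "__main__":
    print(demo())
```

In this model the six connections of the scenario all open, a seventh from tile 2 is refused until one of the existing ones is closed, which mirrors the "code would wait" behaviour described above. On real hardware the thread blocks rather than receiving a return value.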

MarcAntigny · Mon Sep 10, 2018 9:46 am

Hi infiniteimprobability,
Thank you for your clear answer.
I only use XC channels that are not streaming channels, so it should work nicely. However, is there a time overhead for opening and closing a channel? I need to send a lot of samples between 2 tiles with few connections available, so if I open and close them a lot, could that add a significant delay?
Thanks,
Marc

infiniteimprobability · Mon Sep 10, 2018 12:37 pm

There are a number of synchronisation control tokens to set up and tear down the channel each time you do a :> or <: (see the assembler output). The actual data transfer is normally just a single token. So, especially for cross-switch communication, where latency gets multiplied by the round trips, streaming channels are a LOT faster. They also allow for some decoupling because the switch path provides 48 bytes (12 long words) of buffering. However, they do occupy a circuit continuously. There is a nice compromise, though:

https://www.xmos.com/published/how-use- ... r-channels

These let you open and synchronise, send data as you would on a streaming channel, then close and synchronise.

MarcAntigny · Mon Sep 10, 2018 12:56 pm

I'll try streaming channels and transactions. I hope it works; I'll let you know.
Thanks infiniteimprobability.
Marc

MarcAntigny · Mon Sep 10, 2018 5:18 pm

Maybe I should have mentioned that my channel communication only uses the inuint and outuint functions (which are just wrappers for asm instructions, right?).
I found that :> and <: are about 3 times slower than inuint and outuint (maybe because of the synchronisation with control tokens?).
So transactions (which need :> and <:) didn't help reduce the latency.
There is a similar issue with streaming channels which, I guess, only work with :> and <:. Do you think streaming channels could be quicker than using inuint/outuint?
Thanks,
Marc

infiniteimprobability · Tue Sep 11, 2018 11:37 am

Do you think streaming channels could be quicker than using inuint/outuint?

I'm afraid not. If you look at the ASM, you will see streaming <: boils down to an outuint. The only way you can speed things up is by using the buffering more effectively through scheduling.

For example:

outuint
outuint
outuint
inuint
inuint
inuint

is faster than:

outuint
inuint
outuint
inuint
outuint
inuint

This is because you have 48 bytes of buffering in each direction across each switch, so you can have up to 12 x 32-bit words in flight without blocking the thread doing the outs.

This doesn't work so well on the same tile because you only have 8 bytes of buffering (2 x 32-bit words), but it can still help a bit.
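The scheduling argument in this reply can be illustrated with a deliberately simple Python timing model. It is a sketch under loud assumptions, not a cycle-accurate simulation: the peer is assumed to reply to every word immediately, each outuint/inuint is assumed to cost a fixed `issue` time, and `latency` is a hypothetical one-way delay across the switch. Only the 48-byte and 8-byte buffer sizes come from the thread; all function names are made up for illustration.

```python
WORD_BYTES = 4  # one 32-bit word per outuint/inuint


def words_in_flight(buffer_bytes):
    """How many 32-bit words fit in the path buffering before an out blocks."""
    return buffer_bytes // WORD_BYTES


def interleaved_cycles(n, latency, issue=1):
    """out, in, out, in, ...: every word pays a full round trip."""
    t = 0
    for _ in range(n):
        t += issue        # outuint
        t += 2 * latency  # word travels out, reply travels back (assumed instant peer)
        t += issue        # inuint
    return t


def batched_cycles(n, latency, issue=1, buffer_bytes=48):
    """n outs back to back, then n ins: round trips overlap in flight."""
    if n > words_in_flight(buffer_bytes):
        raise ValueError("batch exceeds the buffering; outs would block in this model")
    t = n * issue  # all outs issued without waiting
    for k in range(1, n + 1):
        reply_ready = k * issue + 2 * latency  # when the k-th reply has arrived
        t = max(t, reply_ready) + issue        # inuint waits for it if needed
    return t


if __name__ == "__main__":
    print(words_in_flight(48), words_in_flight(8))
    print(interleaved_cycles(3, latency=10), batched_cycles(3, latency=10))
```

With the hypothetical values `n=3` and `latency=10`, the interleaved order costs 66 time units in this model and the batched order 24, because the batched outs overlap their round trips. The absolute numbers mean nothing for real hardware; only the shape of the comparison is the point.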

MarcAntigny · Tue Sep 11, 2018 12:32 pm

Hi,
Thanks for your advice, but I already use the 48 bytes of buffering between the two tiles. I'll try streaming channels and a different scheduling (with a standard channel for sync and a streaming channel for data), but I'm afraid it won't help.
Marc
