Couple questions regarding XLINKS and switches

Technical discussions around xCORE processors (e.g. xcore-200 & xcore.ai).
bearcat
Respected Member
Posts: 283
Joined: Fri Mar 19, 2010 4:49 am

Couple questions regarding XLINKS and switches

Post by bearcat »

1 - Link buffer size. After some testing, it appears I am able to send at least 8x4 bytes, with a 10 us delay, over a streaming link without blocking (between 2 tiles). Previously, I had thought only 8 bytes could be sent before blocking. Reading that there is a CREDIT64 token, I am thinking there is a 64-byte buffer per link. Is that correct, and is it per link rather than per switch? Does this apply to both sending and receiving? There may be issues with when the credit tokens get sent, which may complicate this. I believe I saw these go over the link, and there does appear to be some delay. I believe the documentation says a maximum of 4 bytes is guaranteed.
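
For reference, my test is roughly the sketch below (function names, the word count, and the tile[] placement syntax are just illustrative and depend on your tools/XN file): the receiver sits idle for about 10 us, so anything the sender pushes out in that window must have been absorbed by buffering somewhere along the link.

// Rough sketch of the buffer test (illustrative only): the receiver idles
// for ~10 us while the sender pushes 32-bit words into a streaming channel
// crossing the inter-tile link, timing each output to see when it blocks.
#include <platform.h>
#include <xs1.h>
#include <print.h>

#define WORDS      16
#define IDLE_TICKS 1000              // 10 us at the 100 MHz reference timer

void sender(streaming chanend c) {
    timer tmr;
    unsigned before, after;
    unsigned cost[WORDS];
    for (int i = 0; i < WORDS; i++) {
        tmr :> before;
        c <: i;                      // 4 data tokens per 32-bit output
        tmr :> after;
        cost[i] = after - before;
    }
    for (int i = 0; i < WORDS; i++)
        printuintln(cost[i]);        // a sudden jump marks where blocking starts
}

void receiver(streaming chanend c) {
    timer tmr;
    unsigned t;
    int x;
    tmr :> t;
    tmr when timerafter(t + IDLE_TICKS) :> void;   // let the sender run ahead
    for (int i = 0; i < WORDS; i++)
        c :> x;                                    // then drain everything
}

int main() {
    streaming chan c;
    par {
        on tile[0]: sender(c);
        on tile[1]: receiver(c);
    }
    return 0;
}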

2 - Latency. I didn't see anything on latency through the switch (there is one sentence I am not clear about). If I have, let's say, a daisy chain of L1s, what is the latency to send a packet to the last tile? I am thinking it's possible that, after the Dreg byte (assuming 1 byte) is decoded, the switch could start forwarding instead of waiting for the packet to be fully received. Is the latency through each switch 1 byte time, 2 byte times, or longer?


segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

Hi bearcat,
bearcat wrote:1 - Link buffer size. After some testing, it appears I am able to send at least 8x4 bytes, with a 10 us delay, over a streaming link without blocking (between 2 tiles). Previously, I had thought only 8 bytes could be sent before blocking. Reading that there is a CREDIT64 token, I am thinking there is a 64-byte buffer per link. Is that correct, and is it per link rather than per switch? Does this apply to both sending and receiving? There may be issues with when the credit tokens get sent, which may complicate this. I believe I saw these go over the link, and there does appear to be some delay. I believe the documentation says a maximum of 4 bytes is guaranteed.
The send and receive datapaths are almost totally separate (they only interact
via credits received, etc.). The switch receives data on an "incoming"
link ("half-link", if you want) and sends it out on an outgoing link. It uses
wormhole routing, so it doesn't need to buffer packets (it only needs to
buffer one header, i.e. three bytes maximum, before it starts sending the
packet out on the outgoing link). This is good for latency (and it simplifies
the implementation a lot). The outgoing datapath doesn't buffer anything (well,
there could be some pipelining, I don't know); the incoming datapath has a
buffer for elasticity. When N credits have been issued on this link by this
switch, that is a promise that it will accept N bytes on the incoming link
without problems. The L chips seem to have a buffer per link (not per switch).
It also seems they only ever issue credits via CREDIT16 tokens, and always
have fewer than 32 outstanding credits. So it seems the buffer is 32 bytes.
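
If it helps, here is a toy model of that flow control. It is not any real API, and the exact credit policy is only my guess from watching the link; the 32-byte buffer and the CREDIT16-sized top-ups are simply what the L chips appear to do:

// Toy model of the credit-based flow control (not a real API, numbers are
// guesses): the receiving half-link only issues credit it can back with
// buffer space; the sender spends one credit per byte and stalls at zero.
#include <print.h>

#define LINK_BUF_BYTES  32   // apparent elasticity buffer per incoming link
#define CREDIT16        16   // credit granted by one CREDIT16 token

typedef struct {
    int sender_credits;      // credit the sender currently holds
    int buffered;            // bytes parked in the receiver's buffer
} link_model_t;

// Receiver side: top up the sender only if the total outstanding credit
// (held by the sender or already used for buffered bytes) stays within
// the buffer space it is promising.
void maybe_issue_credit(link_model_t &l) {
    if (l.sender_credits + l.buffered + CREDIT16 <= LINK_BUF_BYTES)
        l.sender_credits += CREDIT16;          // "send" a CREDIT16 token
}

// Sender side: a byte on the wire costs one credit; no credit, no send.
int try_send_byte(link_model_t &l) {
    if (l.sender_credits == 0)
        return 0;                              // blocked
    l.sender_credits--;
    l.buffered++;                              // lands in the far-end buffer
    return 1;
}

int main() {
    link_model_t l = { 0, 0 };
    int sent = 0;
    // Nothing is consumed at the far end, so this counts how many bytes
    // fit before the sender blocks: 32 here, i.e. bearcat's 8x4.
    while (1) {
        maybe_issue_credit(l);
        if (!try_send_byte(l))
            break;
        sent++;
    }
    printintln(sent);
    return 0;
}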
2 - Latency. I didn't see anything on latency through the switch (there is one sentence I am not clear about). If I have, let's say, a daisy chain of L1s, what is the latency to send a packet to the last tile? I am thinking it's possible that, after the Dreg byte (assuming 1 byte) is decoded, the switch could start forwarding instead of waiting for the packet to be fully received. Is the latency through each switch 1 byte time, 2 byte times, or longer?
[Dreg is never sent, only node ID and channel ID are -- but you knew that :-) ].
The switch can start forwarding as soon as it has received a byte of the node ID
that doesn't match the switch's node ID; so after one byte for 8-bit headers, and
one or two bytes otherwise. There could be additional delays, depending on the
implementation (for example, to forward data from the incoming to the outgoing
datapath). Not all of this is measured in byte times, or even bit times: some of it comes in
multiples of the "tick" of the system clock, which is usually just 2 ns.

It would be interesting to hear what the minimum delay per switch is. My guess
is one byte time plus some extra. Do you think you could measure it?
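
If you want a software-only number in the meantime, a ping-pong over a streaming channel between your two tiles using the 100 MHz reference timer gives you a round-trip figure. It includes thread and channel-end overhead, so treat it as an upper bound; the names and iteration count below are just illustrative:

// Rough round-trip measurement over the inter-tile link (illustrative):
// ping() times many send+echo pairs with the 100 MHz reference timer and
// prints the average round trip in 10 ns ticks.
#include <platform.h>
#include <xs1.h>
#include <print.h>

#define ITERS 1000

void ping(streaming chanend c) {
    timer tmr;
    unsigned start, stop;
    int x;
    tmr :> start;
    for (int i = 0; i < ITERS; i++) {
        c <: i;                            // 4 data tokens out...
        c :> x;                            // ...wait for the echo back
    }
    tmr :> stop;
    printuintln((stop - start) / ITERS);   // average round trip, 10 ns units
}

void pong(streaming chanend c) {
    int x;
    for (int i = 0; i < ITERS; i++) {
        c :> x;
        c <: x;                            // echo straight back
    }
}

int main() {
    streaming chan c;
    par {
        on tile[0]: ping(c);
        on tile[1]: pong(c);
    }
    return 0;
}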
bearcat
Respected Member
Posts: 283
Joined: Fri Mar 19, 2010 4:49 am

Post by bearcat »

Thanks for the information. My current testbed is just 2 tiles, so I can't measure latency yet. Once I get the new design operating, I will update the post with measured latency.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

bearcat wrote:My current testbed is just 2 tiles, so I can't measure latency yet.
If it's two L1s, you still can: set up routing on node #2 so that it routes
messages addressed to some non-existent node to node #1, and the other way around.
Send such a message; it will keep going around the two switches. Trace that
with an oscilloscope or logic analyser.

Heck, you can actually do it with an L2: just make sure you go over an external
link; the internal links are hard to probe ;-)
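
Something like the sketch below ought to set that loop up. It is completely untested, the register numbers and field layout are from my memory of the XS1 system specification (so check them against the datasheet), and it assumes node IDs 0 and 1 joined by one external link, with node 0x8000 abused as the non-existent destination:

// Untested sketch of the routing loop: make both switches route packets for
// non-existent node 0x8000 out of the link that joins them, then open a
// route to a channel end on that node so the header circulates forever and
// can be probed on the external link pins.
//
// Assumptions (verify against the XS1 switch register map): node IDs are
// 0 and 1, 0x0D is dimension-direction register 1 (dimensions 8-15, one
// nibble each), and INTER_NODE_DIR is the direction already used by the
// link between the two nodes.
#include <platform.h>
#include <xs1.h>

#define DIM_DIR1_REG    0x0D
#define INTER_NODE_DIR  1          // read the real value from your link setup

void set_loop_route(unsigned node) {
    unsigned dirs;
    read_sswitch_reg(node, DIM_DIR1_REG, dirs);
    dirs |= INTER_NODE_DIR << 28;  // dimension 15 -> towards the other node
    write_sswitch_reg(node, DIM_DIR1_REG, dirs);
}

void start_loop(void) {
    unsigned c;
    // Allocate a raw channel end, point it at chanend 0 on node 0x8000
    // (which doesn't exist), and push out one data token: the header goes
    // onto the link and, with the routes above, just keeps going round.
    asm volatile("getr %0, 2" : "=r"(c));
    asm volatile("setd res[%0], %1" :: "r"(c), "r"(0x80000002));
    asm volatile("outt res[%0], %1" :: "r"(c), "r"(0x55));
}

int main(void) {
    set_loop_route(0);
    set_loop_route(1);
    start_loop();
    while (1);                     // keep the system up while you probe
    return 0;
}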