XUF224 channel comms

MyKeys
Active Member
Posts: 33
Joined: Mon Jul 03, 2017 9:41 am

Post by MyKeys »

Hi,

I'm seeing some interesting timing results when using a streaming channel between two cores on different tiles, routed through two xSwitches (XUF224, tiles 0 and 3).

I know that going through multiple xSwitches increases channel buffering and latency.
What I can't understand is why the sending thread would have massive pauses when the receiving thread is draining the channel as fast as possible (see the results below).
Also, why does the first iteration of the sending loop sometimes take much longer?
I did add synchronization between the threads to make sure they started at the same time, but this made no difference to the results.

I used the following code to generate the results, with the -O3 optimization level:

#include <platform.h>
#include <print.h>

#define CONSECUTIVE_INTS    24

int main()
{
    streaming chan c;

    par
    {
        on tile[0]:
        par
        {
            // Send task
            {
                timer t;
                unsigned start_time, end_time;

                while(1)
                {
                    t :> start_time;

                    #pragma loop unroll
                    for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                    {
                        c <: i;
                    }

                    t :> end_time;
                    printuintln(end_time - start_time);
                }
            }

        }

        on tile[3]:
        par
        {
            // receive task
            {
                unsigned temp;
                while(1)
                {
                    #pragma loop unroll
                    for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                    {
                        c :> temp;
                    }
                }
            }
        }
    }
    return 0;
}

The results below are the printuintln values from the code above, which should indicate the loop duration in instructions.
Where two values are given in the subsequent iterations column, the result was not consistent from one iteration to the next.

CONSECUTIVE_INTS    First iteration    Subsequent iterations
6                   6                  6
7                   7                  7
8                   8                  8
9                   17                 9
10                  26                 13
11                  36                 23 or 24
12                  46                 33
13                  55                 42 or 43
14                  65                 52 or 53
15                  74                 61
16                  84                 71 or 72
17                  94                 81
18                  103                90 or 91
19                  113                100 or 101
20                  122                109 or 110
21                  132                118 or 120
22                  142                129

Thanks for any help,
Mike.


mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

How are these links mated together?

PCB (copper traces)? Wiring?

Raw point-to-point, or through LVDS transceivers? What is the length of the interconnects? Are they on the same PCB, or do they go through connectors/headers?

Perhaps review the signal integrity of the links?

Post by MyKeys »

Hi mon2,

I'm using the standard XUF224 XN file, which I don't think configures any external xlinks.
I do have xlink7 wired up to the JTAG, but again I don't see this specifically mentioned in the XN file.

<Links>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="7"/>
    <LinkEndpoint NodeId="2" Link="0"/>
  </Link>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="4"/>
    <LinkEndpoint NodeId="2" Link="3"/>
  </Link>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="6"/>
    <LinkEndpoint NodeId="2" Link="1"/>
  </Link>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="5"/>
    <LinkEndpoint NodeId="2" Link="2"/>
  </Link>
  <Link Encoding="5wire">
    <LinkEndpoint NodeId="0" Link="8" Delays="52clk,52clk"/>
    <LinkEndpoint NodeId="1" Link="XL0" Delays="1clk,1clk"/>
  </Link>
  <Link Encoding="5wire">
    <LinkEndpoint NodeId="2" Link="8" Delays="52clk,52clk"/>
    <LinkEndpoint NodeId="3" Link="XL0" Delays="1clk,1clk"/>
  </Link>
</Links>

Thanks,
Mike.

Post by mon2 »

Sorry, my bad. I was confusing xlinks with channels. Not enough coffee.

Have you seen this thread and the comments from Bianco? They may help.

http://www.xcore.com/viewtopic.php?t=1787

Post by MyKeys »

Here I'm using a streaming channel (permanent route) and only sending data in one direction.
Sorry, I don't see anything in that thread that relates to this. Is there something I missed?

Thanks

Post by mon2 »

Only the example posted in that thread was of interest. I'm assuming that the first iteration takes longer because of the initial handshake. I have not worked directly with this topic, but it would be interesting to know.

How do the results change if the compiler optimization level is changed?

Post by MyKeys »

Optimisation levels 3 and 2 behave the same as above; level 1 shows the same pattern but takes longer.
With no optimisation it takes longer still, but all iterations take the same time, presumably because the loop is slow enough to mask the initial setup.

I had assumed that all channels declared as streaming would be configured up front, but I think you're right that it happens on the first communication.
While I can understand this causing a slight delay on the first iteration, why would it only happen when sending more than 8 ints?

I wonder what test setup is used to attain the maximum possible bandwidth between these tiles?
johned
XCore Addict
Posts: 185
Joined: Tue Mar 26, 2013 12:10 pm

Post by johned »

Hi Mike,
One option would be to use the low-level outuint and inuint functions in place of <: and :>.
They are defined in xs1.h.
You do not need to specify streaming in the channel declaration when using outuint and inuint.
Best regards,
John

Post by MyKeys »

Hi John,

Using outuint and inuint with a non-streaming channel performs exactly the same as the code above.

Both produce the same instructions:

inuint or streaming :> operator:
in (2r) r1, res[r0] *

outuint or streaming <: operator:
out (r2r) res[r0], r3 *

Did you expect a difference?

Mike.

Post by johned »

Hi Mike,
Thanks for checking. My anecdotal impression was that they would be different; however, I have just looked at the assembly with a colleague and can confirm that there is no difference.
Best,
john