XUF224 channel comms

MyKeys
Member++
Posts: 24
Joined: Mon Jul 03, 2017 9:41 am

Postby MyKeys » Tue Feb 13, 2018 2:51 pm

Hi,

I'm having some interesting timing results using a streaming channel between 2 cores on different tiles through 2 xSwitches (XUF224 tiles 0 & 3).

I know going through multiple xSwitches increases the channel buffering and latency.
I can't understand why the sending thread would have massive pauses if the receiving thread is draining the channel as fast as possible (see results below).
Also, why does the first iteration round the sending loop sometimes take much longer?
I did add synchronization between the threads to make sure they started at the same time, but this made no difference to the results.

I used the following code to generate the results, with the -O3 optimization level:

Code:

#include <platform.h>
#include <print.h>

#define CONSECUTIVE_INTS    24

int main()
{
    streaming chan c;

    par
    {
        on tile[0]:
        par
        {
            // Send task
            {
                timer t;
                unsigned start_time, end_time;

                while(1)
                {
                    t :> start_time;

                    #pragma loop unroll
                    for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                    {
                        c <: i;
                    }

                    t :> end_time;
                    printuintln(end_time - start_time);
                }
            }

        }

        on tile[3]:
        par
        {
            // receive task
            {
                unsigned temp;
                while(1)
                {
                    #pragma loop unroll
                    for (int i = 0; i < CONSECUTIVE_INTS; ++i)
                    {
                        c :> temp;
                    }
                }
            }
        }
    }
    return 0;
}


The results below show the printuintln output from the code above, which should indicate the loop duration in instructions.
In some cases two values are given in the "Subsequent iterations" column because the result was not consistent between iterations.

CONSECUTIVE_INTS   First iteration   Subsequent iterations
 6                   6                 6
 7                   7                 7
 8                   8                 8
 9                  17                 9
10                  26                13
11                  36                23 or 24
12                  46                33
13                  55                42 or 43
14                  65                52 or 53
15                  74                61
16                  84                71 or 72
17                  94                81
18                 103                90 or 91
19                 113               100 or 101
20                 122               109 or 110
21                 132               118 or 120
22                 142               129
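
One way to read the table is to look at the marginal cost of each additional word. The sketch below is plain C, not XC, and only does arithmetic on the numbers posted above. The helper name `ticks_per_extra_word` is hypothetical, and where two values were reported the lower one is used. It is an aid for interpreting the measurements, not an explanation of the underlying hardware behaviour.

```c
/* Measurements copied from the table above (values as printed by printuintln).
   Index 0 corresponds to CONSECUTIVE_INTS = 6, index 16 to CONSECUTIVE_INTS = 22.
   Where two values were reported, the lower one is used (an arbitrary choice). */
#define FIRST_WORDS 6
#define NUM_ROWS    17

const int first_iter[NUM_ROWS] = {6, 7, 8, 17, 26, 36, 46, 55, 65,
                                  74, 84, 94, 103, 113, 122, 132, 142};
const int later_iter[NUM_ROWS] = {6, 7, 8, 9, 13, 23, 33, 42, 52,
                                  61, 71, 81, 90, 100, 109, 118, 129};

/* Average extra ticks per additional word between two burst sizes.
   Hypothetical helper: the caller must pass 6 <= from_words < to_words <= 22. */
double ticks_per_extra_word(const int *ticks, int from_words, int to_words)
{
    int delta_ticks = ticks[to_words - FIRST_WORDS] - ticks[from_words - FIRST_WORDS];
    return (double)delta_ticks / (double)(to_words - from_words);
}
```

For example, `ticks_per_extra_word(later_iter, 6, 8)` is exactly 1, while `ticks_per_extra_word(later_iter, 10, 22)` is 116/12, roughly 9.7. In other words, up to 8 words the sender is not held up at all, and beyond that each extra word costs close to 10 ticks in both columns. That pattern would be consistent with a fixed amount of buffering being filled and the sender then being paced by the link, but the numbers alone do not prove it.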


Thanks for any help,
Mike.
User avatar
mon2
XCore Legend
Posts: 1166
Joined: Thu Jun 10, 2010 11:43 am
Contact:

Postby mon2 » Tue Feb 13, 2018 6:10 pm

How are these links mated together?

PCB (copper traces)? wiring?

Raw point-to-point or through LVDS transceivers? Length of interconnects? On the same PCB or through connectors / headers?

Perhaps review the signal integrity of the links?

Postby MyKeys » Tue Feb 13, 2018 6:45 pm

Hi mon2,

I'm using the standard XUF224 xn file, which I don't think configures any external xlinks.
I do have xlink7 wired up to the JTAG, but again I don't see this being specifically mentioned in the xn file.

Code:

<Links>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="7"/>
    <LinkEndpoint NodeId="2" Link="0"/>
  </Link>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="4"/>
    <LinkEndpoint NodeId="2" Link="3"/>
  </Link>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="6"/>
    <LinkEndpoint NodeId="2" Link="1"/>
  </Link>
  <Link Encoding="5wire" Delays="3clk">
    <LinkEndpoint NodeId="0" Link="5"/>
    <LinkEndpoint NodeId="2" Link="2"/>
  </Link>
  <Link Encoding="5wire">
    <LinkEndpoint NodeId="0" Link="8" Delays="52clk,52clk"/>
    <LinkEndpoint NodeId="1" Link="XL0" Delays="1clk,1clk"/>
  </Link>
  <Link Encoding="5wire">
    <LinkEndpoint NodeId="2" Link="8" Delays="52clk,52clk"/>
    <LinkEndpoint NodeId="3" Link="XL0" Delays="1clk,1clk"/>
  </Link>
</Links>


Thanks,
Mike.

Postby mon2 » Tue Feb 13, 2018 6:52 pm

Sorry, my bad. I was confusing xlinks with channels. Not enough coffee.

Have you seen this thread and the comments from Bianco? They may help.

viewtopic.php?t=1787

Postby MyKeys » Tue Feb 13, 2018 7:00 pm

Here I'm using a streaming channel (permanent route) and only sending data in one direction.
Sorry, I don't see anything in that thread that relates to this. Is there something I missed?

Thanks

Postby mon2 » Tue Feb 13, 2018 7:21 pm

Just the posted example from that thread was of interest. I am assuming that the first iteration takes longer because of the initial handshake. I have not worked directly with this topic, but it is interesting to know.

How are the results if the compiler optimization is changed?


Postby MyKeys » Wed Feb 14, 2018 10:58 am

Optimisation levels 3 and 2 behave the same as above. Level 1 shows the same pattern but takes longer.
With no optimisation it takes longer still, but all iterations take the same time, presumably because the loop is slow enough to mask the initial setup.

I had assumed that all channels declared as streaming would be configured up front but I think you're right in that it happens on the first comms.
Whilst I can understand this triggering a slight delay on the first iteration, why would this only happen when sending more data than 8 ints?

I wonder what test setup is used to attain the maximum bandwidth possible between these tiles?
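
As a rough yardstick for that question, the measured loop durations can be turned into a payload throughput figure. The sketch below is plain C and rests on two assumptions that are not stated in this thread: that each timer tick is 10 ns (a 100 MHz reference clock) and that each int is 4 bytes. The helper name `payload_bytes_per_second` is hypothetical.

```c
/* Hypothetical helper: payload throughput in bytes per second for a burst of
   `words` 32-bit words that took `ticks` timer ticks to send.
   Assumptions: 4 bytes per word, and `tick_hz` is the timer tick rate
   (100e6 if the timer runs from a 100 MHz reference clock). */
double payload_bytes_per_second(int words, int ticks, double tick_hz)
{
    double seconds = (double)ticks / tick_hz;   /* duration of the burst */
    return (double)words * 4.0 / seconds;       /* bytes moved per second */
}
```

With the 22-word row from the table above, `payload_bytes_per_second(22, 129, 100e6)` comes out at roughly 68 MB/s under those assumptions. This only describes what this particular test achieved, not the maximum the link can sustain.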
User avatar
johned
XCore Addict
Posts: 151
Joined: Tue Mar 26, 2013 12:10 pm
Contact:

Postby johned » Wed Feb 14, 2018 12:38 pm

Hi Mike,
One option would be to use the low-level outuint and inuint functions in place of <: and :>.
They are defined in xs1.h.
You do not need to specify streaming for the channel declaration when using outuint and inuint.
Best regards,
John

Postby MyKeys » Wed Feb 14, 2018 1:23 pm

Hi John,

Using outuint and inuint with a non-streaming channel performs exactly the same as the above code.

Both produce the same instructions:

inuint or streaming :> operator:
in (2r) r1, res[r0] *

outuint or streaming <: operator:
out (r2r) res[r0], r3 *

Did you expect a difference?

Mike.

Postby johned » Wed Feb 14, 2018 2:11 pm

Hi Mike,
Thanks for checking. My anecdotal thought was that they would be different; however, I have just looked at the assembly with a colleague and can confirm that there is no difference.
Best,
john
