Efficient transfer of bytes array over tiles

Automatyk
Member
Posts: 13
Joined: Thu Jul 30, 2015 12:10 pm

Post by Automatyk »

Hello,

I'd like to transfer an array of bytes between two tiles. I know that the data has to be passed by value because each tile has its own RAM. I wanted to use transactions, so I wrote a simple program containing two tasks. The first task sent a char array of size 100 over a channel, and the second received this data in a select statement. The first task also measured the time taken by the transaction, and it worked fine.
I also did a second test. The idea was much the same as in the previous example, but instead of transferring a byte array I sent a structure. This structure had only one field, a char array of size 100.
Both tests work correctly, but sending the structure was roughly 10 times faster. Transferring the byte array in the first test took about 0.0095 ms, whereas sending the structure containing a byte array of the same size took about 0.00090 ms.

In my application I have to use variable-length arrays, so I cannot send the data as a structure. Why is there such a big difference between those two cases? Is there a more efficient way of sending an array across tiles than using transactions?

Below is the source code of the example with the structure:

#include <platform.h>
#include <print.h>
#include <stdint.h>
#include <stdio.h>

#define T_1ms    100000 //<- timer ticks per millisecond
#define PARAM_NB 100    //<- byte array size

typedef struct data_structure
{
    char bytes[PARAM_NB];
}data_structure;

// Transactions are used for byte array transfer over channel

// sending transaction
transaction send(chanend c, data_structure &data)
{
    c <: data;
}

// data is received in a select statement
transaction event(chanend c, data_structure &data)
{
    c :> data;
}



void sender_task(chanend c )
{
    /* timer measures the time taken by the transaction */
    timer stopwach;
    uint32_t start, stop;

    data_structure structure;
    /* initialize byte array to send **/
    for(int i =0; i < PARAM_NB; i++) structure.bytes[i] = i;

    // Send data over channel and measure the time
    stopwach :> start;
    master send(c, structure);
    stopwach :> stop;

    // Print measured time in milliseconds
    printstr("Time taken on transaction: ");
    printf("%.5f  ms \n", ((float)(stop - start))/((float)(T_1ms)));


    while(1);
}

void reciver_task(chanend c)
{
      /* structure the received data will be stored in */
      data_structure structure;
      while(1)
      {
          select
          {
              case slave {event(c, structure);}:
                      break;
          }
      }
}

int main(void)
{
    chan c;

    par{
        reciver_task( c );
        sender_task( c );
    }

    return 0;
}

... and the source code of the example with the byte array:

#include <platform.h>
#include <print.h>
#include <stdint.h>
#include <stdio.h>

#define T_1ms    100000 //<- timer ticks per millisecond
#define PARAM_NB 100    //<- byte array size

// Transactions are used for byte array transfer over channel

// sending transaction
transaction send(chanend c, char *bytes)
{
    for(int i=0;i < PARAM_NB; i++)
        c <: bytes[i];
}

// data is received in a select statement
transaction event(chanend c, char *bytes)
{
    for(int i=0; i<PARAM_NB; i++)
        c :> bytes[i];
}



void sender_task(chanend c )
{
    /* timer measures the time taken by the transaction */
    timer stopwach;
    uint32_t start, stop;

    /* initialize byte array to send **/
    char bytes[PARAM_NB];
    for(int i =0; i < PARAM_NB; i++) bytes[i] = i;

    // Send data over channel and measure the time
    stopwach :> start;
    master send(c, bytes);
    stopwach :> stop;

    // Print measured time in milliseconds
    printstr("Time taken on transaction: ");
    printf("%.5f  ms \n", ((float)(stop - start))/((float)(T_1ms)));


    while(1);
}

void reciver_task(chanend c)
{
      /* byte array the received data will be stored in */
      char bytes[PARAM_NB];
      while(1)
      {
          select
          {
              case slave {event(c, bytes);}:
                      break;
          }
      }
}

int main(void)
{
    chan c;

    par{
        reciver_task( c );
        sender_task( c );
    }

    return 0;
}

Both examples were tested on a startKIT device.

Thank you in advance for your help, and I am sorry for my bad English.

Regards,
Automatyk


larry
Respected Member
Posts: 275
Joined: Fri Mar 12, 2010 6:03 pm

Post by larry »

When you send a whole structure, the compiler transfers it in words. In your case that is only 25 word transfers. You can see this clearly in simulation by looking for output instructions on a channel end:

$ xsim -t a.xe | grep 'out .*res.*0x......02)' | grep -v switch
tile[0]@0- -DI A-w-.----fff01ba4 (_fdp.bss.large      +ffebc680) : out     res[r0(0x80020202)], r4(0x63626160) @6728
tile[0]@0- -DI A-p-.----fff01ba8 (_fdp.bss.large      +ffebc684) : out     res[r0(0x80020202)], r4(0x5f5e5d5c) @6733
tile[0]@0- -DI A-p-.----fff01bac (_fdp.bss.large      +ffebc688) : out     res[r0(0x80020202)], r4(0x5b5a5958) @6738
tile[0]@0-P-DI A-p-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x57565554) @6743
tile[0]@0- -DI A-a-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x57565554) @6759
tile[0]@0- -DI A-w-.----fff01ba4 (_fdp.bss.large      +ffebc680) : out     res[r0(0x80020202)], r4(0x53525150) @6774
tile[0]@0- -DI A-p-.----fff01ba8 (_fdp.bss.large      +ffebc684) : out     res[r0(0x80020202)], r4(0x4f4e4d4c) @6779
tile[0]@0- -DI A-p-.----fff01bac (_fdp.bss.large      +ffebc688) : out     res[r0(0x80020202)], r4(0x4b4a4948) @6784
tile[0]@0-P-DI A-p-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x47464544) @6789
tile[0]@0- -DI A-a-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x47464544) @6800
tile[0]@0- -DI A-w-.----fff01ba4 (_fdp.bss.large      +ffebc680) : out     res[r0(0x80020202)], r4(0x43424140) @6815
tile[0]@0- -DI A-p-.----fff01ba8 (_fdp.bss.large      +ffebc684) : out     res[r0(0x80020202)], r4(0x3f3e3d3c) @6820
tile[0]@0- -DI A-p-.----fff01bac (_fdp.bss.large      +ffebc688) : out     res[r0(0x80020202)], r4(0x3b3a3938) @6825
tile[0]@0-P-DI A-p-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x37363534) @6830
tile[0]@0- -DI A-a-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x37363534) @6841
tile[0]@0- -DI A-w-.----fff01ba4 (_fdp.bss.large      +ffebc680) : out     res[r0(0x80020202)], r4(0x33323130) @6856
tile[0]@0- -DI A-p-.----fff01ba8 (_fdp.bss.large      +ffebc684) : out     res[r0(0x80020202)], r4(0x2f2e2d2c) @6861
tile[0]@0- -DI A-p-.----fff01bac (_fdp.bss.large      +ffebc688) : out     res[r0(0x80020202)], r4(0x2b2a2928) @6866
tile[0]@0-P-DI A-p-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x27262524) @6871
tile[0]@0- -DI A-a-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x27262524) @6882
tile[0]@0- -DI A-w-.----fff01ba4 (_fdp.bss.large      +ffebc680) : out     res[r0(0x80020202)], r4(0x23222120) @6897
tile[0]@0- -DI A-p-.----fff01ba8 (_fdp.bss.large      +ffebc684) : out     res[r0(0x80020202)], r4(0x1f1e1d1c) @6902
tile[0]@0- -DI A-p-.----fff01bac (_fdp.bss.large      +ffebc688) : out     res[r0(0x80020202)], r4(0x1b1a1918) @6907
tile[0]@0-P-DI A-p-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x17161514) @6912
tile[0]@0- -DI A-a-.----fff01bb0 (_fdp.bss.large      +ffebc68c) : out     res[r0(0x80020202)], r4(0x17161514) @6923
tile[0]@0- -DI A-w-.----fff01ba4 (_fdp.bss.large      +ffebc680) : out     res[r0(0x80020202)], r4(0x13121110) @6938
If you transfer byte by byte explicitly, the compiler will use OUTT (output byte) instructions instead, which in your case means 100 byte transfers:

$ xsim -t a.xe | grep 'outt .*res.*0x......02)' | grep -v switch
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x0) @6702
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x1) @6747
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x2) @6792
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x3) @6837
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x4) @6882
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x5) @6927
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x6) @6972
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x7) @7017
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x8) @7062
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x9) @7107
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0xa) @7152
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0xb) @7197
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0xc) @7242
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0xd) @7287
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0xe) @7332
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0xf) @7377
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x10) @7422
tile[0]@0- -DI A-p-.----00040134 (send                + 40) : outt    res[r1(0x80020202)], r8(0x11) @7467
. . .
I would suggest looking at XC interfaces. When you transfer an array or structure across tiles using an interface, the compiler will only copy the bytes that will actually be used by the destination, and it will send them as words (OUT, as above) where possible.

Post by Automatyk »

Thank you for your answer. It's clearer to me now.
I also thought of interfaces, but because I only have a startKIT board available I couldn't test how they deal with transferring an array across tiles. I know that when an array is sent over an interface between cores on the same tile, what is actually passed is a reference to the array.
I will soon get my new PCB with an xCORE-200 device, so I will check whether interfaces are a good choice.

Thank you once again for your help, and sorry for my bad English :)

Kind regards,
Automatyk

Post by Automatyk »

I have another question. I'm currently working with interfaces as you advised, and I'm facing another issue.
Sometimes in my application the server end has to send a byte array to the client end.
I thought of using notifications. The server would notify the client that data is ready. The client would then call an interface function which returns the data, as described in the Programming Guide (2.2.4 Passing data via interface function calls).

The issue is that I cannot make it work.
Q1: Is it possible to solve this issue in the way I described?
Q2: If so, would it still work for sending a byte array across tiles?
Q3: I also thought of connecting those two tasks using two interfaces. If I connect them that way, will the total number of chanends used be doubled?

It has to work across tiles and for variable-length arrays.

Has anyone solved this problem yet?

Thank you in advance for your help, and sorry for my bad English.

regards,
Automatyk
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

I think the way you'd like to do it sounds good, but I personally have had issues with notifications, and even some issues with interface calls taking the equivalent of forever between tiles. The way I ended up solving both problems is by having another thread running as a data server. In terms of memory it's pretty inefficient, but in terms of speed I've found it works really well, since it does nothing but respond to events from the other two threads. It's also been pretty much foolproof. I've made it slightly more efficient by having the thread that runs on the same tile communicate entirely by reference, while the thread on the other tile communicates entirely by value.

The one way I was told to get around notifications not working properly was by reordering either the definition or the declaration, but I can't remember which since I started having other problems with communication between tiles and ditched them completely.

As far as I know if you try to have multiple interfaces running between two threads, they will continue to use more channels despite being functionally running to the same place. I imagine it's a token issue... even though issue seems like the wrong word. But with 32 channels per tile hopefully you'd be able to manage it.

Just thought I'd add that without code I can't actually say whether you're having an issue with your implementation or whether you had the same issue I did. If you haven't tried at all, I'd say give it a shot: the notification system is really handy if it works for you, and it seems like you're in the exact situation it's there for.

Post by Automatyk »

Thank you for your reply, Gothmag.

I will keep in mind what you said about your issues with notifications. The idea of using a thread between the tasks sounds good, but unfortunately I wouldn't like to use this solution in my application, because there will be a couple of threads communicating in the way I proposed.
Using two interfaces to communicate between two threads is also something I would like to avoid.
I'm still looking for a solution to this problem.

But can anyone tell me if it is possible to make an interface call that returns an array?
Something like the client end of the interface calling a function which returns an array of data from the server.

Thank you for your help, and sorry for my bad English.

Regards,
Automatyk

Post by larry »

Copying into an array argument would be the standard way of passing an array from server to client:

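#include <stdio.h>
#include <string.h>

interface i {
  void f(char a[]);
};

int main(void)
{
  interface i i;
  par {
    // server: fills the caller's array when f() is called
    { char x[4] = {1, 2, 3, 4};
      while (1) {
        select {
          case i.f(char a[]):
            memcpy(a, x, sizeof(x));
            break;
        }
      }
    }
    // client: passes its own array and receives the data in it
    { char a[4];
      i.f(a);
      printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);
    }
  }
  return 0;
}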

Post by Automatyk »

Thank you very much, larry :) Your answer solved my problem.
I hadn't thought this could work in both directions.

Thank you once again, larry and Gothmag, for providing solutions to my issue.

regards,
Automatyk
zhy44th
Member
Posts: 11
Joined: Mon Dec 18, 2017 9:58 am

Post by zhy44th »

larry wrote: When you send a whole structure, compiler will transfer it in words. In your case only 25 word transfers. You can see it clearly in simulation by looking for output instructions on a channel end [...]
How can I view these chanend messages in xTIMEcomposer? Thank you.