Ways to share memory

Postby infiniteimprobability » Wed Oct 22, 2014 9:29 am

Sharing memory is prohibited by the compiler in normal XC, for good reason (race conditions, memory corruption, restriction to the same tile...). However, there are cases where it is very useful, appropriate, safe and fast. At the processor architecture level, access to memory by different logical cores is time-division multiplexed, so there is no inherent reason for the chip to prevent it. What is needed is a way to support it in software.

I know of 4 common techniques for working around the parallel usage rules (there may well be more), whilst sharing memory at high level, so thought I'd share them. Feel free to share your examples too!

Warning: sharing memory can damage your health! Make sure you understand exactly what is going on, to avoid latent, strange and difficult-to-solve runtime bugs!

Here's the first method, using unsafe pointers (introduced in tools 13). It's a nice way of sharing a global using a(n unsafe) pointer.

Pros: It's fairly familiar C usage, and is fast (no bounds checking)
Cons: (outside of normal shared memory and pointer type risks) Extra syntax to declare unsafe sections, although TBF it makes it pretty explicit to other readers that something (potentially) dodgy could be going on.

Code:

#include <xs1.h>
#include <stdio.h>
#include <timer.h>

unsigned g_global = 0;

void task1(void){
    // volatile forces a fresh read of g_global on every pass of the poll loop
    volatile unsigned * unsafe glob_ptr;
    unsafe {
      glob_ptr = &g_global;
    }
    unsafe{
        printf("Ptr set to %u\n", *glob_ptr);
        while(*glob_ptr == 0);  // spin until task2 writes a non-zero value
        printf("Ptr set to %u\n", *glob_ptr);
    }
}

void task2(void){
    volatile unsigned * unsafe glob_ptr;
    unsafe {
      glob_ptr = &g_global;
    }
    delay_microseconds(1);  // write after a short delay
    unsafe{
        *glob_ptr = 1234;
    }
}

int main(void){
    par{
        task1();
        task2();
    }
    return 0;
}

Postby themech » Tue Oct 28, 2014 2:55 pm

Hi,

thanks for sharing. I used unsafe pointers in a program as well and would have been thankful for this post. I am currently trying to write to the same port from different tasks, so I need to store the value of the port and access it from every task. My idea is to use a variable with a hardware lock. Does anybody have a "best practice" for this use case?

Postby davelacey » Tue Oct 28, 2014 9:54 pm

themech wrote: Hi,

thanks for sharing. I used unsafe pointers in a program as well and would have been thankful for this post. I am currently trying to write to the same port from different tasks, so I need to store the value of the port and access it from every task. My idea is to use a variable with a hardware lock. Does anybody have a "best practice" for this use case?


One way to do this is with "distributable" tasks. You can do something similar to:

Code:

#include <timer.h>
#include <xs1.h>

interface my_port_if {
  // output a value to a specific bit on the port
  void output(unsigned val, unsigned bit);
};

// This task is distributable so will not take a core of its own. It
// will run when called by tasks on other cores.
[[distributable]]
void port_sharer(server interface my_port_if i[n], unsigned n, port p)
{
  unsigned port_val = 0;
  while (1) {
    select {
    // Wait for a client to send an output request
    case i[int j].output(unsigned val, unsigned bit):
      port_val = (port_val & ~(1<<bit)) | val << bit;
      p <: port_val;
      break;
    }
  }
}

void task1(client interface my_port_if i)
{
  // Output 1 to bit 1
  i.output(1, 1);
  delay_milliseconds(2);
  // Output 1 to bit 0
  i.output(1, 0);
}

void task2(client interface my_port_if i)
{
  // Output 1 to bit 2
  i.output(1, 2);
}

port p = XS1_PORT_8A;

int main() {
  // This interface array allows the tasks to communicate with
  // the port_sharer task.
  interface my_port_if i[2];
  par {
    port_sharer(i, 2, p);
    task1(i[0]);
    task2(i[1]);
  }
  return 0;
}


You can change the interface to do any manipulation on the port you want. The port_sharer task will only handle one request at a time - the locks are implicit.
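
For anyone adapting the interface: the core of the output case is a read-modify-write on a shadow copy of the port value. Below is a minimal sketch of just that step in plain C (not XC), so it can be tried on a host machine. The function name update_port_shadow is made up for illustration. Unlike the XC code above, it masks val to a single bit and ignores out-of-range bit positions; that is my assumption about what you would probably want, not something the original does.

```c
#include <limits.h>

/* Hypothetical helper: returns the new shadow value with one bit updated.
 * shadow: last value written to the port
 * val:    new value for the bit (only the lowest bit is used)
 * bit:    bit position to change */
unsigned update_port_shadow(unsigned shadow, unsigned val, unsigned bit)
{
    /* Shifting by >= the width of unsigned is undefined in C, so ignore it */
    if (bit >= sizeof(unsigned) * CHAR_BIT) {
        return shadow;
    }
    unsigned mask = 1u << bit;
    /* Clear the target bit, then OR in the new bit value */
    return (shadow & ~mask) | ((val & 1u) << bit);
}
```

With the three requests made by task1 and task2 above (bits 1, 0 and 2, each set to 1), the shadow value ends up as 0x7 whatever order the requests arrive in, because each update only touches its own bit.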

Postby infiniteimprobability » Wed Oct 29, 2014 4:26 pm

Hah - my next shared memory example was going to be a distributable task, but as Dave has shown, it's great for sharing resources too. (Actually, sharing ports at an instruction level across logical cores without locks is very dangerous, as it can cause an exception.)

Distributable tasks are very clean (everything is nice CSP), and with the XC2.0 features the distributable task doesn't use a logical core.

When I get a sec I'll finish the shared mem distributable style and post it.

I was then going to move on to C / ASM ways of sharing memory..

Postby infiniteimprobability » Thu Oct 30, 2014 3:07 pm

OK, as promised, here is an example of sharing memory using a distributable task. It's probably the neatest and safest way of doing this, because the behaviour is highly explicit in the code. Since the server task (which owns the shared variable) is sequential, the cases are atomic and there are no hidden nasties.
Thanks to the [[distributable]] feature of the compiler, the server task doesn't cost you an extra logical core either. The code gets added to each of the client logical cores and the compiler inserts locks to ensure it remains atomic even when distributed.

Pros: Readability and composability. This must be the cleanest way of writing scalable shared "stuff" handlers. It doesn't cost you an additional core either.
Cons: A bit verbose to write initially, but easier to extend/maintain/read and probably debug. It can be a bit slower as it adds code to the client side calls. Performance is very app dependent however and in some cases may be faster as there is no channel communication.

Here's an example - hope you agree it's very clear..

Code:

#include <stdio.h>
#include <timer.h>
#include <stdlib.h>

interface var_shared_if {
    void set(int val);
    int get(void);
};

void task0(client interface var_shared_if i_shared){
    int val;
    val = i_shared.get();
    printf("Task 0 shared var get=%d\n", val);
    while(val == 0){    //poll every 1us
        delay_microseconds(1);
        val = i_shared.get();
    }
    printf("Task 0 shared var get=%d\n", val);
    _Exit(0);
}

void task1(client interface var_shared_if i_shared){
    int val = 1234;
    printf("Task 1 started\n");
    delay_microseconds(10);
    i_shared.set(val);
    printf("Task 1 shared var set=%d\n", val);
}

[[distributable]] //distribute server task across client logical cores
void buffer(server interface var_shared_if i_shared[2]){
    int val = 0;    //The global value that both tasks can access
    while(1){
        select{ //Note replicated cases - select across each element of the array
            case (int i=0; i<2; i++) i_shared[i].set(int new_val):
            val = new_val;
            break;

            case (int i=0; i<2; i++) i_shared[i].get() -> int ret_val:
            ret_val = val;
            break;
        }
    }
}

int main(void){
    interface var_shared_if i_shared[2];

    par{
        task0(i_shared[0]);
        task1(i_shared[1]);
        buffer(i_shared);    //This will not actually consume a logical core if
                             //Marked as [[distributable]] above
    }
    return 0;
}
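
To make the "compiler inserts locks" point a bit more concrete, here is a rough plain-C analogy (not XC, and not what the XMOS tools actually generate, which this thread doesn't show). It guards a shared variable with a small spinlock built from C11 atomic_flag, so each get/set runs as one indivisible step, much like a case in the server task. All names (shared_set, shared_get, g_lock, g_val) are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock and shared value; in the XC version both are
 * hidden inside the distributable server task. */
static atomic_flag g_lock = ATOMIC_FLAG_INIT;
static int g_val = 0;

static void lock_acquire(void)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock */
    while (atomic_flag_test_and_set_explicit(&g_lock, memory_order_acquire)) {
    }
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&g_lock, memory_order_release);
}

/* Analogue of i_shared.set(val) */
void shared_set(int new_val)
{
    lock_acquire();
    g_val = new_val;
    lock_release();
}

/* Analogue of i_shared.get() */
int shared_get(void)
{
    lock_acquire();
    int ret_val = g_val;
    lock_release();
    return ret_val;
}
```

The point of the comparison is how much of this bookkeeping the [[distributable]] version lets you leave out: the interface cases are the critical sections and there is no lock for you to forget to release.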

Postby infiniteimprobability » Tue Nov 04, 2014 12:56 pm

This is the low-level way of sharing memory. It uses inline assembly to insert a single instruction that loads a value from memory into a register, or stores one from a register to memory. There are lots more ways of sharing different types (e.g. arrays) here: https://github.com/xcore/sc_util/tree/m ... ule_xc_ptr

Pros: Fast! A single line of code.
Cons: It's assembly level, so it drops down to the instruction set; however, macros can make it more readable. No type checking is done, so you can get yourself into trouble easily. Normal shared memory warnings apply.

Code:

#include <xs1.h>
#include <stdio.h>
#include <timer.h>

#define GET_SHARED_GLOBAL(x, g) asm volatile("ldw %0, dp[" #g "]":"=r"(x)::"memory")
#define SET_SHARED_GLOBAL(g, v) asm volatile("stw %0, dp[" #g "]"::"r"(v):"memory")
//see module_xc_ptr in sc_util for these macros and more including shared array access

unsigned g_global = 0;

void task1(void){
    unsigned local_var = 0;
    printf("local_var set to %d\n", local_var);
    while(local_var == 0){
        GET_SHARED_GLOBAL(local_var, g_global);
    }
    printf("local_var set to %d\n", local_var);
}

void task2(void){
    delay_microseconds(1);
    SET_SHARED_GLOBAL(g_global, 7);
}

int main(void){
    par{
        task1();
        task2();
    }
    return 0;
}

Postby infiniteimprobability » Fri Nov 07, 2014 6:19 pm

Here's the last one..

Shared memory access via C (i.e. not XC) to avoid compiler checks. This uses a separate C file to do the "dirty stuff" outside of XC. A single set_global function is shown to allow setting of a global variable. Two getters are included: get_global, which returns the value directly, and get_global_ref, which writes the global's value into an argument passed by reference.

Pros: It's all via C, so readable. Type checking is included.
Cons: It requires an extra source file for the shared memory helper functions, because that file must be compiled by a different compiler (C instead of XC). C is less safe than XC in other ways too (no bounds checks on arrays or pointers), which is why this approach works in the first place.

The example:

Code:

#include <xs1.h>
#include <stdio.h>
#include <timer.h>
#include <xccompat.h>

extern void set_global(unsigned write_val);
extern unsigned get_global(void);
extern void get_global_ref(REFERENCE_PARAM(unsigned, read_val));

void task1(void){
    unsigned local_var = 0;
    printf("t1 local_var set to %d\n", local_var);
    while(local_var == 0){
        local_var = get_global();
    }
    printf("t1 local_var set to %d\n", local_var);
}

void task2(void){
    unsigned local_var = 0;
    printf("t2 local_var set to %d\n", local_var);
    while(local_var == 0){
        get_global_ref(local_var);
    }
    printf("t2 local_var set to %d\n", local_var);
}

void task3(void){
    delay_microseconds(1);
    set_global(7);
}

int main(void){
    par{
        task1();
        task2();
        task3();
    }
    return 0;
}


and the C helper functions:

Code:

#include <xccompat.h>
unsigned g_global = 0;

void set_global(unsigned write_val){
    g_global = write_val;
}

unsigned get_global(void){
    return g_global;
}

void get_global_ref(REFERENCE_PARAM(unsigned, read_val)){
    *read_val = g_global;
}
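
The same helper pattern extends to a shared buffer rather than a single word. Here is a small sketch of what that C file could look like, with a bounds check added, since C will not do one for you. The names g_buffer, BUF_WORDS, set_buffer_word and get_buffer_word are hypothetical, and the XC side would need matching extern declarations like the ones in the example above.

```c
#define BUF_WORDS 1024u  /* hypothetical buffer size, in words */

/* volatile so that every access really goes to memory */
volatile unsigned g_buffer[BUF_WORDS];

/* Store val at index idx.
 * Returns 1 on success, 0 if idx is out of range (nothing is written). */
int set_buffer_word(unsigned idx, unsigned val){
    if (idx >= BUF_WORDS) {
        return 0;
    }
    g_buffer[idx] = val;
    return 1;
}

/* Read the word at index idx.
 * Returns 0 if idx is out of range. */
unsigned get_buffer_word(unsigned idx){
    if (idx >= BUF_WORDS) {
        return 0;
    }
    return g_buffer[idx];
}
```

Note that this only protects against indexing off the end of the buffer. It does nothing about two tasks writing the same element at the same time, so the normal shared memory warnings still apply.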

Postby hellopossibility » Tue Nov 11, 2014 9:33 am

Thanks, it is very useful.
USB Audio uses shared memory too.

Could you please tell me whether memory can be shared between tiles?
I want to know whether it is possible to split the XUD/buffer/decouple/endpoint cores across 2 tiles.
Thanks.

infiniteimprobability wrote: Sharing memory is prohibited by the compiler in normal XC, for good reason (race conditions, memory corruption, restriction to the same tile...).

Postby infiniteimprobability » Tue Nov 11, 2014 12:11 pm

hellopossibility wrote: Thanks, it is very useful.

Glad it is useful! Don't forget that in most cases, interfaces/channels are the preferred way of moving data between tasks. This was intended to cover the few cases where they don't fit.

hellopossibility wrote: USB Audio uses shared memory too.

Yes - when you are handling a FIFO between tasks (like in USB audio), shared memory makes the most sense.

hellopossibility wrote: Could you please tell me whether memory can be shared between tiles?

I'm afraid not. The only physical connections between tiles are the links, power, reset and JTAG. So you have to use channels (either directly or via interfaces).

hellopossibility wrote: I want to know whether it is possible to split the XUD/buffer/decouple/endpoint cores across 2 tiles. Thanks.

Not easily - the 4 tasks are closely coupled, so it would require a significant rewrite to do this. Decouple and audio can easily be split across tiles though.
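
For the FIFO-between-tasks case, here is a minimal single-producer, single-consumer ring buffer sketch in plain C, to show why shared memory suits it: the producer only ever writes head, the consumer only ever writes tail, so no lock is needed. This is not the USB audio code. The names (fifo_t, fifo_init, fifo_push, fifo_pop, FIFO_SIZE) are made up, and it uses C11 <stdatomic.h> so that it is well defined on a host compiler; I have not checked whether the XMOS tools provide that header.

```c
#include <stdatomic.h>

#define FIFO_SIZE 8u  /* hypothetical depth; must be a power of two */

typedef struct {
    unsigned data[FIFO_SIZE];
    atomic_uint head;  /* count of items pushed; written by the producer only */
    atomic_uint tail;  /* count of items popped; written by the consumer only */
} fifo_t;

void fifo_init(fifo_t *f)
{
    atomic_init(&f->head, 0u);
    atomic_init(&f->tail, 0u);
}

/* Producer side. Returns 1 if v was queued, 0 if the FIFO is full. */
int fifo_push(fifo_t *f, unsigned v)
{
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head - tail == FIFO_SIZE) {
        return 0;
    }
    f->data[head % FIFO_SIZE] = v;
    /* Publish the item only after the data has been written */
    atomic_store_explicit(&f->head, head + 1u, memory_order_release);
    return 1;
}

/* Consumer side. Returns 1 and fills *out if an item was available,
 * 0 if the FIFO is empty. */
int fifo_pop(fifo_t *f, unsigned *out)
{
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (head == tail) {
        return 0;
    }
    *out = f->data[tail % FIFO_SIZE];
    /* Free the slot only after the data has been read */
    atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
    return 1;
}
```

This only works with exactly one pushing task and one popping task, both on the same tile. With more than one producer or consumer you are back to needing a lock or a server task.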

Postby infiniteimprobability » Fri Nov 21, 2014 6:33 pm

This list wouldn't be complete without the movable pointer. See the XMOS programming guide section 5.2.4.5 for details, but in essence, these pointers can be used to transfer ownership between tasks.

This can include ownership of global variables.

Movable pointers are restricted (for your health and safety ;-) ) and make the transfer of ownership explicit through use of the move() operator. This is backed up by runtime checks, which will throw an exception if you try to access the pointer when it's not yours. Nice.

Transferring ownership between tasks running on different cores requires passing a message across channels or interfaces. So by that time, surely you could already have just sent the variable across channels/interface, so no need for a global?

True. However, if the shared memory is a large buffer then this approach makes a lot more sense, i.e. replace unsigned g_global = 0; with unsigned g_global[1024]; and it makes a LOT of sense.

Pros: Safe to use and explicit in the code. Runtime checking for additional safety as well.

Cons: A bit slower (runtime checking), and not really that useful for sharing small items, since you have to synchronise across channels/interfaces anyhow.

Code:

#include <xs1.h>
#include <stdio.h>
#include <timer.h>
#include <stdlib.h>

#define ONE_MICROSECOND 100 //in 100MHz ticks

unsigned g_global = 0;

interface pass_ptr_if{
//    unsigned * movable borrow(unsigned * movable ptr);
    void give(unsigned * movable ptr);
    unsigned * movable take(void);
};

void task1(client interface pass_ptr_if i_pass_ptr){
    unsigned * movable glob_ptr = &g_global;
    printf("Ptr set to %d\n", *glob_ptr);
    while(*glob_ptr == 0){
        i_pass_ptr.give(move(glob_ptr));
        glob_ptr =i_pass_ptr.take();
    }
    printf("Ptr set to %d\n", *glob_ptr);
//    _Exit(0);

}

void task2(server interface pass_ptr_if i_pass_ptr){
    unsigned * movable glob_ptr;

    timer tmr;
    int time_now;
    int have_ptr = 0;   //Flag indicating that this task owns the pointer
    int set_val = 1;    //Flag indicating whether shared var has been set to new value yet

    tmr :> time_now;

    while(1){
        select{

            case i_pass_ptr.give(unsigned * movable ptr):
            glob_ptr = move(ptr);
            printf("give\n");
            have_ptr = 1;
            break;

            case i_pass_ptr.take(void) -> unsigned * movable ptr:
            ptr = move(glob_ptr);
            printf("take\n");
            have_ptr = 0;
            break;

            case (have_ptr && set_val) => tmr when timerafter(time_now + ONE_MICROSECOND) :> int _:
            printf("set\n");
            *glob_ptr = 69;
            set_val = 0;    //Do once only
            break;

        }
    }
}

int main(void){
    interface pass_ptr_if i_pass_ptr;

    par{
        task1(i_pass_ptr);
        task2(i_pass_ptr);
    }
    return 0;
}
