Code example for article

aclassifier
Respected Member
Posts: 507
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

I am writing an article for a magazine called https://www.kode24.no about xC. It is in Norwegian.

I have one rather long code example (350 lines!) that I thought would cover more than the simplest cases. Here is the code as of now. I decided to make a rather untypical xC program with 4 workers and one client, to show interface usage and what [[combine]] does to resource allocation. I also added a task that uses a channel.

If anybody has time on a Saturday to look this over, I'd be happy. Also if you find some flaws, or have some good ideas to add even more flavour to it.

I have three things I cannot explain. Three TODOs:
  • Why does round_cnt_task take a timer?
  • Why can round_cnt_task not be [[distributable]] as it is? (Wrong structure, like the error says, even if the text there is slightly wrong?) But this one has the structure ok:
  • Why can round_cnt_task_2 (not used) not be [[distributable]] as it is? (Because I use a channel?)

Code:

/*
 * main.xc
 *
 *  Created on: 12. feb. 2020
 *      Author: teig
 */

#include <platform.h> // core
#include <stdio.h>    // printf
#include <timer.h>    // delay_milliseconds(..), XS1_TIMER_HZ etc
#include "random.h"   // xmos. Also uses "random_conf.h"
#include <iso646.h>   // readability

// -----------------------------------------------------------------------------
// Control printing
// See https://stackoverflow.com/questions/1644868/define-macro-for-debug-printing-in-c
// -----------------------------------------------------------------------------

#define DEBUG_PRINT_TEST 1 // [0->1] code about [5,12] kB
#define debug_print(fmt, ...) do { if(DEBUG_PRINT_TEST) printf(fmt, __VA_ARGS__); } while (0)


// -----------------------------------------------------------------------------
// Define bool
// BOOLEAN #include <stdbool.h> if C99
// See http://www.teigfam.net/oyvind/home/technology/165-xc-code-examples/#bool
// -----------------------------------------------------------------------------

typedef enum {false,true} bool; // 0,1 This typedef matches any integer type like long, int, unsigned, char, bool


// -----------------------------------------------------------------------------
// Define type equal to the width of xC "timer". This processor has 10 HW timers,
// but the numbers needed in this code will (with NUM_WORKERS 4) be 2 timers if
// all worker_task run on the same logical core (par [[combine]]) or 5 timers
// if worker_task each have a logical core for themselves.
// Both signed and unsigned int will do, since both will wrap around on
// "overflow" and the hex code will look the same. This way AFTER is well defined
// since adding a value will trigger "timerafter" ticks into the future
// -----------------------------------------------------------------------------
typedef signed int time32_t; // Ticks to 100 in 1 us


// -----------------------------------------------------------------------------
// Define number of workers. Is needed here because variable length arrays
// are not permitted to tasks when they are [[combinable]]
// -----------------------------------------------------------------------------

#define NUM_WORKERS 4


// -----------------------------------------------------------------------------
// Define data typedefs
// -----------------------------------------------------------------------------

typedef unsigned worked_ms_t;

typedef struct log_t {
    unsigned    cnt;
    unsigned    log_started  [NUM_WORKERS];
    unsigned    log_finished [NUM_WORKERS];
    worked_ms_t log_worked_ms[NUM_WORKERS];
    bool        button_pressed;
} log_t;

// -----------------------------------------------------------------------------
// do_print_log
// Prints log if DEBUG_PRINT_TEST is 1. If DEBUG_PRINT_TEST is 0, this function
// is not generated by the compiler
// -----------------------------------------------------------------------------

void do_print_log (
        log_t log,
        unsigned const num_workers) {

    debug_print ("\ncnt %u %s\n", log.cnt, log.button_pressed ? "BUTTON" : "");
    debug_print ("%s", "log.log_started   ");
    for (unsigned ix=0; ix < num_workers; ix++) {
        debug_print ("%2u ", log.log_started[ix]);
    }
    debug_print ("%s", "\nlog.log_worked_ms ");
    for (unsigned ix=0; ix < num_workers; ix++) {
        debug_print ("%2u ", log.log_worked_ms[ix]);
    }
    debug_print ("%s", "\nlog.log_finished  ");
    for (unsigned ix=0; ix < num_workers; ix++) {
        debug_print ("%2u ", log.log_finished[ix]);
    }
    debug_print ("%s", "\n");
}


// -----------------------------------------------------------------------------
// 1 BIT PORT
// External button defined (button press pulls a pullup resistor down)
// -----------------------------------------------------------------------------

in port inP1_button = on tile[0]: XS1_PORT_1M; // External HW GPIO J1 P63 (Board's buttons 4E.0 and 4E.1 could have been used, but want to show 1-bit port)

#define BUTTON_PRESSED  0
#define BUTTON_RELEASED 1


// -----------------------------------------------------------------------------
// 4 BIT PORT
// Internal LEDs defined. High is "on"
// -----------------------------------------------------------------------------

out buffered port:4 outP4_leds = on tile[0]: XS1_PORT_4F; // 4-bit port. xCORE-200 explorerKIT GPIO J1 7

#define BOARD_LEDS_INIT           0x00
#define BOARD_LED_MASK_GREEN_ONLY 0x01 // BIT0
#define BOARD_LED_MASK_RGB_BLUE   0x02 // BIT1
#define BOARD_LED_MASK_RGB_GREEN  0x04 // BIT2
#define BOARD_LED_MASK_RGB_RED    0x08 // BIT3

#define BOARD_LED_MASK_MAX_1 (BOARD_LED_MASK_GREEN_ONLY)
#define BOARD_LED_MASK_MAX_2 (BOARD_LED_MASK_RGB_BLUE  bitor BOARD_LED_MASK_MAX_1)
#define BOARD_LED_MASK_MAX_3 (BOARD_LED_MASK_RGB_GREEN bitor BOARD_LED_MASK_MAX_2)
#define BOARD_LED_MASK_MAX_4 (BOARD_LED_MASK_RGB_RED   bitor BOARD_LED_MASK_MAX_3)

#define BOARD_LED_MASK_MAX BOARD_LED_MASK_MAX_1

// -----------------------------------------------------------------------------
// do_swipe_leds
// Set LEDs on the xCORE-200 explorerKIT board. There are two, one green only
// and one RGB (with three lines). High is LED on
// -----------------------------------------------------------------------------

void do_swipe_leds (
        out buffered port:4 outP4_leds,
        unsigned &?led_bits, // '&' is reference. Aside: pointer types: no decoration (safe), "movable", "alias" and "unsafe"
        unsigned const board_led_mask_max) {

    if (isnull(led_bits)) { // Just to show a nullable type, shown with '?':
        outP4_leds <: BOARD_LED_MASK_GREEN_ONLY;
    } else {
        outP4_leds <: led_bits; // Output LED bits.

        led_bits++;
        led_bits and_eq board_led_mask_max; // GREEN on and off and 3-coloured RGB LED
    }
}


// -----------------------------------------------------------------------------
// round_cnt_task
// Task that just outputs an incremented value, showing use of a chan
// This takes two chanends and one logical core.
// Plus one timer, for some reason TODO
// -----------------------------------------------------------------------------

// TODO if [[distributable]] error: combinable function must end in a `while(1){select{..}}' or combined `par' statement
void round_cnt_task (chanend c_cnt) { // chans are untyped in xC (but interface is typed++)
    unsigned cnt = 0;
    while (true) {
        cnt++;
        // Synchronous, blocking, no buffer overflow ever possible since there is no buffer:
        c_cnt <: cnt;
    }
}

// TODO if [[distributable]] error: select case in a [[distributable]] function which is not on an interface
void round_cnt_task_2 (chanend c_cnt) { // chans are untyped in xC (but interface is typed++)
    unsigned cnt = 0;
    timer       tmr;
    time32_t    time_ticks; // Ticks to 100 in 1 us

    tmr :> time_ticks;
    while (true) {
        select {
            case tmr when timerafter (time_ticks) :> time_ticks : {
                cnt++;
                // Synchronous, blocking, no buffer overflow ever possible since there is no buffer:
                c_cnt <: cnt;
            } break;
        }
    }
}


// -----------------------------------------------------------------------------
// An interface is implemented by chanends, locks, calls or safe patterns set
// up by the code generation. The particular _transaction_ pattern below enables
// the compiler to set up that particular asynchronous pattern, based on
// synchronous, blocking primitives
// -----------------------------------------------------------------------------

typedef interface worker_if_t {
                            void        async_work_request (void);
    [[notification]] slave  void        finished_work (void);
    [[clears_notification]] worked_ms_t get_work_result (void);
} worker_if_t;

// -----------------------------------------------------------------------------
// worker_task
// NUM_WORKERS of these are started. They may share a logical core when
// started under [[combine]] par, or run on NUM_WORKERS logical cores if no [[combine]].
// The pattern starts with async_work_request and then simulates work for
// some time, then sends a [[notification]] of finished_work and then the
// client responds with get_work_result which [[clears_notification]].
// The compiler will insert the correct code to allow only that pattern.
// -----------------------------------------------------------------------------

[[combinable]]
void worker_task (
        server worker_if_t i_worker,
        const unsigned index_of_server) {

    timer       tmr;
    time32_t    time_ticks; // Ticks to 100 in 1 us
    bool        doCollectData = false;
    worked_ms_t sim_work_ms = 0;
    unsigned    random_seed = random_create_generator_from_seed(index_of_server); // xmos
    unsigned    random_work_delay_ms;

    debug_print ("worker_task %u\n", index_of_server);

    while (1) {
        select {
            case i_worker.async_work_request () : {
                doCollectData = true;
                random_work_delay_ms = random_get_random_number (random_seed) % 100; // [0..99]
                sim_work_ms = random_work_delay_ms;
                tmr :> time_ticks; // Immediately
                time_ticks += (sim_work_ms * XS1_TIMER_KHZ); // Simulate work
            } break;
            case (doCollectData == true) => tmr when timerafter (time_ticks) :> void : {
                // Now we have simulated that picking up log.log_worked_ms took random_work_delay_ms
                doCollectData = false;
                i_worker.finished_work();
            } break;
            case i_worker.get_work_result (void) -> worked_ms_t worked_ms : {
                worked_ms = sim_work_ms;
            } break;
        }
    }
}


// -----------------------------------------------------------------------------
// client_task
// Asks for work from NUM_WORKERS worker_task (service requested
// in different sequences) and results from workers, when they arrive, handled.
// Each interface call is blocking and synchronous, but the net result of the
// pattern is asynchronous worker_task assignments.
// Log, a button and LEDs handled.
// -----------------------------------------------------------------------------

[[combinable]]
void client_task (
        client worker_if_t i_worker[NUM_WORKERS],
        in port inP1_button,
        out buffered port:4 outP4_leds,
        chanend c_cnt) {

    timer    tmr;
    time32_t time_ticks; // Ticks to 100 in 1 us
    unsigned expect_notification_nums = 0; // Counts outstanding notifications [0..NUM_WORKERS], so not a bool
    unsigned random_seed = random_create_generator_from_seed(1); // xmos. Pseudorandom, so will look the same on and after each start-up
    unsigned random_number;
    log_t    log;
    bool     allow_button = false;
    bool     button_current_val = BUTTON_RELEASED;
    unsigned led_bits; // Init below..

    led_bits = BOARD_LEDS_INIT; // ..here to avoid "not used" if "null" used instead
    log.cnt = 0;
    log.button_pressed = false;

    debug_print ("%s", "client_task\n");

    tmr :> time_ticks;
    time_ticks += (1 * XS1_TIMER_HZ); // 1 second before first timerafter

    while (1) {
        select {
            case (expect_notification_nums == 0) => tmr when timerafter (time_ticks) :> void : {
                random_number = random_get_random_number (random_seed); // Just trying to start randomly

                // Start as [0,1,2,3], [3,0,1,2], [2,3,0,1], [1,2,3,0]:
                for (unsigned ix=0; ix < NUM_WORKERS; ix++) {
                    unsigned random_worker = random_number % NUM_WORKERS; // Inside [0..(NUM_WORKERS-1)]
                    i_worker[random_worker].async_work_request(); // Now log.log_started in random sequence
                    random_number++; // Next (but modulo NUM_WORKERS above)

                    log.log_started[ix] = random_worker;
                }
                expect_notification_nums = NUM_WORKERS;
                // === Do something else while all worker_task work ===
            } break;
            case (expect_notification_nums > 0) => i_worker[unsigned index_of_server].finished_work() : {

                // Server async_work_request entries are protected by code and scheduler until this is run:
                log.log_worked_ms[index_of_server] = i_worker[index_of_server].get_work_result();
                // async_work_request is not allowed again before the above line is run, by compiler and code

                expect_notification_nums--;

                log.log_finished[expect_notification_nums] = index_of_server;
                if (expect_notification_nums == 0) {
                    select { // Nested select
                        case c_cnt :> log.cnt: {} break;
                    }
                    do_print_log (log, NUM_WORKERS); // Only if DEBUG_PRINT_TEST is 1
                    do_swipe_leds (outP4_leds, led_bits, BOARD_LED_MASK_MAX); // led_bits may be "null"
                    // === Process received log.log_worked_ms, or just.. ===
                    tmr :> time_ticks; // ..repeat immediately
                    allow_button = (log.cnt >= 10);

                } else {}
            } break;
            case allow_button => inP1_button when pinsneq(button_current_val) :> button_current_val: {
                // I/O pin changed value
                // Debouncing not done (best done in separate task, with its own timerafter)
                log.button_pressed = (button_current_val == BUTTON_PRESSED); // May not reach do_print_log
            } break;
        }
    }
}



// -----------------------------------------------------------------------------
// main
// Starts 2+NUM_WORKERS tasks (worker_task, client_task and round_cnt_task),
// running on 3 or 2+NUM_WORKERS logical cores
// -----------------------------------------------------------------------------

int main() {
    worker_if_t i_worker[NUM_WORKERS];
    chan c_cnt;
    par {
        [[combine]] // NUM_WORKERS(4) = [cores,timers,chanends]->[3,3,11], if no [[combine]] then ->[6,6,11]
        par (int ix = 0; ix < NUM_WORKERS; ix++) {
            worker_task (i_worker[ix], ix);
        }
        client_task (i_worker, inP1_button, outP4_leds, c_cnt);
        round_cnt_task (c_cnt);
    }
    return 0;
}
A printout looks like this:

Code:

client_task
worker_task 0
worker_task 1
worker_task 2
worker_task 3

cnt 1 
log.log_started    2  3  0  1 
log.log_worked_ms 92 78 69 59 
log.log_finished   0  1  2  3 

cnt 2 
log.log_started    1  2  3  0 
log.log_worked_ms 18 25 77 58 
log.log_finished   2  3  1  0 

cnt 3 
log.log_started    3  0  1  2 
log.log_worked_ms 66 15 61 96 
log.log_finished   3  0  2  1 

cnt 4 
log.log_started    2  3  0  1 
log.log_worked_ms 56 86 16 30 
log.log_finished   1  0  3  2 
And the build log:

Code:

10:38:37 **** Incremental Build of configuration Default for project _xmos_issues_xcore200 ****
xmake CONFIG=Default all 
Checking build modules
Using build modules: module_random
Analyzing main.xc
Creating dependencies for main.xc
Compiling main.xc
Rebuild .build/_obj.rsp
Creating xmos_issues_xcore200.xe
Constraint check for tile[0]:
  Cores available:            8,   used:          3 .  OKAY
  Timers available:          10,   used:          3 .  OKAY
  Chanends available:        32,   used:         11 .  OKAY
  Memory available:       262144,   used:      12208 .  OKAY
    (Stack: 1724, Code: 9482, Data: 1002)
Constraints checks PASSED.
Build Complete
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

ad 1) "round_cnt_task" does not acquire a timer. The functions that declare one are "round_cnt_task_2", "worker_task" and "client_task". Each of them requires one timer, so you get 3 timers acquired.
ad 2) Two issues: First, you need a "select { ... }" statement inside the while loop. Second, "distributable" can only be used with an interface (https://www.xmos.com/file/how-define-an ... -function/).
ad 3) You may already have guessed the right answer: "distributable" can only be used with an interface.

If you use a distributable interface on one tile only, the interface call behaves like a normal synchronized function call that does not need a further core. I guess "distributable" is a compiler-internal optimization to improve performance and reduce resource requirements, provided that the single-core access condition holds. If the interface is used from a different core, the "normal synchronized function call" optimization is lost. The synchronization prevents race conditions from occurring.

Post by aclassifier »

Thanks a lot, @dsteinwe! After so many months!

ad 1) The problem is that whether I compile with round_cnt_task or round_cnt_task_2, I get the same number of timers used: 3. I think timers may be reused, since it is normally not enough to just count the number of timer statements(?)

ad 2) Thanks for the XMOS document reference! I've added it to XC is C plus X (as ref [23]) and also commented on it there (Disclaimer). You are right, a while loop with no select and nothing to block on is not much use as a callable in-line function. This is so clear to me now!

ad 3) I knew that it could only be used for mapping an interface (no timer, no port I/O). But it is good to have it clarified that in round_cnt_task_2 it is indeed the chan usage that stops me. I assume it's because with an interface the compiler is more assured about the functionality, whereas a chanend I can even use in both directions. XMOS probably decided not to analyse which usages of a chan would allow for [[distributable]]. Just speculation here, though.

I have not investigated what happens if a [[distributable]] task is used by two tasks, one on each core. I would assume that there would be one copy on each core (?)

Post by dsteinwe »

ad 1) Well, I have tested it on my board and I have to admit that I get the same results. Then I ran "round_cnt_task" explicitly on a different tile: even then, a timer is used. Then I removed almost all code from main, and even then a timer is used. It seems that a timer resource is always acquired if a tile is in use. Why? I cannot explain it. You may have to analyse the binary's assembler code to understand.

aclassifier wrote: "I have not investigated what happens if a [[distributable]] task is used by two tasks, one on each core. I would assume that there would be one copy on each core (?)"

I tested it with the i2c lib some time ago: The server function "i2c_master" is annotated with "[[distributable]]". If the "i2c_master" interface is used from a different tile, then "i2c_master" acquires a core and behaves like a regular task (without the "distributable" modifier).

Post by aclassifier »

ad 1) So we have something left to be explained in that case!

I am using the lib_i2c quite a lot. Interesting to learn about your observation! I think it's perhaps the xmapper that selects from all possible layouts of the code (normal, combinable and distributable) and builds the final code.

I have been missing a layout log from the build. What code is where?
Off Topic
I have tried to write about this here
Lorien
Active Member
Posts: 33
Joined: Wed May 19, 2010 9:07 am

Post by Lorien »

@aclassifier: I am looking to learn XMOS processors after more than 9 years of using them without understanding (too much) of their code! Tumbling around this forum, I found this opportunity to thank you for your work on your website! I'm "devouring" it at the moment... with my small mouth, hoping to catch up on some of my 'lost' time!

Post by aclassifier »

@Lorien! Thank you so much! One of those per year makes that year! Thanks, again! And for the fact that you actually write it out!

(And then, there may be errors or misunderstandings there. I hope commenting isn't too hard. If it is, just mail me.)

Post by dsteinwe »

@Øyvind: While searching for documentation about locks, I accidentally found a detail about distributed tasks. Since I cannot tell whether this is known to you, I quote the passage from the 'XMOS Programming Guide (XM004440A)', p. 36:
2.3.2 Distributable functions

...

This implementation requires the core of the client task to have direct access to the
state of the distributed task so only works when both are on the same tile. If the
tasks are connected across tiles then the distributed task will act as a normal task
(though it is still a combinable function so could share a core with other tasks).

If a distributed task is connected to several tasks (on the same tile <= added by me to clarify), they cannot safely change its
state concurrently. In this case the compiler implicitly uses a lock to protect the
state of the task.

Post by aclassifier »

Thanks, @dsteinwe. Whether I know something or not is getting harder and harder for me to know myself.

But yes, I knew about the locks inserted by the compiler. In my blog [1] I have referred to [2] here. I have also mentioned it in [3]. Plus, your quote is in [4].

The XMOS datasheet of the core on the ExplorerKIT [5] also mentions that there are "8 locks (4 per tile)". I don't know if those HW locks are the ones used to make critical sections of [[distributable]] tasks atomic. But I assume they would be. Locks are locks?

This is somewhat close to the core of a problem I have at the moment. I am making two PWMs controlling two LED strips [6], and I sometimes want to synchronize them. Not the PWMs, but the tasks controlling how the PWMs take the intensity up and down. I have a blocking solution with channels, but I have no other successful solution (I have experimented with not using [[notification]] and [[clears_notification]]). My present solution crashes if [[combinable]], but just stops (deadlocks) when not combined. I will come back with the two solutions in [6] and also raise some questions in a new thread. I am implementing a barrier task with channels or interfaces. So, if anybody has implemented a working barrier solution, please start a new thread with it (but mention that thread here, since I'm not here daily).

REFS

[1] https://www.teigfam.net/oyvind/home/tec ... ing_memory

[2] https://www.xcore.com/viewtopic.php?f=26&t=3061 by infiniteimprobability » Thu Oct 30, 2014 4:07 pm

[3] https://www.teigfam.net/oyvind/home/tec ... tributable

[4] https://www.teigfam.net/oyvind/home/tec ... nd_numbers

[5] https://www.xmos.com/download/XEF216-51 ... (1.15).pdf

[6] https://www.teigfam.net/oyvind/home/tec ... otes/#next