xcore.ai and .xc questions

Post by **lpiccinelli** » Thu Nov 26, 2020 10:45 am

Hi everyone,

I have some questions/statements regarding xcore.ai and its programming:

Some notes and questions on your AIoT_sdk. There is a "bug" in the mobilenet example: when running it with xrun I got a "socket bind error", caused by xscope, which is used to send images from the computer to the board. In the same example, something is also wrong in the makefile and in how the inference_engine is built and linked (added as a static library). If I keep the code identical and change only the makefile (to something very similar to the cifar10 example's one), inference takes 8 ms; otherwise it takes 1.5 s. I guess this is due to the inference_engine and how it is built in the mobilenet example. By the way, your off-the-shelf mobilenet example runs in 3 s, which is quite high, while the cifar10 one runs in 6 ms, which is consistent with how fast you claim your board is.

It seems that it is not possible to spread the inference process across 2 tiles: it just replicates the process, doesn't it?

I have understood that the parallelization cannot be changed at runtime (true?). Is there a way to completely stall the cores that are not needed in a certain stage, rather than just using a timer in the select{} that shifts the activity step-wise from a dummy task (still active and consuming power) to the real activity?

When I try to place the task on a single core (i.e. on tile[0].core[0]) I get the error "statement placed on a core must be call to combinable function". Any idea or suggestion where to look? (code below)

Thank you in advance for your time.

```xc
#include <platform.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <xscope.h>
#include <stdint.h>
#include <timer.h>

extern "C" {
void init_rt();
void infer_rt();
}

void task1() {
    timer t;
    uint32_t time;
    uint8_t flag = 0;
    const uint32_t period = 100000; // 1s?
    t :> time;
    while (1) {
        select {
            case t when timerafter (time) :> void:
                if (flag == 0){
                    init_rt();
                    flag++;
                }
                else infer_rt();
                time += period;
                break;
            default:
                break;
        }
    }
}

int main(void) {
  par {
    on tile[0].core[0]: task1();
  }
  return 0;
}
```

Post by **CousinItt** » Thu Nov 26, 2020 3:02 pm

The SDK isn't available to the wider community yet, as far as I can tell, so I can't help with your first two questions, but others might.

On the last two points, cores that are waiting on events shouldn't slow down tasks that are running on other cores. This is true of the XS2 architecture and I wouldn't expect the XS3 to deviate from this basic idea.
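To illustrate the point about waiting cores, here is a minimal XC sketch, assuming the usual 100 MHz reference clock (`wait_seconds` and `demo` are illustrative names, not library functions). A core blocked on a timer event, as in `wait_seconds()` below, is paused until the event fires, whereas a `select` with a `default:` case, as in the code in the first post, returns immediately and keeps the core spinning round the loop.

```xc
#include <platform.h>
#include <stdio.h>

// Assumes the 100 MHz reference clock: 100,000,000 ticks = 1 s.
#define TICKS_PER_SECOND 100000000

// Pauses the calling logical core for `seconds` seconds. Keep `seconds`
// below ~21: 32-bit timer values wrap about every 42.9 s, and
// timerafter() only compares correctly within half of that range.
// While waiting for the timer event the core issues no instructions,
// unlike a select with a default: case, which keeps polling.
void wait_seconds(unsigned seconds) {
  timer t;
  unsigned now;
  t :> now;
  t when timerafter(now + seconds * TICKS_PER_SECOND) :> void;
}

void demo(void) {
  wait_seconds(2);
  printf("done waiting\n");
}

int main(void) {
  par {
    on tile[0]: demo();
  }
  return 0;
}
```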

Can you explain what you mean by "parallelisation is not possible to change at runtime"? There are constraints over sharing resources but there is some flexibility.

It looks like the code you have is combinable (it ends in an infinite loop containing a select statement), but you haven't put `[[combinable]]` before the function definition. However, you don't need to assign the task to a particular core: `on tile[0]: task1();` should work.
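For reference, a sketch of the first post's code with `[[combinable]]` added. It assumes the 100 MHz reference clock, so the original `period` of 100000 ticks is 1 ms rather than 1 s, and it also drops the `default:` case, which would otherwise keep the core polling instead of waiting for the timer. `PERIOD_TICKS` and `initialised` are illustrative names.

```xc
#include <platform.h>
#include <stdint.h>

extern "C" {
void init_rt();   // C functions from the first post
void infer_rt();
}

// 100 MHz reference clock assumed: 100,000,000 ticks = 1 s.
// (The original value of 100000 ticks is 1 ms.)
#define PERIOD_TICKS 100000000

[[combinable]]
void task1() {
  timer t;
  uint32_t time;
  int initialised = 0;
  t :> time;
  // A combinable function must end in while (1) { select { ... } }.
  while (1) {
    select {
      // No default: case, so the core pauses until the timer fires.
      case t when timerafter(time) :> void:
        if (!initialised) {
          init_rt();        // first tick: initialise
          initialised = 1;
        } else {
          infer_rt();       // later ticks: run inference
        }
        time += PERIOD_TICKS;
        break;
    }
  }
}

int main(void) {
  par {
    on tile[0].core[0]: task1();
  }
  return 0;
}
```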

Post by **lpiccinelli** » Fri Nov 27, 2020 10:10 am

Thank you for your answer; I should have explained myself better.

As for why I am using tile[N].core[M]: I would like to do a power analysis using one tile with one, two, ... cores, and two tiles with 8+1, 8+2 cores, etc. I would like to take measurements in these different core/tile configurations while running different computational tasks: idle, a FLOP, a matmul and the neural network inference. So my question is: is there a way to do it without tile[N].core[M]?

My initial idea was: tile[0].core[0] starts doing something (FLOP, matmul, ...) while all the others wait for a timer to expire (the case in the select statement), so that tile[0].core[1] waits e.g. 1 s and then starts, tile[0].core[2] waits 2 s (starting 1 s after tile[0].core[1]), and so on.
Is it possible/necessary to do it this way, or does your framework have a less naive implementation than mine?
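The staggered-start idea could be sketched in XC roughly as below. This is only a sketch under assumptions: `run_workload()` is a hypothetical C function standing in for the FLOP, matmul or inference under test (declared the same way as `init_rt()`/`infer_rt()` in the first post), and a 100 MHz reference clock is assumed. Each worker is marked `[[combinable]]` so that it can be placed on an explicit `tile[0].core[N]`.

```xc
#include <platform.h>

extern "C" {
void run_workload(void);   // hypothetical: the code being measured
}

// Assumes the 100 MHz reference clock: 100,000,000 ticks = 1 s.
#define TICKS_PER_SECOND 100000000

// Stays paused on a timer event for start_delay_s seconds, then runs
// run_workload() back to back forever. Keep start_delay_s below ~21 s,
// because 32-bit timer values wrap about every 42.9 s.
[[combinable]]
void staggered_worker(unsigned start_delay_s) {
  timer t;
  unsigned next;
  t :> next;
  next += start_delay_s * TICKS_PER_SECOND;
  while (1) {
    select {
      case t when timerafter(next) :> void:
        run_workload();
        // Re-read the timer so the event is ready again immediately:
        // after the first firing the worker runs continuously.
        t :> next;
        break;
    }
  }
}

int main(void) {
  par {
    on tile[0].core[0]: staggered_worker(0);   // starts at once
    on tile[0].core[1]: staggered_worker(1);   // starts after 1 s
    on tile[0].core[2]: staggered_worker(2);   // starts after 2 s
    on tile[0].core[3]: staggered_worker(3);   // starts after 3 s
  }
  return 0;
}
```

More `on tile[1].core[N]: staggered_worker(...);` lines in the same `par` would extend this to the two-tile configurations.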

By the way, I am doing this because the same analysis has already been done with another board and I want to compare the results.
Thank you again for your time.

Post by **CousinItt** » Fri Nov 27, 2020 1:41 pm

OK I understand a bit better.

> Is there a way to do it without the tile[N].core[M]?

I don't think so. Allocating tasks to cores gives you control; if you don't, you give the compiler more freedom in the allocation. To assign tasks to particular cores, add `[[combinable]]` before each task function definition; see section 2.3 in the XMOS programming guide. This doesn't mean the functions have to be combined: it just tells the compiler how to handle things if you allocate them to the same core, and allows it to check that the functions have the correct characteristics (ending in a forever loop containing a select statement).
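As a minimal sketch of what `[[combinable]]` allows (assuming a 100 MHz reference clock; `ticker` is an illustrative name): the same function can either share a core with another combinable task or get a core of its own, depending only on the `on` clause. `printf` is there just to make the behaviour visible; it is slow and will disturb the timing.

```xc
#include <platform.h>
#include <stdio.h>

#define TICKS_PER_MS 100000   // 100 MHz reference clock assumed

// A minimal periodic task: prints its id every period_ms milliseconds.
// It ends in while (1) { select { ... } }, so it qualifies as combinable.
[[combinable]]
void ticker(unsigned id, unsigned period_ms) {
  timer t;
  unsigned next;
  t :> next;
  while (1) {
    select {
      case t when timerafter(next) :> void:
        printf("ticker %u\n", id);
        next += period_ms * TICKS_PER_MS;
        break;
    }
  }
}

int main(void) {
  par {
    // Same core: the compiler merges these two select loops into one.
    on tile[0].core[0]: ticker(0, 500);
    on tile[0].core[0]: ticker(1, 700);
    // Same function on a core of its own, not combined with anything.
    on tile[0].core[1]: ticker(2, 1000);
  }
  return 0;
}
```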

> Is it possible/necessary to do this way or your framework has a less naive implementation than mine?

If I understand correctly, you wish to use overall elapsed time to determine when the processor is busy on certain tasks. I would have thought there are many alternatives, such as having another task to manage the sequencing. This task could start the tasks you wish to test, wait for them to complete via suitable interfaces, and indicate each stage externally, e.g. via a test port.
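One way such a sequencing task might look in XC, sketched with plain channels rather than XC interfaces for simplicity. Assumptions: `run_workload()` is a hypothetical C function standing in for the code under test, `XS1_PORT_4A` on tile[0] is assumed to be free and routed to a test point (check your board's port map), and `NUM_WORKERS`/`NUM_STAGES` are arbitrary.

```xc
#include <platform.h>
#include <xs1.h>

#define NUM_WORKERS 3   // must match the worker lines in main()
#define NUM_STAGES  4   // arbitrary number of measurement stages

extern "C" {
void run_workload(unsigned stage);   // hypothetical, implemented in C
}

// Hypothetical test port: assumes XS1_PORT_4A on tile[0] is free and
// routed to a test point on your board.
on tile[0]: out port p_stage = XS1_PORT_4A;

// Waits (paused) for a stage number, runs the workload, reports back.
// Stage 0 is used as the "stop" command.
void worker(chanend c) {
  unsigned stage;
  while (1) {
    c :> stage;
    if (stage == 0)
      return;
    run_workload(stage);
    c <: stage;
  }
}

// For each stage: show the stage number on the test port, start every
// worker, then wait until all of them have reported completion.
void sequencer(chanend c[NUM_WORKERS], out port p) {
  unsigned stop = 0;
  for (unsigned stage = 1; stage <= NUM_STAGES; stage++) {
    p <: stage;
    for (int i = 0; i < NUM_WORKERS; i++)
      c[i] <: stage;
    for (int i = 0; i < NUM_WORKERS; i++) {
      unsigned ack;
      c[i] :> ack;
    }
  }
  p <: stop;                      // back to 0: all stages done
  for (int i = 0; i < NUM_WORKERS; i++)
    c[i] <: stop;                 // tell every worker to return
}

int main(void) {
  chan c[NUM_WORKERS];
  par {
    on tile[0]: sequencer(c, p_stage);
    on tile[0]: worker(c[0]);
    on tile[0]: worker(c[1]);
    on tile[0]: worker(c[2]);
  }
  return 0;
}
```

Between stages the workers, and the sequencer while it waits, are blocked on channel inputs, so they are paused rather than spinning, and the value on the test port marks each stage for an external power measurement.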
