Hello,
I could not find a simple example or explanation of how to create tasks and synchronize them, apart from some assembly source code in a TensorFlow-related repository on GitHub:
https://github.com/xmos/lib_tflite_micr ... all.S#L615
According to the architecture documentation, the MSYNC and SSYNC instructions do this by referring to a "synchronizer" which is allocated when a master task is created. But how do we get access to the synchronizer created by the "par" statement?
Googling a bit, I found a patent on this which is probably related to the XMOS architecture:
https://patentimages.storage.googleapis ... 7169A1.pdf
but reading it will give you a big headache!
My use case is the USB Audio app. On tile 0 (xu216) we have the audio-i2s task and one master DSP task, which is triggered by audio-i2s when a new sample is received from decouple. I would then like this "master" DSP task to start three other "slave" DSP tasks at the same time.
I guess I need to create the four DSP tasks and simply execute SSYNC in the three slaves,
then use MSYNC on the synchronizer in the master,
but it is not clear how to access this damn synchronizer.
Thanks in advance.
Synchronizing tasks with MSYNC SSYNC
-
- Respected Member
- Posts: 279
- Joined: Mon Jan 08, 2018 4:14 pm
Last edited by fabriceo on Sat May 11, 2024 1:03 pm, edited 1 time in total.
Okay, I've got the solution after some reverse engineering of the assembly code.
Assuming a declaration like this:
Code: Select all
void task1(){
    par {
        a(); //slave tasks
        b(); //slave tasks
        c(); //master task with synchronizer in R5
    }
    printf("all finished\n");
}
int main(){
    task1();
    return 0;
}
the compiler creates a "par descriptor" containing the address of a() and its stack size, the address of b() and its stack size, and then a 0 followed by the address of c(), which is treated as the master of a() and b().
Then, at the beginning of task1, it calls a library function:
Code: Select all
ldaw r1, dp[par.desc.1]
ldc r0, 0
bl __start_other_cores
Working out what happens inside __start_other_cores is not too difficult:
a synchronizer is requested and stored in register r5,
then the two tasks ( a() and b() ) are created against this synchronizer,
then the instruction msync res[r5] is executed, which starts a() and b(),
and immediately afterwards c() is called.
The trick is simply to read r5 at the very beginning of c(), and then we can use the synchronizer ourselves.
Obviously, readers should double-check that the library shipped with their XTC version matches this approach. The example below runs fine with XTC 14.4.1 and was tested on an xu216 with the compilation flag -O1.
Here is a full working example:
Code: Select all
#include <platform.h>
#include <stdio.h>
#include <timer.h> // delay_ticks()
#ifdef XSCOPE
#include <xscope.h>
// redirect printf over xSCOPE when the build defines XSCOPE
void xscope_user_init()
{
    xscope_register(0, 0, "", 0, "");
    xscope_config_io(XSCOPE_IO_BASIC); // or XSCOPE_IO_TIMED
}
#endif
void a(){
    printf("a before sync\n");
    asm volatile("ssync":::"memory"); // wait here until the master executes msync
    printf("a after sync\n");
}
void b(){
    printf("b before sync\n");
    asm volatile("ssync":::"memory"); // wait here until the master executes msync
    printf("b after sync\n");
}
void c(){
    int sync;
    asm volatile("mov %0,r5 ":"=r"(sync)); // read the synchronizer left in r5 by __start_other_cores
    printf("c sync = 0x%x\n",sync);
    delay_ticks(10000);
    printf("c after delay 100us\n");
    asm volatile("msync res[%0]"::"r"(sync)); // release a() and b() from their ssync
    delay_ticks(10000);
    printf("c after delay 100us and msync\n");
}
void task1(){
    par {
        a(); //slave tasks
        b(); //slave tasks
        c(); //master task owning synchronizer
    }
    printf("all finished\n");
}
int main(){
    task1();
    return 0;
}
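To picture what the example above does with the synchronizer, here is a small host-side model in plain C. It is only a sketch: it does not run on xCORE, all names (sync_model_t, model_init, model_msync, model_ssync) are made up for illustration, and the rules it encodes (new slaves start paused as if at SSYNC, and MSYNC/SSYNC complete once the master and all slaves have arrived) are an assumption based on the behaviour described in this thread and in the architecture documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 8 hardware threads per tile, one of which is the master. */
#define MAX_SLAVES 7

/* Hypothetical model of one synchronizer; not an XMOS data structure. */
typedef struct {
    size_t n_slaves;                  /* slaves attached to the synchronizer */
    bool   slave_paused[MAX_SLAVES];  /* true while a slave waits in SSYNC   */
    bool   master_paused;             /* true while the master waits in MSYNC */
} sync_model_t;

/* Freshly created slaves are modelled as already paused at SSYNC. */
void model_init(sync_model_t *s, size_t n_slaves)
{
    assert(n_slaves <= MAX_SLAVES);
    s->n_slaves = n_slaves;
    s->master_paused = false;
    for (size_t i = 0; i < n_slaves; i++)
        s->slave_paused[i] = true;
}

bool model_all_slaves_paused(const sync_model_t *s)
{
    for (size_t i = 0; i < s->n_slaves; i++)
        if (!s->slave_paused[i])
            return false;
    return true;
}

void model_release_all(sync_model_t *s)
{
    for (size_t i = 0; i < s->n_slaves; i++)
        s->slave_paused[i] = false;
    s->master_paused = false;
}

/* Master executes MSYNC. Returns true if it continues at once,
 * false if it has to wait for slaves that are still running. */
bool model_msync(sync_model_t *s)
{
    if (model_all_slaves_paused(s)) {
        model_release_all(s);
        return true;
    }
    s->master_paused = true;
    return false;
}

/* Slave `id` executes SSYNC. Returns true if this was the last arrival
 * and the master was already waiting, so everybody resumes. */
bool model_ssync(sync_model_t *s, size_t id)
{
    assert(id < s->n_slaves);
    s->slave_paused[id] = true;
    if (s->master_paused && model_all_slaves_paused(s)) {
        model_release_all(s);
        return true;
    }
    return false;
}
```

Walking the example through this model: model_init(&s, 2) stands for the creation of a() and b(), the first model_msync(&s) is the one issued by __start_other_cores, each model_ssync is the ssync in a() or b(), and the second model_msync(&s) is the one in c() after its delay.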
Hope this helps; at least it resolves an old topic:
https://www.xcore.com/viewtopic.php?t=7999
cheers
Last edited by fabriceo on Sat May 11, 2024 1:06 pm, edited 2 times in total.
Well, it is important to use the "memory" clobber with ssync; otherwise the compiler's optimizer reshuffles instructions and may move some of them above the ssync :) (tested!)
Also, again because of compiler optimization, r5 has to be saved at the very beginning, and sometimes the compiler schedules other instructions before that, thus losing the original value of r5...
One way I solved this was to declare the synchronizer as a global variable and to combine the following assembly with the master task inside the par statement, like this:
Code: Select all
int dspSynchronizer;
void task1(){
    par {
        a(); //slave tasks
        b(); //slave tasks
        {
            asm volatile("stw r5,dp[dspSynchronizer]":::"memory");
            c();
        } //master task owning synchronizer
    }
    printf("all finished\n");
}
-
Verified
- XCore Legend
- Posts: 1307
- Joined: Thu Dec 10, 2009 9:20 pm
- Location: Bristol, UK
This is some nice hacking. Is the sync always stored in r5? I've not checked.
Technical Director @ XMOS. Opinions expressed are my own
Hi Ross,
yes, in the assembly code of the library function "__start_other_cores",
r5 is always the synchronizer, no problem. The only thing to be careful about is the prologue of the master task, which must leave r5 at its original value.
After looking carefully at what the compiler does at -O3, I am now using this in a commercial product.
You might pass this on to the dev team so that they can offer a more convenient way of doing it.
Thanks
Hi, after a lot of experimentation, I realized that MSYNC/SSYNC is a great way to launch multiple cores at exactly the same time with a single instruction. So in my case, for the DSP functionality in the USB Audio application, I issue MSYNC in the audiohub just after the LRCLK pin is updated.
But be aware of a tricky side effect which cost me a few days of headaches:
If for some reason one or more of your slave tasks have to exit from their main while loop (for example because of a change of program or sampling frequency), control returns to the library function __start_core, which simply issues an SSYNC instruction and waits for the master task to send an MJOIN.
But if you do not leave your master task and keep issuing MSYNC for the slave tasks that are still alive, you end up with a fatal error, because there is no valid code after the SSYNC in __start_core! XMOS should have put a while(1) ssync; there.
So if you want to exit some of the slave tasks, just loop on an SSYNC instruction instead, and arrange for the master to send an MJOIN that closes them ALL, then restart only the ones you need.
This may be hard to follow until you hit this fatal error yourself, but then this post will save you some days :)
fabrico
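The pitfall described above can be pictured with another small host-side model in plain C. Again this is only a sketch under assumptions: it does not run on xCORE, the state names and the functions exit_model_msync and exit_model_mjoin are invented for illustration, and the behaviour of __start_core (a single SSYNC with nothing valid after it) is taken from the description in this post rather than checked independently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-slave states for this illustration only. */
typedef enum {
    SLAVE_RUNNING,   /* executing user code                                   */
    SLAVE_WAITING,   /* paused at an SSYNC inside its normal work loop        */
    SLAVE_PARKED,    /* wants to quit, spinning on SSYNC in user code         */
    SLAVE_RETURNED,  /* returned from its task, paused at SSYNC in __start_core */
    SLAVE_FAULTED,   /* ran past the SSYNC in __start_core (fatal error)      */
    SLAVE_FREED      /* released by MJOIN                                     */
} slave_state_t;

int exit_model_any_running(const slave_state_t *st, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (st[i] == SLAVE_RUNNING)
            return 1;
    return 0;
}

/* Master issues MSYNC once every slave sits at some SSYNC.
 * Returns the number of slaves that fault, or -1 if the master
 * would still be waiting for a running slave. */
int exit_model_msync(slave_state_t *st, size_t n)
{
    assert(st != NULL);
    if (exit_model_any_running(st, n))
        return -1;
    int faults = 0;
    for (size_t i = 0; i < n; i++) {
        switch (st[i]) {
        case SLAVE_WAITING:
            st[i] = SLAVE_RUNNING;   /* resumes its work loop */
            break;
        case SLAVE_RETURNED:
            st[i] = SLAVE_FAULTED;   /* released into invalid code */
            faults++;
            break;
        default:
            break;                   /* a parked slave re-executes SSYNC at once */
        }
    }
    return faults;
}

/* Master issues MJOIN: every slave paused at an SSYNC is freed.
 * Returns the number of slaves freed, or -1 if one is still running. */
int exit_model_mjoin(slave_state_t *st, size_t n)
{
    assert(st != NULL);
    if (exit_model_any_running(st, n))
        return -1;
    int freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (st[i] == SLAVE_WAITING || st[i] == SLAVE_PARKED ||
            st[i] == SLAVE_RETURNED) {
            st[i] = SLAVE_FREED;
            freed++;
        }
    }
    return freed;
}
```

In this model a slave that has simply returned is the one that faults on the next MSYNC, while a slave parked in its own SSYNC loop survives any number of MSYNCs until the master finally issues MJOIN.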
Hi guys, just an update for those wanting to use MSYNC/SSYNC/MJOIN to synchronise several tasks declared in a single par { }.
As explained above, the way to get hold of the synchroniser used by the library underlying par { } (see the routine __start_other_cores in the generated assembly file) is to use a "stw r5" instruction at the very beginning of the last task.
I am now developing on xu316 with XTC 15.3.1, and I confirm that r5 is still the right register.
But I realised that the compiler can overwrite r5 in the "prologue" of the task code, just after saving registers on the stack.
The way to avoid this is to add r5 to the "clobber" list of the asm statement, so that the compiler uses other registers instead.
So the previous example given above becomes:
Code: Select all
int dspSynchronizer;
void task1(){
    par {
        a(); //slave tasks
        b(); //slave tasks
        {
            asm volatile("stw r5,dp[dspSynchronizer]":::"memory","r5");
            c();
        } //master task owning synchronizer
    }
    printf("all finished\n");
}
Hope this helps
fabriceo