Cross thread access of registers

TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

Lol. Yes, I know :) You're so precise. I mean exact in that they all start up without unnecessary delay. If you have to signal each in turn they wind up starting a few cycles delayed each.

Here's how I wound up doing it. It still has the same trouble but I think I save resources by using locks.

Note that I live in C now and do evil pointer sharing. So multiple threads have the signal object. Each thread seems to take a full trip through the queue before they each see the signal. Maybe there's room for improvement. But the end result is pretty clean...

Code: Select all

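/* Implemented in the asm that follows. */
unsigned int xcore_createsignal(void);
void xcore_signal(unsigned int signal);
void xcore_waitsignal(unsigned int signal);

/* Shared between all threads. */
unsigned int signal = 0;

void thread1()
{
   ...
   signal = xcore_createsignal();
   ...
   // Tell the other threads to RUN!
   xcore_signal(signal);
}


void thread2()
{
   ...
   // Wait for mommy to say it's safe.
   xcore_waitsignal(signal);
   ...
}

void thread3()
{
   ...
   // Wait for mommy to say it's safe.
   xcore_waitsignal(signal);
   ...
}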
and the asm...

Code: Select all


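# Creates a signal: allocates a lock and acquires it, so that
# waiters block until xcore_signal releases it.
xcore_createsignal:
	entsp		0x2
	stw			r1, sp[0x01]
	# Allocate a lock resource (type 0x05); its ID is returned in r0.
	getr		r0, 0x05
	# Acquire the lock so that waiting threads block on it.
	in			r1, res[r0]
	ldw			r1, sp[0x01]
	retsp		0x02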


# Signal the signal. Releases waiting threads!
xcore_signal:
	entsp		0x2
	stw			r1, sp[0x01]
	out			res[r0],r1							
	in			r1, res[r0]							
	ldw			r1, sp[0x01]
	retsp		0x02


# Wait on a signal. Other threads call this to block until signaled.
xcore_waitsignal:
	entsp		0x2
	stw			r1, sp[0x01]
	in			r1, res[r0]							
	out			res[r0],r1							
	ldw			r1, sp[0x01]
	retsp		0x02


segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

TjBordelon wrote: Lol. Yes, I know :) You're so precise. I mean exact in that they all start up without unnecessary delay. If you have to signal each in turn they wind up starting a few cycles delayed each.
If you send each thread a token in turn, i.e. just a sequence of OUTs to
the channel ends, all threads can run exactly one cycle (of the "master"
thread) apart (when they actually start running depends on more things).

With your code, the master thread releases the lock, and then each
slave thread in turn acquires the lock and releases it again. This takes longer.
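
The hand-off chain described above can be mimicked on a desktop machine. What follows is only a host-side analogy using POSIX threads, not xcore code: a pthread mutex stands in for the hardware lock, and `run_gate_demo`, `waiter`, `gate` and `pass_order` are made-up names. It also simplifies one thing: the master here does not re-acquire the lock after releasing it, as xcore_signal does.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WAITERS 3

/* Stands in for the xcore lock; all names here are illustrative. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;
static int pass_order[NUM_WAITERS]; /* order in which waiters got through */
static int passed;                  /* protected by gate */

static void *waiter(void *arg)
{
    int id = *(int *)arg;

    pthread_mutex_lock(&gate);   /* analogous to IN on the lock: blocks while held */
    pass_order[passed++] = id;   /* only one waiter is in here at a time */
    pthread_mutex_unlock(&gate); /* analogous to OUT: lets the next waiter in */
    return NULL;
}

/* Returns how many waiters got through the gate. */
int run_gate_demo(void)
{
    pthread_t threads[NUM_WAITERS];
    int ids[NUM_WAITERS];
    int i, rc;

    passed = 0;
    rc = pthread_mutex_lock(&gate); /* master holds the lock, so waiters block */
    assert(rc == 0);

    for (i = 0; i < NUM_WAITERS; i++) {
        ids[i] = i;
        rc = pthread_create(&threads[i], NULL, waiter, &ids[i]);
        assert(rc == 0);
    }

    pthread_mutex_unlock(&gate); /* the "signal": waiters now pass one by one */

    for (i = 0; i < NUM_WAITERS; i++)
        pthread_join(threads[i], NULL);

    return passed;
}
```

Every waiter has to take the lock and give it back before the next one can proceed, so the release is serialised rather than simultaneous.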
TjBordelon wrote: Here's how I wound up doing it. It still has the same trouble but I think I save resources by using locks.
Probably. But you shouldn't worry, you're not going to run out anyway.
You first run out of pins, then clocks, then memory, then threads --
well it all depends of course, but my point is that you won't easily run
out of channel ends, even if you use a *lot* of them. There are 32 for
(at most) 8 threads after all...

Some hints on your code...

Code: Select all

xcore_signal:
	entsp		0x2
	stw			r1, sp[0x01]
	out			res[r0],r1							
	in			r1, res[r0]							
	ldw			r1, sp[0x01]
	retsp		0x02
You do not need to save/restore r1, since it is volatile (caller-saved)
in the ABI. OUT on a lock ignores the value you output, while IN on a lock
returns the resource ID (i.e. the other parameter); so in both cases you
can use the same register for both operands. I.e.:

Code: Select all

xcore_signal:
	out res[r0],r0
	in r0,res[r0]
	retsp 0
If you write this as inline asm instead, the compiler can optimise
things better (you lose the call overhead, etc.):

Code: Select all

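static inline void xcore_signal(unsigned lock)
{
	/* The "memory" clobber keeps the compiler from moving memory accesses across the signal. */
	asm volatile("out res[%0],%0 ; in %0,res[%0]" : : "r"(lock) : "memory");
}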
This stuff is slightly racy, which might be okay depending on what
you use it for. If you *really* want *perfect* synchronisation
(all slave threads become ready at the same time, etc.), there are
hardware synchronisers (see MSYNC, SSYNC). But to use them
you'll have to start the slave threads yourself, which is not so easy...
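
For readers more used to desktop threading, the closest everyday relative of a hardware synchroniser is probably a barrier. The sketch below is a loose host-side analogy using a POSIX barrier, not xcore code, and it gives none of the cycle-level guarantees of MSYNC/SSYNC. `run_barrier_demo`, `slave` and `start_line` are made-up names, and it assumes a POSIX system that provides `pthread_barrier_t`.

```c
#define _POSIX_C_SOURCE 200809L /* for pthread_barrier_t on strict compilers */
#include <assert.h>
#include <pthread.h>

#define NUM_SLAVES 3

/* All names here are illustrative. */
static pthread_barrier_t start_line;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int started; /* protected by count_lock */

static void *slave(void *arg)
{
    (void)arg;
    pthread_barrier_wait(&start_line); /* loosely like SSYNC: wait for everyone */

    pthread_mutex_lock(&count_lock);
    started++;                         /* the slave's real work would begin here */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Returns how many slaves ran past the barrier. */
int run_barrier_demo(void)
{
    pthread_t threads[NUM_SLAVES];
    int i, rc;

    started = 0;
    /* The master takes part as well, hence NUM_SLAVES + 1 participants. */
    rc = pthread_barrier_init(&start_line, NULL, NUM_SLAVES + 1);
    assert(rc == 0);

    for (i = 0; i < NUM_SLAVES; i++) {
        rc = pthread_create(&threads[i], NULL, slave, NULL);
        assert(rc == 0);
    }

    pthread_barrier_wait(&start_line); /* loosely like MSYNC: release all together */

    for (i = 0; i < NUM_SLAVES; i++)
        pthread_join(threads[i], NULL);

    pthread_barrier_destroy(&start_line);
    return started;
}
```

Unlike the lock version, nobody has to pass anything along: once the last participant arrives, all of them are released together.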
TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

Again, thanks so much. I'm new to XCORE but learning much faster thanks to your help. Lots of code to fix up with the new insight.