transaction in inline assembly

scrusson
Member
Posts: 9
Joined: Wed Mar 21, 2012 10:33 am

Post by scrusson »

Hi,

I would like to implement a transaction function in inline assembly.
This function would have this prototype:
static inline void trans_out (unsigned c, unsigned addr, unsigned size)
The parameter "c" is the chanend, "addr" is the address of the first element to send, and "size" is the number of elements to send. I'm new to inline assembly, so I have no idea whether such a thing is possible, in particular the conditional loop.

Any ideas would be helpful.

Thanks

PS: Of course, I will also have to implement the RX function, with this prototype:
static inline void trans_in (unsigned c, unsigned addr, unsigned size)
where the "addr" parameter is, in this case, the place to store the first received element.


Gravis
Experienced Member
Posts: 75
Joined: Thu Feb 02, 2012 3:32 pm

Post by Gravis »

I have yet to find any documentation on how the various XC-specific operations work, so I suggest making a tiny demo and, instead of producing an XE binary, having the compiler output assembly files (.s extension). There is a compiler switch for this, so there is no need to disassemble anything.

Oh, and anything you can do in XC/C++ you can also do in asm, and much more.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

COMPLETELY UNTESTED, NOT EVEN COMPILED

Code: Select all

static inline void trans_out(unsigned c, unsigned addr, unsigned size)
{
        unsigned j, x;

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));

        for (j = 0; j < size; j++)
                asm volatile("ldw %0,%2[%3] ; out res[%1],%0" : "=&r"(x) : "r"(c), "r"(addr), "r"(j));

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
}

You can optimise the loop a bit more by biasing addr, so that there is
no loop test necessary anymore:

Code: Select all

static inline void trans_out(unsigned c, unsigned addr, unsigned size)
{
        unsigned j, x;

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));

        addr += 4*size;
        for (j = -size; j; j++)
                asm volatile("ldw %0,%2[%3] ; out res[%1],%0" : "=&r"(x) : "r"(c), "r"(addr), "r"(j));

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
}

Again, UNTESTED, but this should give you an idea I hope.
scrusson
Member
Posts: 9
Joined: Wed Mar 21, 2012 10:33 am

Post by scrusson »

Thank you very much, Segher; your code works and helps a lot.

One more question : here is the assembly code generated by your second code :

Code: Select all

0x00010606 <trans_out>:    entsp (u6)      0x5
0x00010608 <trans_out+2>:  stw   (ru6)       r0, sp[0x4]
0x0001060a <trans_out+4>:  stw   (ru6)       r1, sp[0x3]
0x0001060c <trans_out+6>:  stw   (ru6)       r2, sp[0x2]
0x0001060e <trans_out+8>:  ldw   (ru6)       r3, sp[0x4]
0x00010610 <trans_out+10>: outct (rus)     res[r3], 0x1 *
0x00010612 <trans_out+12>: chkct (rus)     res[r3], 0x1 *
0x00010614 <trans_out+14>: ldw   (ru6)       r3, sp[0x2]
0x00010616 <trans_out+16>: ldw   (ru6)       r11, sp[0x3]
0x00010618 <trans_out+18>: ldaw  (l3r)      r3, r11[r3]
0x0001061c <trans_out+22>: stw   (ru6)       r3, sp[0x3]
0x0001061e <trans_out+24>: ldw   (ru6)       r3, sp[0x2]
0x00010620 <trans_out+26>: neg   (2r)        r3, r3
0x00010622 <trans_out+28>: stw   (ru6)       r3, sp[0x0]
0x00010624 <trans_out+30>: bu    (u6)         0x9
0x00010626 <trans_out+32>: ldw   (ru6)       r0, sp[0x0]
0x00010628 <trans_out+34>: ldw   (ru6)       r1, sp[0x3]
0x0001062a <trans_out+36>: ldw   (ru6)       r2, sp[0x4]
0x0001062c <trans_out+38>: ldw   (3r)        r3, r1[r0]
0x0001062e <trans_out+40>: out   (r2r)       res[r2], r3 *
0x00010630 <trans_out+42>: stw   (ru6)       r3, sp[0x1]
0x00010632 <trans_out+44>: ldw   (ru6)       r0, sp[0x0]
0x00010634 <trans_out+46>: add   (2rus)      r0, r0, 0x1
0x00010636 <trans_out+48>: stw   (ru6)       r0, sp[0x0]
0x00010638 <trans_out+50>: ldw   (ru6)       r0, sp[0x0]
0x0001063a <trans_out+52>: bt    (ru6)        r0, -0xb
0x0001063c <trans_out+54>: ldw   (ru6)       r0, sp[0x4]
0x0001063e <trans_out+56>: outct (rus)     res[r0], 0x1 *
0x00010640 <trans_out+58>: chkct (rus)     res[r0], 0x1 *
0x00010642 <trans_out+60>: retsp (u6)      0x5

There are many accesses to the stack, in particular in the loop. Is it possible to temporarily keep the parameters in registers and work on them directly, in order to limit the number of instructions?

Again, thanks a lot.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

scrusson wrote: your code works
Nice! I didn't expect that really :-)

There is a rather big problem with it: nowhere does it tell the compiler
that it reads from the memory you pass in as "addr", so in theory this whole
inline code could be moved to before the code that writes to that memory.
You probably won't ever see it do that, but still, be aware.
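
One common way to describe that dependency with GCC-style inline asm is a "memory" clobber. The sketch below is only a host-side illustration, not the XCore code itself: fake_out, sent, nsent and trans_out_sketch are made-up names standing in for the channel output, and the empty asm statement just acts as a compiler barrier. The comment shows where the clobber would presumably go on the real statement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the channel end so the sketch runs on a host:
   it just records the words that would have been sent. */
enum { SENT_MAX = 8 };
static uint32_t sent[SENT_MAX];
static size_t nsent;

static void fake_out(uint32_t word)
{
    if (nsent < SENT_MAX)
        sent[nsent++] = word;
}

static void trans_out_sketch(const uint32_t *addr, size_t size)
{
    /* An empty asm with a "memory" clobber: GCC and Clang must assume it
       may read or write any memory, so earlier stores to addr[] cannot be
       moved past this point. On the XCore the clobber would go on the real
       statement instead, along the lines of
         asm volatile("ldw %0,%2[%3] ; out res[%1],%0"
                      : "=&r"(x) : "r"(c), "r"(addr), "r"(j) : "memory"); */
    __asm__ volatile("" : : "r"(addr) : "memory");

    for (size_t j = 0; j < size; j++)
        fake_out(addr[j]);
}
```

The clobber is a blunt tool, since it makes the compiler assume all memory is touched, but for a short transfer routine that cost is usually small.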
scrusson wrote: One more question : here is the assembly code generated by your second code :
Using the C compiler (LLVM), ... and -O0. Uh-oh. Try -O2 :-)

It turns out LLVM de-optimises the loop; it really loves to have all
induction variables start at 0, or something like that. Maybe you
should write the loop part in asm as well.

Compiling it as XC code (with the obvious changes -- get rid of
"asm volatile", that kind of thing) generates much better code; a
more simplistic compiler doesn't help you much with optimising
code for you, but also doesn't get in the way so much!
scrusson wrote: There are many accesses to the stack,
That's -O0 for you :-)

Some other options if you don't like fighting the compiler: a) write
it as one big asm() block; b) write it as an assembler (.s) file.
rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

How does the second piece of code work? You have

Code: Select all

j = -size;

but j is unsigned. Am I missing something?

Also, I never saw/thought of that. Clever :P
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

"unsigned" in C does not mean "non-negative number". It means the
arithmetic on it is done modulo the word size (2**32 in this case). All
arithmetic on unsigned numbers is well-defined (except dividing by
zero), unlike arithmetic on signed numbers (which is full of undefined
behaviour, which you need to avoid always).

"-x" on unsigned numbers is perfectly well-defined, for all values of x.
Adding x to it will give 0.
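
A small host-side sketch of the wraparound described above, using uint32_t so the word size is 32 bits regardless of the machine. The function names are made up for the illustration.

```c
#include <stdint.h>

/* Negate an unsigned 32-bit value; the result is (2**32 - x) mod 2**32. */
static uint32_t neg_u32(uint32_t x)
{
    return (uint32_t)(0u - x);
}

/* x + (-x) wraps around to 0 for every x, including 0 itself. */
static uint32_t add_to_negation(uint32_t x)
{
    return (uint32_t)(x + neg_u32(x));
}

/* The biased loop from the thread: starting at -size and counting up
   until the index wraps to 0 runs the body exactly size times. */
static uint32_t count_iterations(uint32_t size)
{
    uint32_t n = 0;
    for (uint32_t j = neg_u32(size); j; j++)
        n++;
    return n;
}
```

So with size = 3, j takes the values 0xFFFFFFFD, 0xFFFFFFFE and 0xFFFFFFFF, and the loop stops when the next increment wraps to 0.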

This loop transform is pretty standard, and various XMOS example code
uses it. The alternative (which might work better with LLVM, I haven't
tested) is to store the data backwards in memory, and then do

Code: Select all

j = size;
while(j) {
        j--;
        ... do stuff with addr[j] ...;
}
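
To make the difference in visiting order concrete, here is a host-side sketch of both loop shapes. The names visit_biased, visit_countdown and order are invented for the example, and the biased loop uses a signed ptrdiff_t index so the pointer arithmetic is also valid on a 64-bit host; it is meant as an illustration of the ordering only, not as the XCore code.

```c
#include <stddef.h>
#include <stdint.h>

/* Biased loop: the index runs from -size up to 0 relative to the end of
   the buffer, so elements are visited first to last. */
static void visit_biased(const uint32_t *addr, size_t size, uint32_t *order)
{
    const uint32_t *end = addr + size;
    size_t n = 0;
    for (ptrdiff_t j = -(ptrdiff_t)size; j != 0; j++)
        order[n++] = end[j];
}

/* Countdown loop: the index runs from size-1 down to 0, so elements are
   visited last to first. */
static void visit_countdown(const uint32_t *addr, size_t size, uint32_t *order)
{
    size_t n = 0;
    size_t j = size;
    while (j) {
        j--;
        order[n++] = addr[j];
    }
}
```

This is why the countdown variant needs the data stored backwards in memory if the receiver expects the original order.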
rp181
Respected Member
Posts: 395
Joined: Tue May 18, 2010 12:25 am

Post by rp181 »

Thanks for the explanation! Going to go read up on undefined behavior - don't know much about C.

Sorry for hijacking this thread.