transaction in inline assembly

scrusson · Post by **scrusson** » Mon Jun 04, 2012 3:01 pm

Hi,

I would like to implement a transaction function in inline assembly.
This function would have this prototype:
static inline void trans_out (unsigned c, unsigned addr, unsigned size)
The parameter "c" would be the chanend, "addr" the address of the first element to send, and "size" the number of elements to send. I'm new to inline assembly, so I have no idea whether such a thing is possible, in particular the conditional loop.

Any ideas would be helpful.

Thanks

PS: Of course, I will also have to implement the RX function with this prototype:
static inline void trans_in (unsigned c, unsigned addr, unsigned size)
where the "addr" parameter would in this case be the place to store the first element.

Gravis · Post by **Gravis** » Mon Jun 04, 2012 3:31 pm

I have yet to find any documentation on how the various XC-specific operations work, so I suggest writing a very small demo and, instead of producing an XE binary, having the compiler output assembly files (.s extension). The compiler has a switch for this, so there is no need to disassemble anything.

Oh, and anything you can do in XC/C++ you can also do in asm, and much more.

segher · Post by **segher** » Mon Jun 04, 2012 3:59 pm

COMPLETELY UNTESTED, NOT EVEN COMPILED

static inline void trans_out(unsigned c, unsigned addr, unsigned size)
{
        unsigned j, x;

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));

        for (j = 0; j < size; j++)
                asm volatile("ldw %0,%2[%3] ; out res[%1],%0" : "=&r"(x) : "r"(c), "r"(addr), "r"(j));

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
}

You can optimise the loop a bit more by biasing addr, so that the loop
no longer needs a separate compare against size (the branch just tests
the counter for zero):

static inline void trans_out(unsigned c, unsigned addr, unsigned size)
{
        unsigned j, x;

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));

        addr += 4*size;
        for (j = -size; j; j++)
                asm volatile("ldw %0,%2[%3] ; out res[%1],%0" : "=&r"(x) : "r"(c), "r"(addr), "r"(j));

        asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
}

Again, UNTESTED, but this should give you an idea I hope.
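
For anyone who wants to try the biased loop without XS1 hardware, here is a minimal plain-C sketch of the same idea. It is not the inline-asm version above: "fake_chan", "fake_out" and "send_biased" are made-up names, the channel is just an array that records what was "sent", and the index is a signed ptrdiff_t so that it also behaves on hosts where pointers are wider than unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a chanend: records each word that is "sent". */
struct fake_chan {
        unsigned sent[16];
        size_t count;
};

static void fake_out(struct fake_chan *ch, unsigned word)
{
        assert(ch->count < 16);         /* keep the sketch within its buffer */
        ch->sent[ch->count++] = word;
}

/* Same shape as the biased loop: point just past the data, count up to zero. */
static void send_biased(struct fake_chan *ch, const unsigned *buf, size_t size)
{
        const unsigned *end = buf + size;       /* plays the role of addr += 4*size */
        ptrdiff_t j;

        for (j = -(ptrdiff_t)size; j; j++)      /* the only loop test is "j != 0" */
                fake_out(ch, end[j]);           /* end[-size] is buf[0], end[-1] is buf[size-1] */
}
```

Calling send_biased(&ch, data, 4) on {10, 20, 30, 40} records the words in their original order, 10 first and 40 last.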

scrusson · Post by **scrusson** » Tue Jun 05, 2012 1:37 pm

Thank you very much, Segher. Your code works and helps a lot.

One more question: here is the assembly code generated by your second version:

0x00010606 <trans_out>:    entsp (u6)      0x5
0x00010608 <trans_out+2>:  stw   (ru6)       r0, sp[0x4]
0x0001060a <trans_out+4>:  stw   (ru6)       r1, sp[0x3]
0x0001060c <trans_out+6>:  stw   (ru6)       r2, sp[0x2]
0x0001060e <trans_out+8>:  ldw   (ru6)       r3, sp[0x4]
0x00010610 <trans_out+10>: outct (rus)     res[r3], 0x1 *
0x00010612 <trans_out+12>: chkct (rus)     res[r3], 0x1 *
0x00010614 <trans_out+14>: ldw   (ru6)       r3, sp[0x2]
0x00010616 <trans_out+16>: ldw   (ru6)       r11, sp[0x3]
0x00010618 <trans_out+18>: ldaw  (l3r)      r3, r11[r3]
0x0001061c <trans_out+22>: stw   (ru6)       r3, sp[0x3]
0x0001061e <trans_out+24>: ldw   (ru6)       r3, sp[0x2]
0x00010620 <trans_out+26>: neg   (2r)        r3, r3
0x00010622 <trans_out+28>: stw   (ru6)       r3, sp[0x0]
0x00010624 <trans_out+30>: bu    (u6)         0x9
0x00010626 <trans_out+32>: ldw   (ru6)       r0, sp[0x0]
0x00010628 <trans_out+34>: ldw   (ru6)       r1, sp[0x3]
0x0001062a <trans_out+36>: ldw   (ru6)       r2, sp[0x4]
0x0001062c <trans_out+38>: ldw   (3r)        r3, r1[r0]
0x0001062e <trans_out+40>: out   (r2r)       res[r2], r3 *
0x00010630 <trans_out+42>: stw   (ru6)       r3, sp[0x1]
0x00010632 <trans_out+44>: ldw   (ru6)       r0, sp[0x0]
0x00010634 <trans_out+46>: add   (2rus)      r0, r0, 0x1
0x00010636 <trans_out+48>: stw   (ru6)       r0, sp[0x0]
0x00010638 <trans_out+50>: ldw   (ru6)       r0, sp[0x0]
0x0001063a <trans_out+52>: bt    (ru6)        r0, -0xb
0x0001063c <trans_out+54>: ldw   (ru6)       r0, sp[0x4]
0x0001063e <trans_out+56>: outct (rus)     res[r0], 0x1 *
0x00010640 <trans_out+58>: chkct (rus)     res[r0], 0x1 *
0x00010642 <trans_out+60>: retsp (u6)      0x5

There are many accesses to the stack, particularly in the loop. Is it possible to temporarily keep the parameters in registers and work on them directly, in order to reduce the number of instructions?

Again, thanks a lot.

segher · Post by **segher** » Tue Jun 05, 2012 10:39 pm

scrusson wrote: your code works

Nice! I didn't expect that really :-)

There is a rather big problem with it: nowhere does it tell the compiler
that it reads from the memory you pass in as "addr", so in theory this
whole inline code could be reordered to before the code that writes to
that memory. You probably won't ever see it do that, but still, be aware.
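
One common way to declare that dependency in GCC-style extended asm is a "memory" clobber. The sketch below only shows where the clobber goes: it uses an empty asm statement so that it builds with any GCC-compatible compiler, "sum_words" is a made-up stand-in for the sending loop, and the reads are plain C here, so on a host the barrier is illustrative rather than required. On XS1 the same ': "memory"' would be appended to the asm statements that do the ldw; I have not tested that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for trans_out: like the original, it only gets the
 * buffer address as an integer, so an asm statement using it says nothing
 * about the memory behind that address unless we add a clobber. */
static unsigned sum_words(uintptr_t addr, unsigned size)
{
        const unsigned *p = (const unsigned *)addr;
        unsigned j, total = 0;

        assert(addr != 0);

        /* "memory" clobber: the compiler must assume this asm may read or
         * write any memory, so earlier stores cannot be moved past it. */
        __asm__ volatile("" : : "r"(addr) : "memory");

        for (j = 0; j < size; j++)
                total += p[j];
        return total;
}
```

With the clobber in place the compiler has to assume the asm may read anything, so stores to the buffer that come earlier in the program cannot be sunk below it.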

scrusson wrote: One more question: here is the assembly code generated by your second code:

Using the C compiler (LLVM), ... and -O0. Uh-oh. Try -O2 :-)

It turns out LLVM de-optimises the loop; it really loves to have all
induction variables start at 0, or something like that. Maybe you
should write the loop part in asm as well.

Compiling it as XC code (with the obvious changes -- get rid of
"asm volatile", that kind of thing) generates much better code; a
more simplistic compiler doesn't help you much with optimising
code for you, but also doesn't get in the way so much!

scrusson wrote: There are many accesses to the stack,

That's -O0 for you :-)

Some other options if you don't like fighting the compiler: a) write
it as one big asm() block; b) write it as an assembler (.s) file.

rp181 · Post by **rp181** » Wed Jun 06, 2012 3:13 am

How does the second piece of code work? You have

j = -size;

but j is unsigned. Am I missing something?

Also, I never saw/thought of that. Clever :P

segher · Post by **segher** » Wed Jun 06, 2012 3:34 am

"unsigned" in C does not mean "non-negative number". It means the
arithmetic on it is done modulo 2**N, where N is the number of bits in
the type (so modulo 2**32 in this case). All arithmetic on unsigned
numbers is well-defined (except dividing by zero), unlike arithmetic on
signed numbers (which is full of undefined behaviour, which you need to
avoid always).

"-x" on unsigned numbers is perfectly well-defined, for all values of x.
Adding x to it will give 0.
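
A small sketch to convince yourself, in plain C. The function names are made up for illustration; the asserts just spell out the two facts above, and "steps_to_zero" is the loop counter from the biased loop with the body removed.

```c
#include <assert.h>
#include <limits.h>

/* Spells out the claims for one value of x. */
static void check_unsigned_negation(unsigned x)
{
        unsigned neg = -x;                      /* well-defined for every x */

        assert(neg + x == 0);                   /* adding x back wraps to 0 */
        assert(neg == UINT_MAX - x + 1);        /* i.e. (2**N - x) modulo 2**N */
}

/* Counts how many increments take j from -size up to 0. */
static unsigned steps_to_zero(unsigned size)
{
        unsigned j, steps = 0;

        for (j = -size; j; j++)
                steps++;
        return steps;
}
```

So starting at -size and counting up runs the loop body exactly size times, and not at all when size is 0.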

This loop transform is pretty standard, and various XMOS example code
uses it. The alternative (which might work better with LLVM, I haven't
tested) is to store the data backwards in memory, and then do

j = size;
while(j) {
        j--;
        ... do stuff with addr[j] ...;
}
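
Filled in as plain C, that countdown variant might look like the sketch below. "collect_countdown" and the output array are made up so the visiting order can be inspected on a host; in the real thing the body would be the ldw/out pair.

```c
#include <assert.h>
#include <stddef.h>

/* Visits buf[size-1] first and buf[0] last, copying each word to out. */
static size_t collect_countdown(const unsigned *buf, unsigned size,
                                unsigned *out, size_t out_cap)
{
        size_t n = 0;
        unsigned j = size;

        while (j) {                     /* loop test is again just "j != 0" */
                j--;
                assert(n < out_cap);    /* keep the sketch within its buffer */
                out[n++] = buf[j];      /* stands in for "do stuff with addr[j]" */
        }
        return n;
}
```

With the data stored backwards as {30, 20, 10}, the loop visits 10, then 20, then 30, which is why the storage order has to be reversed for the words to go out in their logical order.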

rp181 · Post by **rp181** » Wed Jun 06, 2012 3:57 am

Thanks for the explanation! Going to go read up on undefined behavior - don't know much about C.

Sorry for hijacking this thread.
