Hi,
I would like to implement a transaction function in inline assembly.
This function would have this prototype :
static inline void trans_out (unsigned c, unsigned addr, unsigned size)
The parameter "c" would be the chanend, "addr" the address of the first element to send, and "size" the number of elements to send. I'm new to inline assembly, so I have no idea whether such a thing is possible, in particular the conditional loop.
Any ideas would be helpful.
Thanks
PS: Of course, I will also have to implement the RX function, with this prototype:
static inline void trans_in (unsigned c, unsigned addr, unsigned size)
where the "addr" parameter would in this case be the place to store the first element.
transaction in inline assembly
-
- Member
- Posts: 9
- Joined: Wed Mar 21, 2012 10:33 am
-
- Experienced Member
- Posts: 75
- Joined: Thu Feb 02, 2012 3:32 pm
I've yet to find any documentation on how the various XC-specific operations work, so I suggest building a tiny demo and, instead of producing an XE binary, having the compiler emit assembly files (.s extension). There is a compiler switch for this, so there is no need to disassemble anything.
Oh, and anything you can do in XC/C++ you can also do in asm, and much more.
-
- XCore Expert
- Posts: 844
- Joined: Sun Jul 11, 2010 1:31 am
COMPLETELY UNTESTED, NOT EVEN COMPILED
Code: Select all
static inline void trans_out(unsigned c, unsigned addr, unsigned size)
{
    unsigned j, x;
    asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
    for (j = 0; j < size; j++)
        /* volatile: x is otherwise unused, so the compiler could drop this asm */
        asm volatile("ldw %0,%2[%3] ; out res[%1],%0" : "=&r"(x) : "r"(c), "r"(addr), "r"(j));
    asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
}
You can optimise the loop a bit more by biasing addr, so that there is
no loop test necessary anymore:
Code: Select all
static inline void trans_out(unsigned c, unsigned addr, unsigned size)
{
    unsigned j, x;
    asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
    addr += 4*size;          /* point one past the last word */
    for (j = -size; j; j++)  /* index counts up from -size to 0 */
        asm volatile("ldw %0,%2[%3] ; out res[%1],%0" : "=&r"(x) : "r"(c), "r"(addr), "r"(j));
    asm("outct res[%0],1 ; chkct res[%0],1" : : "r"(c));
}
Again, UNTESTED, but this should give you an idea I hope.
-
- Member
- Posts: 9
- Joined: Wed Mar 21, 2012 10:33 am
Thank you very much, Segher; your code works and helps a lot.
One more question: the assembly code generated from your second version is shown below.
There are many accesses to the stack, in particular in the loop. Is it possible to temporarily keep the different parameters in registers and then work directly on them, in order to limit the number of instructions?
Code: Select all
0x00010606 <trans_out>: entsp (u6) 0x5
0x00010608 <trans_out+2>: stw (ru6) r0, sp[0x4]
0x0001060a <trans_out+4>: stw (ru6) r1, sp[0x3]
0x0001060c <trans_out+6>: stw (ru6) r2, sp[0x2]
0x0001060e <trans_out+8>: ldw (ru6) r3, sp[0x4]
0x00010610 <trans_out+10>: outct (rus) res[r3], 0x1 *
0x00010612 <trans_out+12>: chkct (rus) res[r3], 0x1 *
0x00010614 <trans_out+14>: ldw (ru6) r3, sp[0x2]
0x00010616 <trans_out+16>: ldw (ru6) r11, sp[0x3]
0x00010618 <trans_out+18>: ldaw (l3r) r3, r11[r3]
0x0001061c <trans_out+22>: stw (ru6) r3, sp[0x3]
0x0001061e <trans_out+24>: ldw (ru6) r3, sp[0x2]
0x00010620 <trans_out+26>: neg (2r) r3, r3
0x00010622 <trans_out+28>: stw (ru6) r3, sp[0x0]
0x00010624 <trans_out+30>: bu (u6) 0x9
0x00010626 <trans_out+32>: ldw (ru6) r0, sp[0x0]
0x00010628 <trans_out+34>: ldw (ru6) r1, sp[0x3]
0x0001062a <trans_out+36>: ldw (ru6) r2, sp[0x4]
0x0001062c <trans_out+38>: ldw (3r) r3, r1[r0]
0x0001062e <trans_out+40>: out (r2r) res[r2], r3 *
0x00010630 <trans_out+42>: stw (ru6) r3, sp[0x1]
0x00010632 <trans_out+44>: ldw (ru6) r0, sp[0x0]
0x00010634 <trans_out+46>: add (2rus) r0, r0, 0x1
0x00010636 <trans_out+48>: stw (ru6) r0, sp[0x0]
0x00010638 <trans_out+50>: ldw (ru6) r0, sp[0x0]
0x0001063a <trans_out+52>: bt (ru6) r0, -0xb
0x0001063c <trans_out+54>: ldw (ru6) r0, sp[0x4]
0x0001063e <trans_out+56>: outct (rus) res[r0], 0x1 *
0x00010640 <trans_out+58>: chkct (rus) res[r0], 0x1 *
0x00010642 <trans_out+60>: retsp (u6) 0x5
Again thanks a lot
-
- XCore Expert
- Posts: 844
- Joined: Sun Jul 11, 2010 1:31 am
scrusson wrote: your code works
Nice! I didn't expect that really :-)
There is a rather big problem with it: nowhere does it tell the compiler
that it reads from the memory you pass in as "addr", so in theory this whole
inline code could be reordered to before the code that writes to that memory.
You probably won't ever see it do that, but still, be aware.
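One common way to express that dependency in GCC-style extended asm is a "memory" clobber. The sketch below is not from the thread: the helper names are hypothetical, and it assumes the toolchain accepts GCC-style clobber lists. It uses an empty asm template so that it builds on any GCC-compatible compiler, not only for the XCore.

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread. The empty asm takes addr as an
   input and declares a "memory" clobber, so a GCC-compatible compiler has to
   assume the asm may read or write any memory and will not move earlier
   stores to the buffer past it. */
static inline void asm_reads_memory(const void *addr)
{
    __asm__ __volatile__("" : : "r"(addr) : "memory");
}

/* Hypothetical caller: fills buf, hands it to the asm, then sums it.
   The stores into buf must be complete before the asm statement. */
static unsigned fill_then_hand_over(unsigned *buf, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned)i + 1u;
    asm_reads_memory(buf);
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

In trans_out, the equivalent change would presumably be to append : "memory" after the input operands of the asm statement that performs the ldw.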
scrusson wrote: One more question : here is the assembly code generated by your second code :
Using the C compiler (LLVM), ... and -O0. Uh-oh. Try -O2 :-)
It turns out LLVM de-optimises the loop, it really loves to have all
induction variables start at 0, or something like that. Maybe you
should write the loop part in asm as well.
Compiling it as XC code (with the obvious changes -- get rid of
"asm volatile", that kind of thing) generates much better code; a
more simplistic compiler doesn't help you much with optimising
code for you, but also doesn't get in the way so much!
scrusson wrote: There are many accesses to the stack,
That's -O0 for you :-)
Some other options if you don't like fighting the compiler: a) write
it as one big asm() block; b) write it as an assembler (.s) file.
-
- Respected Member
- Posts: 395
- Joined: Tue May 18, 2010 12:25 am
How does the second piece of code work? You have
Code: Select all
j = -size;
but j is unsigned. Am I missing something?
Also, I never saw/thought of that. Clever :P
-
- XCore Expert
- Posts: 844
- Joined: Sun Jul 11, 2010 1:31 am
"unsigned" in C does not mean "non-negative number". It means the
arithmetic on it is done modulo the word size (2**32 in this case). All
arithmetic on unsigned numbers is well-defined (except dividing by
zero), unlike arithmetic on signed numbers (which is full of undefined
behaviour, which you need to avoid always).
"-x" on unsigned numbers is perfectly well-defined, for all values of x.
Adding x to it will give 0.
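A small sketch of that wraparound behaviour in plain C, for anyone who wants to check it on a desktop compiler. The helper names are made up for illustration.

```c
#include <limits.h>

/* Hypothetical helper: x + (-x), computed entirely in unsigned arithmetic. */
static unsigned add_negation(unsigned x)
{
    unsigned neg = -x;   /* well-defined: reduced modulo UINT_MAX + 1 */
    return x + neg;      /* wraps around to 0 for every x */
}

/* Hypothetical helper: checks -x against the modular formula. */
static int negation_matches_formula(unsigned x)
{
    return -x == (x ? UINT_MAX - x + 1u : 0u);
}

/* Hypothetical helper: counts the iterations of "for (j = -size; j; j++)". */
static unsigned count_up_to_zero(unsigned size)
{
    unsigned steps = 0;
    for (unsigned j = -size; j; j++)
        steps++;
    return steps;
}
```

The last helper shows why the loop in trans_out runs exactly size times: j starts at the unsigned value that is size steps below zero, and the loop stops when the increment wraps it to 0.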
This loop transform is pretty standard, and various XMOS example code
uses it. The alternative (which might work better with LLVM, I haven't
tested) is to store the data backwards in memory, and then do
Code: Select all
j = size;
while(j) {
j--;
... do stuff with addr[j] ...;
}
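To compare the two loop shapes side by side, here is a portable sketch that copies into a second array instead of writing to a chanend. The function names are hypothetical. It uses a signed ptrdiff_t index for the biased loop, because on a host with 64-bit pointers the 32-bit unsigned wraparound used on the XCore would not index correctly.

```c
#include <stddef.h>

/* Biased loop: point one past the end, then count a negative index up to 0.
   Elements come out in forward order. */
static size_t copy_biased(unsigned *dst, const unsigned *src, size_t size)
{
    const unsigned *end = src + size;
    size_t sent = 0;
    for (ptrdiff_t j = -(ptrdiff_t)size; j != 0; j++)
        dst[sent++] = end[j];
    return sent;
}

/* Countdown loop: the index runs from size-1 down to 0, so elements come
   out in reverse order. That is why the data would be stored backwards. */
static size_t copy_countdown(unsigned *dst, const unsigned *src, size_t size)
{
    size_t sent = 0;
    size_t j = size;
    while (j) {
        j--;
        dst[sent++] = src[j];
    }
    return sent;
}
```

Both loops end on a plain "is the index zero" test, which is the property the thread is after; they differ only in the order in which the elements are visited.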
-
- Respected Member
- Posts: 395
- Joined: Tue May 18, 2010 12:25 am
Thanks for the explanation! Going to go read up on undefined behavior - don't know much about C.
Sorry for hijacking this thread.