NOPs in generated ASM code

Technical questions regarding the XTC tools and programming with XMOS.
DemoniacMilk
XCore Addict
Posts: 191
Joined: Tue Jul 05, 2016 2:19 pm


Post by DemoniacMilk »

Hey everyone!

While optimizing my software I noticed a lot of NOPs in the generated asm code (at -O3).
I thought most instructions complete within one scheduler cycle (= 5+ clock cycles), so I am wondering whether the nops in the following example are necessary.


Found start flag!
           0x409dc 	nop (0r)
           0x409de 	ldw (ru6) r7, sp[0x4]
           0x409e0 	nop (0r)
           0x409e2 	ldw (3r) r0, r6[r7]
           0x409e4 	add (2rus) r10, r0, 0x6
           0x409e6 	nop (0r)
           0x409e8 	ldc (lru6) r1, 0x5f8
           0x409ec 	add (3r) r0, r0, r1
           0x409ee 	nop (0r)
           0x409f0 	nop (0r)
           0x409f2 	stw (ru6) r0, sp[0x3]
           0x409f4 	ldw (lru6) r0, cp[0x3c]
           0x409f8 	add (2rus) r1, r8, 0x0
           0x409fa 	stw (ru6) r0, sp[0x5]
           0x409fc 	sub (3r) r0, r8, r7
           0x409fe 	stw (ru6) r7, sp[0x4]
           0x40a00 	waiteu (0r) 


Bianco
XCore Expert
Posts: 754
Joined: Thu Dec 10, 2009 6:56 pm

Post by Bianco »

Looks like this is padding for dual issue mode.
If a 16-bit encoded instruction cannot be issued at the same time as the next instruction, a padding nop is inserted for the instruction lane that is not used.
Clever reordering of instructions can save you some nops, but this is not always possible.
DemoniacMilk
XCore Addict
Posts: 191
Joined: Tue Jul 05, 2016 2:19 pm

Post by DemoniacMilk »

Oh dear, I totally forgot about dual issue mode.
This probably means all my conclusions about the timing behavior of my code are wrong.

Thanks for your answer!
peter
XCore Addict
Posts: 230
Joined: Wed Mar 10, 2010 12:46 pm

Post by peter »

If you want to find out how long some code is going to take to execute, you can use XTA (the XMOS Timing Analyzer):


xta load BIN
analyze endpoints 0x409dc 0x40a00
print trace -
This should do the trick (with BIN replaced by your compiled binary). Note, though, that the timing printed is for the number of cores actually active. If you want to know the actual worst case, configure XTA to assume that 8 cores are active by adding the following before the analyze command (substituting the correct tile identifier for tile[0]):


config tasks tile[0] 8