LDIVU questions

Redeye
XCore Addict
Posts: 131
Joined: Wed Aug 03, 2011 9:13 am

Post by Redeye »

Although I've tried to avoid it, I need to do a long division operation in my code, preferably quite quickly. So I'm using the LDIVU instruction on an XCORE200 processor in inline assembly as follows:

asm(".syntax architectural\nLDIVU %0,%1,%2,%3,%4\n.syntax default" : "=r"(output) , "=r"(remainder) : "r"(al) , "r"(denominator) ,"r"(ah));
It seems to be working fine, but reading the XS2 Architecture Manual raises a couple of questions:

1. This instruction can take up to bpw cycles to complete and the timing analyser assumes the worst case of 32. Can someone explain the "up to" part to me? What input values cause the worst case and best case?

2. The manual says "the divide unit is shared between threads". Does this mean that if I try to do an LDIVU operation on 4 threads on the same tile at the same time, then 3 of the threads will block while the first thread completes?


ahogen
Member++
Posts: 26
Joined: Fri Mar 31, 2017 5:16 pm

Post by ahogen »

I'm no XMOS expert, but in response to #2, I would imagine that, yeah, any single instruction would be blocking. For most instructions, that's only one cycle so everybody's happy. But for a divide, I don't think it'd make much sense to put new instructions in your single, shared pipeline before a divide has completed.
data
Active Member
Posts: 43
Joined: Wed Apr 06, 2011 8:02 pm

Post by data »

Plain old user here, not factory -- but my understanding is that the operation time depends on the argument values (this is common for dividers), and trying to play the odds on that is not generally useful.

And yes, as I understand it, there is only one divide unit per tile, so everything on that tile has to queue up to use it.

32 cycles is pretty quick for this operation, so I don't usually worry about it, except for real-time DSP code: I go to quite some length to keep divisions out of that. When a division has to be done, I assume 32 * n worst case cycles, n being the number of threads which might use the divider.
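To put a number on that budget, here is a minimal C sketch of the 32 * n rule above. The helper name and its parameters are made up for illustration, and the 32-cycle figure is simply the worst case quoted from the manual earlier in this thread, not something measured.

```c
#include <stdint.h>

/* Worst-case cycles per LDIVU as quoted in this thread (bpw = 32). */
#define LDIVU_WORST_CYCLES 32u

/* Hypothetical helper: pessimistic cycle bound for `divides` divisions in one
   thread, assuming up to `contending_threads` threads (including this one)
   may have to queue for the single divide unit on the tile. */
static uint32_t ldivu_worst_case_cycles(uint32_t divides,
                                        uint32_t contending_threads)
{
    return divides * LDIVU_WORST_CYCLES * contending_threads;
}
```

For example, one division with four threads that might all hit the divider gives ldivu_worst_case_cycles(1, 4), i.e. 128 cycles to budget for.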
Redeye
XCore Addict
Posts: 131
Joined: Wed Aug 03, 2011 9:13 am

Post by Redeye »

Hmmm, yes that was my interpretation too. Unfortunately this is in real-time DSP code and the worst case I'd assumed (as you had too) gets a bit borderline.

Using xscope to measure the function timing suggests that it's normally quite a lot less than 32 cycles, which is why I wondered if there were any rules for the worst case that would help me stay on the safe side.

For this particular application speed is more important than accuracy so I think there's probably a quicker (and more importantly, deterministic) method than the LDIVU instruction for this case.
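One candidate for such a method is a reciprocal estimate refined by a fixed number of Newton-Raphson steps, so the work done does not depend on the operand values. The sketch below is only an illustration of that idea, not something from the manual: the function name is invented, it handles a 32-bit numerator rather than the 64-bit one LDIVU accepts, it assumes d is non-zero, and it relies on the GCC/Clang builtin __builtin_clz. By my reckoning the result never exceeds the true quotient and falls short by at most roughly 1 + (n/d) * 2^-28, but treat that bound as unverified and check it against your own value range.

```c
#include <stdint.h>

/* Approximate n / d for d != 0 with a fixed amount of work.
   Hypothetical helper; the result may be slightly low, never high. */
static uint32_t approx_udiv32(uint32_t n, uint32_t d)
{
    /* Normalise d so its top bit is set: D = dn / 2^32 lies in [0.5, 1). */
    unsigned shift = (unsigned)__builtin_clz(d);
    uint32_t dn = d << shift;

    /* Initial estimate of 1/D: x0 = 48/17 - 32/17 * D, held in Q2.30. */
    uint32_t x = 0xB4B4B4B5u
               - (uint32_t)(((uint64_t)0x78787878u * dn) >> 32);

    /* Always three steps of x = x * (2 - D * x), whatever the inputs. */
    for (int i = 0; i < 3; ++i) {
        /* D * x in Q2.30, rounded up so that x never overshoots 1/D. */
        uint32_t dx = (uint32_t)(((uint64_t)dn * x) >> 32) + 1u;
        uint32_t t = 0x80000000u - dx;            /* 2 - D * x in Q2.30 */
        x = (uint32_t)(((uint64_t)x * t) >> 30);  /* back to Q2.30 */
    }

    /* n / d = n * x * 2^(shift - 62), truncated. */
    return (uint32_t)(((uint64_t)n * x) >> (62u - shift));
}
```

Because the reciprocal is deliberately biased low, exact multiples can come out one short (for instance, a quotient of 10 may be reported as 9). Whether this actually beats LDIVU depends on how the compiler lowers the 64-bit multiplies and shifts, so it is worth timing both with xscope before committing to it.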