
LDIVU questions

Posted: Fri Apr 21, 2017 3:08 pm
by Redeye
Although I've tried to avoid it, I need to do a long division in my code, preferably quite quickly. So I'm using the LDIVU instruction on an XCORE200 processor in inline assembly as follows:

asm(".syntax architectural\nLDIVU %0,%1,%2,%3,%4\n.syntax default" : "=r"(output) , "=r"(remainder) : "r"(al) , "r"(denominator) ,"r"(ah));
It seems to be working fine, but reading the XS2 Architecture Manual raises a couple of questions :

1. This instruction can take up to bpw cycles to complete and the timing analyser assumes the worst case of 32. Can someone explain the "up to" part to me? What input values cause the worst case and best case?

2. The manual says "the divide unit is shared between threads". Does this mean that if I try to do an LDIVU operation on 4 threads on the same tile at the same time, then 3 of the threads will block while the first thread completes?
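
For reference, here is a rough portable C model of what I expect the inline assembly above to compute: the 64-bit value ah:al divided by denominator, giving a 32-bit quotient and a 32-bit remainder. The function name ldivu_model is made up for illustration, and the precondition (high word smaller than the divisor, so that the quotient fits in 32 bits) is my reading of the instruction description rather than something I have confirmed on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not part of the XMOS tools.
   Divides the 64-bit value ah:al by denominator and returns the
   32-bit quotient; the 32-bit remainder is written to *remainder. */
static uint32_t ldivu_model(uint32_t ah, uint32_t al,
                            uint32_t denominator, uint32_t *remainder)
{
    /* Assumed precondition: non-zero divisor and a quotient that
       fits in 32 bits, which requires ah < denominator. */
    assert(denominator != 0 && ah < denominator);

    uint64_t dividend = ((uint64_t)ah << 32) | al;

    *remainder = (uint32_t)(dividend % denominator);
    return (uint32_t)(dividend / denominator);
}
```

Something like this is handy for checking the inline assembly results on a host PC before worrying about timing.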

Re: LDIVU questions

Posted: Tue Apr 25, 2017 6:15 pm
by ahogen
I'm no XMOS expert, but in response to #2, I would imagine that, yeah, any single instruction would be blocking. For most instructions, that's only one cycle, so everybody's happy. But for a divide, I don't think it'd make much sense to put new instructions in your single, shared pipeline before a divide has completed.

Re: LDIVU questions

Posted: Tue Apr 25, 2017 8:20 pm
by data
Plain old user here, not factory -- but my understanding is that the operation time depends on the argument values (this is common for dividers), and trying to play the odds on that is not generally useful.

And yes, as I understand it, there is only one divide unit per tile, so everything on that tile has to queue up to use it.

32 cycles is pretty quick for this operation, so I don't usually worry about it, except for real-time DSP code: I go to quite some length to keep divisions out of that. When a division has to be done, I assume 32 * n worst case cycles, n being the number of threads which might use the divider.
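
To make that budgeting rule concrete, here is a tiny sketch of the arithmetic. The names are made up for illustration, and the 32-cycle figure is just the worst case quoted earlier in the thread, not something I have measured.

```c
#include <stdint.h>

/* Worst-case cycles for one LDIVU, as quoted in this thread (assumption). */
#define LDIVU_WORST_CASE_CYCLES 32u

/* Hypothetical helper: pessimistic cycle budget for one division when
   up to `threads_using_divider` threads may be queued on the divider. */
static uint32_t ldivu_budget_cycles(uint32_t threads_using_divider)
{
    return LDIVU_WORST_CASE_CYCLES * threads_using_divider;
}
```

So with 4 threads that might divide at once, I would budget ldivu_budget_cycles(4), i.e. 128 cycles, for each division.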

Re: LDIVU questions

Posted: Wed Apr 26, 2017 9:42 pm
by Redeye
Hmmm, yes that was my interpretation too. Unfortunately this is in real-time DSP code and the worst case I'd assumed (as you had too) gets a bit borderline.

Using xscope to measure the function timing suggests that it's normally quite a lot less than 32 cycles which is why I wondered if there were any rules for the worst case that would help me stay on the safe side.

For this particular application speed is more important than accuracy, so I think there's probably a quicker (and, more importantly, deterministic) method than the LDIVU instruction for this case.
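
One candidate I'm considering, purely as a sketch: if the denominator changes rarely, compute a fixed-point reciprocal once (outside the real-time path) and replace each division with a multiply and a shift, which takes the same time for every input. The function names below are hypothetical, the code only handles a 32-bit numerator rather than the full 64-bit ah:al case, and the result can be one lower than the true quotient, which is the accuracy I'd be trading away.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(2^32 / d), computed once per denominator.
   Held in 64 bits because the result is exactly 2^32 when d == 1. */
static uint64_t make_recip_u32(uint32_t d)
{
    assert(d != 0);
    return (UINT64_C(1) << 32) / d;
}

/* Approximate n / d using the precomputed reciprocal.
   The result is either the true quotient or one less than it.
   The product cannot overflow: n < 2^32 and recip <= 2^32. */
static uint32_t approx_div_u32(uint32_t n, uint64_t recip)
{
    return (uint32_t)(((uint64_t)n * recip) >> 32);
}
```

For example, with d = 7 the reciprocal is 613566756; 100 / 7 comes out as 14 (exact), but 21 / 7 comes out as 2 instead of 3, so this only suits cases where being off by one in the last place is acceptable.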