LDIVU questions
Posted: Fri Apr 21, 2017 3:08 pm
Although I've tried to avoid it, I need to do a long division operation in my code, preferably quite quickly. So I'm using the LDIVU instruction on an XCORE200 processor in inline assembly as follows:
asm(".syntax architectural\nLDIVU %0,%1,%2,%3,%4\n.syntax default" : "=r"(output), "=r"(remainder) : "r"(al), "r"(denominator), "r"(ah));
It seems to be working fine, but reading the XS2 Architecture Manual raises a couple of questions:
1. This instruction can take up to bpw (bits per word) cycles to complete, and the timing analyser assumes the worst case of 32. Can someone explain the "up to" part to me? What input values cause the worst case and the best case?
2. The manual says "the divide unit is shared between threads". Does this mean that if I try to do an LDIVU operation on 4 threads on the same tile at the same time, then 3 of the threads will block while the first thread completes?
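For anyone wanting to sanity-check results, the operation performed by the inline assembly above can be modelled in portable C. This is only a sketch under stated assumptions: that LDIVU divides the 64-bit value formed from ah (high word) and al (low word) by denominator, and that the quotient must fit in 32 bits, which requires ah < denominator. The function name ldivu_model and its error return are made up for illustration; they are not part of any XMOS library, and the model says nothing about cycle counts.

```c
#include <stdint.h>

/* Hypothetical reference model of a 64-by-32 unsigned divide.
 * Not an XMOS API; the name and the -1 error return are illustrative.
 *
 * Assumption: the dividend is (ah:al) and the quotient must fit in
 * 32 bits, i.e. ah < denominator. Where that does not hold, this model
 * returns -1 instead of producing a result. */
static int ldivu_model(uint32_t ah, uint32_t al, uint32_t denominator,
                       uint32_t *output, uint32_t *remainder)
{
    /* ah >= denominator also covers denominator == 0,
     * because ah >= 0 is always true for unsigned values. */
    if (ah >= denominator) {
        return -1;
    }

    /* Build the 64-bit dividend from the two 32-bit halves. */
    uint64_t dividend = ((uint64_t)ah << 32) | al;

    *output = (uint32_t)(dividend / denominator);
    *remainder = (uint32_t)(dividend % denominator);
    return 0;
}
```

Calling ldivu_model(ah, al, denominator, &output, &remainder) with the same inputs as the asm statement gives values to compare against on a host machine.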