asm(".syntax architectural\nLDIVU %0,%1,%2,%3,%4\n.syntax default" : "=r"(output) , "=r"(remainder) : "r"(al) , "r"(denominator) ,"r"(ah));
1. This instruction can take up to bpw (bits per word) cycles to complete, and the timing analyser assumes the worst case of 32. Can someone explain the "up to" part to me? Which input values cause the worst case, and which cause the best case?
2. The manual says "the divide unit is shared between threads". Does this mean that if I issue an LDIVU on 4 threads on the same tile at the same time, 3 of the threads will block while the first thread's divide completes?
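To make the "up to bpw cycles" question concrete, here is a sketch of a generic bit-serial (restoring) divider in C. It produces one quotient bit per step, so a 32-bit quotient takes 32 steps, which lines up with the worst case the timing analyser assumes. This is a hypothetical model, not the XMOS divide unit: the name `ldivu_model`, the step counter, and the assumption that LDIVU divides the 64-bit value ah:al by the denominator (with ah < denominator, so the quotient fits in 32 bits) are all assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical model, NOT the XMOS hardware: divide the 64-bit value
 * (ah:al) by d, producing one quotient bit per step like a bit-serial
 * divider. Assumes d != 0 and ah < d, so the quotient fits in 32 bits. */
static uint32_t ldivu_model(uint32_t ah, uint32_t al, uint32_t d,
                            uint32_t *rem, int *steps)
{
    uint64_t r = ah;   /* running partial remainder, kept < d */
    uint32_t q = 0;

    *steps = 0;
    for (int i = 31; i >= 0; i--) {
        /* Shift in the next bit of the low word; r < 2*d fits in 64 bits. */
        r = (r << 1) | ((al >> i) & 1u);
        q <<= 1;
        if (r >= d) {  /* trial subtraction succeeds: quotient bit is 1 */
            r -= d;
            q |= 1u;
        }
        (*steps)++;    /* one step per quotient bit: 32 steps in total */
    }
    *rem = (uint32_t)r;
    return q;
}
```

The step count in this model is fixed at 32 for every input. A divider that sometimes finishes sooner would have to skip steps, for example when it can tell that leading quotient bits are zero, but that is only a guess at what "up to" means and the real early-out rule (if any) is not modelled here.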