The IN instruction does run in a single cycle. The IN instruction blocks until the transfer register is full (meanwhile the shift register is filling with data).
Thank you, this helps me understand the port read process!
XU316 does come in an 800MHz version also.
Actually, my bad: I'm using the EVK for this test, and it uses the C32, hence the 800MHz. I was using the MC-316 this morning, which is 600MHz, and was thinking the EVK was the same. My apologies! But yeah, this means I can push a bit further.
Remember that the thread (core) speed is different to the processor speed. If you are using a 600MHz part with 5 threads or fewer, the thread speed is 600/5 = 120MHz. With more than 5 threads, the thread speed is 600/thread count, so down to 75MHz with 8 threads.
The thread speed is the rate at which instructions will run.
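As a rough sketch of the rule described above, the per-thread rate can be worked out with a small helper. The function name and the 8-thread upper bound are assumptions made for illustration; only the divide-by-5 floor and the 600MHz figures come from the post.

```c
#include <assert.h>

/* With 5 or fewer threads, each thread still only gets 1/5 of the core clock. */
#define MIN_THREAD_DIVISOR 5u

/* Per-thread instruction rate in MHz for a given core clock and thread count.
   Hypothetical helper; assumes at most 8 threads, as in the example above. */
unsigned thread_speed_mhz(unsigned core_mhz, unsigned threads)
{
    assert(threads >= 1u && threads <= 8u);
    unsigned divisor = threads < MIN_THREAD_DIVISOR ? MIN_THREAD_DIVISOR : threads;
    return core_mhz / divisor;
}
```

For example, thread_speed_mhz(600, 1) and thread_speed_mhz(600, 5) both give 120, while thread_speed_mhz(600, 8) gives 75.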
I'm still running on a single thread. For proof of concept purposes (and to test limits), I'm just running this from the main function, without any parallel statements.
What are you using as the stimulus for your 1-bit port?
I'm using the xCORE clock, configured as below (clocking at 400MHz, not 300MHz as I mentioned in one of my previous comments):
Code:
clock_set_source_clk_xcore(samplerClk);
clock_set_divide(samplerClk, 1);
That should indeed allow me to get up to ~200MHz of sampling.
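For reference, a minimal sketch of how the 400MHz figure falls out of the divide setting. It assumes, as I understand the lib_xcore documentation, that a divide of 0 passes the source clock through and any other value gives source / (2 * divide). The helper name is hypothetical and not part of lib_xcore.

```c
#include <assert.h>

/* Output frequency in MHz of a clock block for a given source frequency and
   divide value. Assumption: divide == 0 passes the source through unchanged,
   otherwise the output is source / (2 * divide). */
unsigned divided_clock_mhz(unsigned source_mhz, unsigned divide)
{
    assert(divide <= 255u); /* assumed: the divider is an 8-bit value */
    if (divide == 0u) {
        return source_mhz;
    }
    return source_mhz / (2u * divide);
}
```

Under that assumption, an 800MHz xCORE clock with a divide of 1 gives 400MHz, and a 600MHz part with the same setting would give 300MHz.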
Now, a few improvements based on the suggestion in your previous comment: adding '#pragma unroll' has indeed raised the rate at which I can sample the signal. I can currently sample a signal of up to ~50MHz without losing bits between port ins. I still need to test at slightly higher frequencies, but if I go straight up to 100MHz, I see data bits falling behind.
I would check the disassembly with xobjdump -D <your_binary.xe> and see what your loop looks like. Sometimes the compiler can add in array bounds checking which would compromise the timing.
Thanks for the suggestion. I'll look into this in more detail throughout the afternoon, but at first glance the copy operation itself is indeed taking 1 cycle, while port_in is taking 6 clock cycles, with some padding added?
Code:
<port_in>:
0x0008039c: ff 17: nop (0r)
0x0008039e: 80 7f: dualentsp (u6) 0x0
0x000803a0: c0 b6: in (2r) r0, res[r0]
0x000803a2: ff 17: nop (0r)
0x000803a4: ff 17: nop (0r)
0x000803a6: c0 77: retsp (u6) 0x0
The 6 cycles would explain why I can currently sample a 50MHz signal but not a 100MHz one. 6 cycles at an 800MHz clock means that the maximum sample rate would be ~133.3MHz.
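The estimate above is just the instruction rate divided by the number of instructions per port_in call. A minimal sketch of that arithmetic follows, taking the poster's count of 6 at face value. The helper name is hypothetical, and whether 800MHz or the per-thread rate mentioned earlier in the thread is the right input is left open here.

```c
#include <assert.h>

/* Upper bound, in MHz, on how often port_in can complete, given the rate at
   which instructions issue and the instruction count per call.
   Hypothetical helper for illustration only. */
double max_port_in_rate_mhz(double instr_rate_mhz, unsigned instrs_per_port_in)
{
    assert(instrs_per_port_in > 0u);
    return instr_rate_mhz / (double)instrs_per_port_in;
}
```

With 800 and 6 this gives roughly 133.3. Plugging in a single-thread rate of 800/5 = 160MHz instead would give roughly 26.7, so it is worth confirming which rate applies before relying on the figure.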
I'll proceed with the debugging and once again (and can't thank you enough), thank you Joe!