XMOS Port Max Sampling Rate

Technical discussions around xCORE processors (e.g. xcore-200 & xcore.ai).
fabriceo
Respected Member
Posts: 257
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hello,
Here is an example of assembly code embedded in an XC function, optimized to store port input into a circular buffer. It takes only 4 core cycles to read, store, increment, compare, reset and loop.


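[[dual_issue]] void circularbuff(buffered in port:32 p, unsigned &ptr, unsigned max){
    // p: 4-bit port with a 32-bit transfer width, so each in returns 8 samples.
    // ptr: buffer address; the buffer must hold max + 1 words.
    unsigned idx,val,test;
    // Emits only an assembler comment, but makes the compiler reserve 3 registers.
    asm volatile("#allocate reg %0 %1 %2":"=r"(idx),"=r"(val),"=r"(test));
    test = 0; // the first and then yields idx = 0 (idx is otherwise uninitialised)
    asm volatile(
            ".Looop_%=:"
            "\n  { in %4,res[%0] ; and %3,%3,%5 }"  // val = port data; idx &= test (wrap)
            "\n  stw %4,%1[%3]"                     // buffer[idx] = val
            "\n  { lsu %5,%3,%2 ; add %3,%3,1 }"    // test = (idx < max); idx++
            "\n  { neg %5,%5 ; bu .Looop_%= }"      // test = 0 or 0xFFFFFFFF; loop forever
              // %0      %1       %2       %3       %4        %5
            ::"r"(p),"r"(ptr),"r"(max),"r"(idx),"r"(val),"r"(test));
}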
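This means a port can be read and stored at about 600/5/4 = 30 MHz: with a 600 MHz core clock, a logical core gets at most one issue slot in every 5 core-clock cycles (120 MHz), and the loop takes 4 of those slots.
If this is a 4-bit port configured in buffered mode with a 32-bit transfer width, each in returns 8 samples, so the theoretical sampling frequency could be 240 MHz (30 x 32/4).
If it is an 8-bit port, each in returns 4 samples, so the routine should be OK up to 120 MHz (30 x 32/8, the default reference clock here).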
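The arithmetic above can be written down as a small helper, handy for trying other clock speeds, loop lengths or port widths. This is a sketch in plain C, assuming (as in the post) that a logical core gets at most one issue slot every 5 core-clock cycles and that each in on a buffered port returns transfer_width / port_width samples; the function names are hypothetical.

```c
#include <assert.h>

/* Reads per microsecond (i.e. MHz) for a capture loop on one logical core.
 * Assumption: a logical core issues at most once every 5 core-clock cycles. */
static double loop_rate_mhz(double core_mhz, unsigned cycles_per_loop) {
    assert(cycles_per_loop > 0);
    return core_mhz / 5.0 / cycles_per_loop;
}

/* Sample rate when each in on a buffered port returns
 * transfer_width / port_width samples (e.g. 32 / 4 = 8). */
static double sample_rate_mhz(double core_mhz, unsigned cycles_per_loop,
                              unsigned transfer_width, unsigned port_width) {
    assert(port_width > 0 && transfer_width % port_width == 0);
    return loop_rate_mhz(core_mhz, cycles_per_loop) *
           (double)(transfer_width / port_width);
}
```

For example, loop_rate_mhz(600.0, 4) gives 30, sample_rate_mhz(600.0, 4, 32, 4) gives 240 and sample_rate_mhz(600.0, 4, 32, 8) gives 120. These are upper bounds from instruction counts only; they say nothing about port timing or signal integrity.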

FYI, some tips:
The first asm volatile fools the compiler into allocating 3 temporary registers (it only emits an assembler comment), which are then used in the next asm statement.
The lsu instruction returns 1 as long as idx is below max, and neg turns that into 0xFFFFFFFF, so the and bundled with in leaves idx unchanged.
When idx == max, lsu returns 0 and neg returns 0 as well, so the and bundled with the next in resets idx to 0.
max should be the buffer size minus 1, since indices 0 to max inclusive are written.
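The lsu / neg / and wrap trick can be checked on a desktop with a plain C model of the loop. This is a sketch, not xCORE code: one C iteration mirrors one pass of the asm loop, `samples` stands in for successive port reads, and starting mask at 0 (so the first and yields index 0) is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of the branchless circular-buffer wrap.
 * buf must hold max + 1 words. Returns the index used for the last store. */
static uint32_t ring_capture(uint32_t *buf, uint32_t max,
                             const uint32_t *samples, size_t n) {
    uint32_t idx = 0;
    uint32_t mask = 0;          /* assumption: 0, so the first and gives idx 0 */
    uint32_t last = 0;
    for (size_t i = 0; i < n; i++) {
        idx &= mask;            /* and idx, idx, mask (bundled with in) */
        assert(idx <= max);     /* never writes past buf[max] */
        buf[idx] = samples[i];  /* stw val, ptr[idx] */
        last = idx;
        mask = idx < max;       /* lsu mask, idx, max (before the increment) */
        idx += 1;               /* add idx, idx, 1 */
        mask = 0u - mask;       /* neg: 1 -> 0xFFFFFFFF, 0 -> 0 */
    }
    return last;
}
```

With max = 3 and a 4-word buffer, ten samples land at indices 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, so the last two overwrite buf[0] and buf[1] and the index never exceeds max.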

Remark: the only way to stop this loop is with an event from a timer or a channel.
removed
hope this helps
Last edited by fabriceo on Mon Jun 30, 2025 11:06 am, edited 1 time in total.
Ross
Verified
XCore Legend
Posts: 1204
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

Jcvc wrote: Fri Jun 27, 2025 8:57 am Yeah, I tried changing it to a for loop instead, but no difference. I'll see then if I can improve it.
What compile options are you using?
Technical Director @ XMOS. Opinions expressed are my own
Joe
Verified
Experienced Member
Posts: 122
Joined: Sun Dec 13, 2009 1:12 am

Post by Joe »

Ran up a quick example in C:
portfastsample.zip
Compiles to:

0x00080230: c4 b6: in (2r) r1, res[r0]
0x00080232: ff 17: nop (0r)
0x00080234: ff 17: nop (0r)
0x00080236: 4f 54: stw (ru6) r1, sp[0xf]
0x00080238: c4 b6: in (2r) r1, res[r0]

The nops fill the unused halves of the dual-issue bundles, so this loop actually takes 2 cycles per read and store.

Obviously fabriceo's circular buffer is more flexible, but this shows a basic implementation of the highest-speed port sampling.
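Joe's project is an attachment that isn't visible here, but the listing (in/stw pairs with no compare or branch between the reads) suggests straight-line, unrolled capture. Below is a hypothetical portable C sketch of that shape: read_port is a callback standing in for the port read so it runs off-target, and counter_port is a fake port for testing. On the xCORE each read would be a single in instruction, not a function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port read; on the xCORE this is one in. */
typedef uint32_t (*read_port_fn)(void *ctx);

/* Fake port for off-target testing: returns 0, 1, 2, ... */
static uint32_t counter_port(void *ctx) {
    uint32_t *count = ctx;
    return (*count)++;
}

/* Capture n words with the body unrolled by 4, so the loop's
 * increment/compare/branch is paid once per 4 samples instead of per sample.
 * n must be a multiple of 4. */
static void capture_unrolled(uint32_t *buf, size_t n,
                             read_port_fn read_port, void *ctx) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        buf[i + 0] = read_port(ctx);
        buf[i + 1] = read_port(ctx);
        buf[i + 2] = read_port(ctx);
        buf[i + 3] = read_port(ctx);
    }
}
```

Full unrolling, as the listing suggests, removes the loop overhead entirely at the cost of code size; the unroll pragma mentioned later in the thread is a way to get there without writing the copies by hand.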

Joe
XMOS hardware grey beard.
Jcvc
Member++
Posts: 19
Joined: Wed May 07, 2025 11:13 pm

Post by Jcvc »

Sorry, didn't notice yesterday that there were new comments on the thread as these were on the second page.
Now, on to the more interesting stuff :D

Thanks fabriceo, that's very kind of you to share that code and the detailed information on it! I'll give it a go to see how it compares to what I have, and to start getting used to the assembly side on XMOS (haven't used it yet) for operations with tighter timing constraints. Again... very much appreciated :)
Ross wrote: What compile options are you using?
By default, with no optimisation, but I've also tried -O2 and -O3 and performance was the same (although I couldn't then dig it up in the disassembly).
Joe wrote: Ran up a quick example in C:
Thank you Joe, also very kind of you to help me so much. Your sample code is pretty much the same as what I have (I only divide the xcore clock once by 2), and performance-wise (since your recommendation of the pragma unroll) the two seem to be the same.

I'll also have to check with a few different sources to rule out test inconsistencies due to jitter on my test system. At the moment I seem to be managing to sample at ~200 MHz without losing any bits between reads, but I definitely need to carry out more testing to ensure the stability of the code as well as signal integrity :)