fastest possible timed port output (startkit)
-
- New User
- Posts: 2
- Joined: Mon Mar 02, 2015 1:13 am
Hi:
I'm trying to use the Startkit (Digikey XK-STK-A8DEV) to output arbitrary
values from a 32-bit port as quickly as possible, at specific port times.
I would like to be able to change the port value every 20 ns, or even
faster if possible.
The relevant fragments of my code are:
clock clk = XS1_CLKBLK_1;
out buffered port:32 output = XS1_PORT_32A;
// configure 100 MHz clock
configure_clock_rate(clk,100,1);
configure_out_port(output, clk, 0);
// output 32 bit patterns at specific times as quickly as possible
unsigned short t;
output <: 0 @ t; // reading port counter seems to take about 15 clocks
t += 100;
output @ t <: 0xf43f34;
t += 3;
output @ t <: 0x22d45d;
t += 3;
output @ t <: 0x34d4f;
t += 3;
output @ t <: 0x44f54;
t += 3;
This works fine, but if I try smaller increments for t (e.g. t += 2), the outputs fail
because the output statement is reached after time t has already passed.
Disassembling the XC code with xobjdump makes it clear why at least
3 clock cycles are needed: the timed output takes 2 instructions and the port counter
increment takes another:
output @ t <: 0x1;
0x0001033c: 96 3f: setpt (r2r) res[r6], r1
0x0001033e: ee ae: out (r2r) res[r6], r11
t += 3;
0x00010340: 40 12: add (3r) r0, r0, r4
My questions are:
1) Is there any faster way, perhaps in assembly code, to output
a 32-bit constant to a port and increment a port timer in fewer than 3 clocks?
(It would meet my needs if I could accomplish this output and increment in 2 clocks.)
2) I've also tried to change the reference clock from 100 MHz to 200 or 400 MHz using
the techniques outlined in https://www.xcore.com/forum/viewtopic.p ... 3&start=20
but I see no port output at all after using write_sswitch_reg to set REFDIV_REGNUM
to 0x1 or 0x0. Does anyone have specific information or sample code that
has successfully increased the reference clock speed on a Startkit, so as to
achieve a smaller minimum time between timed port outputs?
Thanks for any suggestions.
-
- XCore Legend
- Posts: 1913
- Joined: Thu Jun 10, 2010 11:43 am
Some comments:
a) IO port speed on XMOS devices is limited to a maximum of 60 MHz. This may be the issue, since you are driving your port from the 100 MHz internal clock through the following line:
// configure 100 MHz clock
configure_clock_rate(clk,100,1);
b) Consider adding to your code:
start_clock(clk);
after your
// configure 100 MHz clock
configure_clock_rate(clk,100,1);
configure_out_port(output, clk, 0);
c) Try:
// configure 50 MHz clock
configure_clock_rate(clk,100,2);
Does t += 2 work at 50 MHz?
d) For examples of port access in assembler, see the code posted on GitHub for the SDRAM IP:
https://github.com/xcore/sc_sdram_burst ... 2S16400F.S
and the assembler code that writes to the SDRAM device:
https://github.com/xcore/sc_sdram_burst ... 16400F.inc
Also, the Ethernet IP should offer similar ideas on how to squeeze out bandwidth from the IO ports.
The documentation states that the port speed is limited to 60 MHz when driven by an external clock. However, you are using the internal clock to drive the ports, so perhaps the compiled instructions are what is limiting your port speed. Perhaps someone else can clarify whether the ports can be driven faster from the 100 MHz internal clock than from an external clock source (documented at 60 MHz).
-
- New User
- Posts: 2
- Joined: Mon Mar 02, 2015 1:13 am
Mon2:
> b) Consider to add to your code:
> start_clock ( clk ) ;
Yes, that is already in my code.
Sorry, in the original posting of my question, I only included
the code fragments I considered relevant in order to keep the posting
short. But yes, I do start the clock I configure.
> c) Try:
> // configure 50 Mhz clock
> configure_clock_rate(clk,100,2);
> Does the t+=2 work @ 50 Mhz ?
Yes, with lower clock rates I can use smaller port counter
increments without missing the output times.
I should also mention that the code I posted works fine with the 100 MHz clock,
as long as I use an increment of t += 3 or larger.
But my application requires the highest possible output speed,
which is why I also tried (unsuccessfully so far) to increase (not decrease)
the internal clock speed (from 100 MHz to 200 or 400 MHz) using
the write_sswitch_reg() call.
> So perhaps the compiled instructions are the root cause that are limiting your port speed.
Yes, I agree.
From disassembly of my code, it seems as though timed port output
requires 3 instructions: SETPT and OUT for the timed port output, and
an ADD for the increment of the port timer.
I am not familiar with XMOS assembly code, and I suspect there might be ways to do timed port
output faster than the compiled XC code achieves.
I'll look at the Ethernet slice code to see if there are any tricks there
for achieving the timed port output and port counter increment in fewer
instructions, and I also hope someone reading this has experience
with this issue and can offer suggestions.
Thanks.
-
- Experienced Member
- Posts: 117
- Joined: Fri Dec 11, 2009 10:22 am
If you are looking to do a short burst (up to 11 outputs), then the fastest way to do it would be to preload r1-r11 with the output data and then execute:
out res[r0], r1
out res[r0], r2
out res[r0], r3
...
out res[r0], r11
This would give you 10 ns updates, at the cost of only being able to do a short burst. However, to achieve this speed you would also need to allocate 100 MIPS to the core that is executing the instructions.
If you were looking for sustained output, then you would have to lower the output speed. The minimum set of instructions needed to achieve this would be:
- load or fetch the data from a channel
- output the data
- loop
I hope this helps