Parallel Output to Multiple Ports in Assembler

Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Hi all,

I have several output ports configured on the same clock that I want to drive in parallel. My goal is to piece together three buffered:8 4-bit ports and four buffered:8 1-bit ports (using outpw res[port_1E], buff_E[0], 2). I want to drive the pins from assembler, passing an array of int to the function, but I got stuck. This is what I have so far:

...

ldw port_4A, r0[0]
ldw port_4B, r0[1]
ldw port_4C, r0[2]
ldw port_1E, r0[3]
ldw port_1F, r0[4]
ldw port_1G, r0[5]
ldw port_1H, r0[6]


out res[port_4A], buff_A[0]
out res[port_4B], buff_B[0]
out res[port_4C], buff_C[0]
out res[port_1E], buff_E[0]
out res[port_1F], buff_F[0]
out res[port_1G], buff_G[0]
out res[port_1H], buff_H[0]

out res[port_4A], buff_A[1]
out res[port_4B], buff_B[1]
out res[port_4C], buff_C[1]
out res[port_1E], buff_E[1]
out res[port_1F], buff_F[1]
out res[port_1G], buff_G[1]
out res[port_1H], buff_H[1]
...
...

But when I input the data on the other side, I only get rubbish. Looks to me as if the ports are not driven simultaneously because the output is blocking? I also tried setpt to no avail.

I would appreciate it very much if anyone could help me with this, since this is the last piece missing in the code.

Thank you!


infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

You can get the outputs (of multiple ports) to be perfectly synchronised by:

a) Clocking all of the ports off a clock that is slow enough that you have a chance to load all port registers within one clock period. This is what we do for I2S.
b) Ensuring that the port timers of all the ports are synchronised (attach them all to the same clock block and start the clock, which clears them) and then setting the port timer to a time in the future that gives you enough time to load all of the port data registers. The instruction setpt res[my_port], output_time can be used for this.
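
To get a feel for which of the two options applies, a rough timing budget can be worked out on the host. This is only a sketch in plain C, not xcore code: the helper names and all the nanosecond figures are hypothetical, and the real per-instruction time depends on the core clock and on how many threads are running.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n_instr port instructions (the setpt and
 * out instructions for every port), each taking instr_ns nanoseconds on
 * the issuing thread, complete within one port clock period. */
static bool loads_fit_in_period(uint32_t n_instr, uint32_t instr_ns,
                                uint32_t period_ns)
{
    return (uint64_t)n_instr * instr_ns < period_ns;
}

/* Hypothetical helper: smallest whole number of port clock periods to
 * schedule ahead (the value added before setpt) so that all n_instr
 * instructions have finished before the scheduled output edge. */
static uint32_t periods_ahead_needed(uint32_t n_instr, uint32_t instr_ns,
                                     uint32_t period_ns)
{
    uint64_t load_ns = (uint64_t)n_instr * instr_ns;
    return (uint32_t)(load_ns / period_ns) + 1;
}
```

If loads_fit_in_period() is true for your numbers, option a) is plausible; otherwise periods_ahead_needed() gives a lower bound for the offset used with option b).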

Either way, personally I'd experiment at a high level (XC) first and then move to ASM, but perhaps that's just me! xobjdump -S (or the debugger) can give you a nice disassembly of your compiled code to give you ASM pointers.

Sections 1) and 3) of this short document https://www.xmos.com/download/private/X ... t(1.0).pdf are highly relevant

Good luck!
Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Thank you for your reply!

I already had all of them attached to the same clock and also started the clock before outputting. So do I need to use setpt before each port output, or only once before I start writing the series of array elements to the ports? I.e. is it either:

1)
setpt res[port_4A], r2
setpt res[port_4B], r2
setpt res[port_4C], r2
setpt res[port_1E], r2
setpt res[port_1F], r2
setpt res[port_1G], r2
setpt res[port_1H], r2
out res[port_4A], buff_A[0]
out res[port_4B], buff_B[0]
out res[port_4C], buff_C[0]
out res[port_1E], buff_E[0]
out res[port_1F], buff_F[0]
out res[port_1G], buff_G[0]
out res[port_1H], buff_H[0]

add r2, r2, 1

setpt res[port_4A], r2
setpt res[port_4B], r2
setpt res[port_4C], r2
setpt res[port_1E], r2
setpt res[port_1F], r2
setpt res[port_1G], r2
setpt res[port_1H], r2
out res[port_4A], buff_A[1]
out res[port_4B], buff_B[1]
out res[port_4C], buff_C[1]
out res[port_1E], buff_E[1]
out res[port_1F], buff_F[1]
out res[port_1G], buff_G[1]
out res[port_1H], buff_H[1]

add r2,r2, 1

...

or

2)

setpt res[port_4A], r2
setpt res[port_4B], r2
setpt res[port_4C], r2
setpt res[port_1E], r2
setpt res[port_1F], r2
setpt res[port_1G], r2
setpt res[port_1H], r2
out res[port_4A], buff_A[0]
out res[port_4B], buff_B[0]
out res[port_4C], buff_C[0]
out res[port_1E], buff_E[0]
out res[port_1F], buff_F[0]
out res[port_1G], buff_G[0]
out res[port_1H], buff_H[0]

out res[port_4A], buff_A[1]
out res[port_4B], buff_B[1]
out res[port_4C], buff_C[1]
out res[port_1E], buff_E[1]
out res[port_1F], buff_F[1]
out res[port_1G], buff_G[1]
out res[port_1H], buff_H[1]

...

I had hoped to implement it in asm right away, because it didn't seem to me such a complex thing to do. But you are right, I will try it in xc, if slowing down the clock and using setpt won't do it.

Thank you!
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Hi, either is OK as long as you have time to load up all of the ports before the next falling edge of the clock (out on fall, in on rise). If the clock runs fast and you want to output after a whole number of clock periods, then the first approach is OK. If your clock period is longer than the time needed to load all the registers, then the second can work.

Also, I assume that before this there is a getts r2, res[port_4A] and an add r2, r2, <big enough number>?

The port timers are 16-bit timers which free-run. They are reset when you start the clock block, but from then on you should only rely on relative values. You can use a 32-bit register because things wrap at 0xffff, so the only limitation is that you set the next port event less than 65536 * your clock period into the future.
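
The wrap arithmetic can be checked on the host with a small model. This is a sketch in plain C rather than xcore code; the function names are made up for illustration, and it assumes that only the low 16 bits of the register passed to setpt are compared against the port timer.

```c
#include <stdint.h>

#define PORT_TIMER_MASK 0xFFFFu  /* port timers are 16 bits wide */

/* Port time that lies delta ticks after now, wrapped to 16 bits.
 * now may be held in a 32-bit register; the upper bits are ignored. */
static uint16_t port_time_after(uint32_t now, uint32_t delta)
{
    return (uint16_t)((now + delta) & PORT_TIMER_MASK);
}

/* Number of ticks until the free-running timer next equals target,
 * computed modulo 65536 so it stays correct across the wrap. */
static uint16_t ticks_until(uint16_t now, uint16_t target)
{
    return (uint16_t)(target - now);
}
```

For example, scheduling 4 ticks ahead of a timer value of 0xFFFE gives a target of 0x0002, and ticks_until(0xFFFE, 0x0002) is 4. A target 65536 or more ticks ahead is indistinguishable from a nearer one, which is where the limit above comes from.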
Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Thanks again for your help.

I did a few tests with your suggestions and I actually managed to get some results! I still have to optimize the code so that the timing is correct, because as it is right now the reading is a bit faster than the writing, so some words are read twice.

I don't use getts, because I am defining the write/read time myself beforehand. I tried it with only one setpt at the beginning of the write/read series, but since I would rather choose a slower clock and output on each falling edge, I was thinking of add r2,r2,2 (out buffered port:8 port_4A), because I read that buffered ports don't block.

I didn't think of the wrapping after ffff, thank you for the hint!