The curious case of the disappearing port

SpacedCowboy · Post by **SpacedCowboy** » Wed Jun 18, 2025 6:30 pm

So I want a fast USB connection from a 16-bit address and 16-bit data port. I really wanted USB3, but the FT601 USB3<->FIFO chip provides a 66.6MHz clock to synchronise to - trivial for an FPGA, but the XMOS ports top out at 60MHz for incoming signal resolution :( So USB2 it is - I don't really need the throughput of USB3, I just wanted the better latency, so I can live with USB2. Looking around, the data sheet for the XU316-1024-TQ128-C24 says "Up to 32 x 1bit port, 9 x 4bit port, 6 x 8bit port, 2 x 16bit port", so that seemed like a good place to start...

However, I can't see how to actually get 2 16-bit ports from the available pins.

On tile X0, port 16B is available, but on tile X1, port 16B seems to be missing pins 12,13 and 14. 16A is unavailable on either tile. Unless I'm missing something ?

If there isn't any way to get the second port up and running, that would leave me with two options:

- Go with the (more expensive) XU316-1024-FB265-C32, which is total overkill for the design needs, or
- Go with the (slower, previous generation, smaller RAM) XUF216-512-TQ128-C20A, which provides 16A and 16B on tile X1

Are those my options ? Or is there something I'm not understanding in the data sheet for the XU316-1024-TQ128-C24 ?

[edit: Both ports 8B and 8C are available on tile X1 (why couldn't we bind out the ports that made up a 16-bit port as well?) so I guess I can do two 8-bit reads rather than 1 16-bit one and still use the TQ128 XU316 part]

Joe · Post by **Joe** » Thu Jun 19, 2025 11:26 am

You're correct, the XU316-TQ128 does not have two fully bonded out 16 bit ports, I'll update the datasheet.

I'm not sure I understand your application but the ports can run much faster than 60MHz, we run qspi at 133MHz for example for fast reading.

I would always recommend xcore.ai over xcore-200 if your application is suitable for the reasons you mentioned and more.

Using two 8 bit ports is possible, if you run them off the same clock block you can then use timed input/output to make sure the data on both ports is in sync.
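As a rough illustration of the two-port approach, here is a minimal host-side C sketch of how two 8-bit samples taken on the same clock edge could be joined into one 16-bit bus word, and split again for output. It is not xcore port code: the function names are hypothetical, and which port carries the high byte is an assumption that depends on how the pins are wired.

```c
#include <assert.h>
#include <stdint.h>

/* Join two 8-bit samples (taken on the same clock edge) into a 16-bit word.
 * Assumption: one port carries bits 7..0 and the other bits 15..8. */
static inline uint16_t combine_bytes(uint8_t low_byte, uint8_t high_byte)
{
    return (uint16_t)(((uint16_t)high_byte << 8) | low_byte);
}

/* Split a 16-bit word into the two bytes to drive on the two 8-bit ports. */
static inline void split_word(uint16_t word, uint8_t *low_byte, uint8_t *high_byte)
{
    assert(low_byte != 0 && high_byte != 0);
    *low_byte  = (uint8_t)(word & 0xFFu);
    *high_byte = (uint8_t)(word >> 8);
}
```

The join is only meaningful if both bytes really belong to the same bus cycle, which is what running both ports off one clock block with timed I/O is meant to guarantee.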

SpacedCowboy · Post by **SpacedCowboy** » Thu Jun 19, 2025 3:24 pm

Thanks Joe, good to know for sure :)

I’m getting the 60MHz figure from the datasheet too - it says “Port sampling rates of up to 60MHz with respect to an external clock”. The idea would be to link up to the FT600Q which provides a FIFO interface to a USB3 port. The FT600Q drives a 66.6MHz or 100MHz clock out, and provides a bidirectional 16-bit bus for access synchronized to the clock (with associated read/write request and data-ready/fifo-full signals). The XMOS device would have to respond at the clock-rate, though it gets to manage the control signals. The thing is, if it missed a single clock at 66MHz, the data could be gone…

I did read about the timings, but I wasn’t sure whether, in the text…

> For data and clock travelling in the same direction (ie, source-synchronous clocking where both output by the xCORE or both input by the xCORE), frequencies of up to 60 MHz can work.
>
> Faster clocks may also work, but an analysis is needed to show whether any programmatic delays are necessary to cover all corners. In particular, some timings depend on PVT variations (Process, Voltage, and Temperature). An analysis will ensure that an interface will work under all possible corners, rather than the corners that were tested.

…“Faster clocks” meant “faster clocks up to 60MHz” (given the previous statement that “frequencies of up to 60 MHz can work”, emphasis mine), or whether it meant “you can push beyond 60MHz if you characterize things carefully, do your due diligence, and use special cases where necessary”. I didn’t really want to spin a board on the off-chance, and since I could probably manage with USB2…

If I’m misunderstanding, and there is some way of pushing the XMOS devices to work with that high an incoming clock, I’d love to know, and especially if there’s a worked example, that’d be awesome :) Of course, then I’d need another 16-bit port (so unless I multiplex the current 16-bit address/data ports I’d be pushing up to the larger part in this case)…

Thanks, I was reading about the timed aspect of the ports, and I think that might work well for me :)

Joe · Post by **Joe** » Thu Jun 19, 2025 3:55 pm

See: https://www.xmos.com/documentation/XM-0 ... index.html

As mentioned, we run a source synchronous 4-bit interface at 133MHz. The usual timing limitation is the round trip case, i.e. in this case from clock in to the xcore to data out. This will have some uncertainty and you'd need to check that uncertainty still leaves you with a data valid window big enough for the ftdi chip. I would think it would be fine on xcore.ai at 66MHz.

The next problem however would be the data rate, i.e. even with a buffered 32-bit port, at 66MHz you're going to have to do an IN at a rate of 33MHz for a 16-bit port. So you'd probably want to use an 800MHz device and/or limit your thread count. The fastest would be to stream data in chunks to/from sram, then you'll be doing in, store, in, store, ... instructions in sequence. In the extreme case for max speed you might need to write that in assembler to make sure the compiler doesn't slip in any extra instructions in a bad spot.
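The 33MHz figure follows from the buffer packing two 16-bit samples into each 32-bit transfer. A small host-side C sketch of that arithmetic is below; the function names are hypothetical, and the "two instructions per transfer" budget simply mirrors the in/store sequence described above rather than any measured figure.

```c
#include <assert.h>

/* Rate (MHz) at which IN must be executed when a buffered port packs
 * several bus samples into each transfer. */
static double in_rate_mhz(double bus_clock_mhz,
                          unsigned port_width_bits,
                          unsigned buffer_width_bits)
{
    assert(port_width_bits > 0);
    assert(buffer_width_bits >= port_width_bits);
    assert(buffer_width_bits % port_width_bits == 0);
    unsigned samples_per_in = buffer_width_bits / port_width_bits;
    return bus_clock_mhz / samples_per_in;
}

/* Minimum thread instruction rate (MHz) needed to keep up, assuming a fixed
 * number of instructions per transfer (e.g. 2 for an in + store pair). */
static double min_thread_rate_mhz(double in_rate, unsigned instr_per_transfer)
{
    assert(instr_per_transfer >= 1);
    return in_rate * instr_per_transfer;
}
```

For a 66MHz bus on a 16-bit port with 32-bit buffering this gives one IN every two bus clocks (33MHz), and an in/store loop would then need at least 66MHz of thread instruction rate with nothing else in the loop.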

Obviously I don't understand your app exactly but this hopefully gives a flavour. I worry you might not have the MIPS to process any data stream coming in that fast but if it's just small amounts of data coming in at high speed occasionally then obviously you might be ok.

Cheers,
Joe

SpacedCowboy · Post by **SpacedCowboy** » Thu Jun 19, 2025 4:26 pm

Thanks again :) And yes, that’s the link I was reading, so I’m glad to hear it was the latter interpretation that was correct :)

Actually that’s exactly the situation, the data incoming over the address/data busses is relatively slow, I’ll be building a packet over several clocks on this slow bus as data arrives, but then I want to send it out over USB. Similarly, in the other direction I just want to pull data off the USB FIFO, buffer to SRAM, then feed it over the address/data bus on demand.

Most of the time data sent to the USB FIFO will be write-only, so just fire-and-forget, but sometimes (and often clustered together) there will need to be a response back over USB and I’d like that response to be as fast as possible. The microframe boundaries on USB2 are 125µs, which adds up if you have a lot of back-and-forth, so using USB3 with the shorter (20-40µs) boundaries, and dedicated tx,rx lines (so no bus turnaround) would be better.

I don’t have a problem with the other requirements you mention either, this is pretty much a single-function device, address/data bus on one side, USBx on the other. If an 800MHz device is needed, though, I think the larger FBGA part is probably necessary, because I don’t see -32 grade devices at digikey in the TQ128 form-factor.

Cheers!

Joe · Post by **Joe** » Thu Jun 19, 2025 4:36 pm

To be clear your thread speed is (core clock/thread count) with a max thread speed of core clock/5.

So for a 600MHz part with 8 threads you get 75MHz per thread, about 86MHz with 7 threads, 100MHz with 6 threads, 120MHz with 5 threads or fewer.
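The rule above (core clock divided by thread count, capped at core clock / 5) can be written as a short host-side C sketch. The function name is hypothetical and the 8-thread upper bound is taken from the figures quoted in this post.

```c
#include <assert.h>

/* Per-thread speed in MHz: core clock / thread count, but never faster
 * than core clock / 5. */
static double thread_speed_mhz(double core_mhz, unsigned active_threads)
{
    assert(active_threads >= 1 && active_threads <= 8);
    unsigned divisor = (active_threads < 5) ? 5 : active_threads;
    return core_mhz / divisor;
}
```

For example, `thread_speed_mhz(600.0, 8)` evaluates to 75.0 and `thread_speed_mhz(600.0, 2)` to 120.0, matching the numbers listed above.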

If you can make the ftdi output a data valid signal that can be used directly to gate the data input to the device, that would help. This would be a strobed slave port config.

https://www.xmos.com/documentation/XM-0 ... 03003.html

SpacedCowboy · Post by **SpacedCowboy** » Thu Jun 19, 2025 4:52 pm

Right, what I’m saying is that the number of tasks running will be approximately 2 :) one to handle the USB FIFO and one to handle the address/data bus. So I think I’d be getting the 800/5 = 160MHz clock rate for the USB thread. I guess there’s always Fast and Priority modes if I need more threads, but I really don’t think I will.

You’re saying to make the clock the data-valid signal ? Don’t think I’ve done that before on an XMOS device, I’ll go read up on it. I seem to recall there are also ways to manipulate the timing of timed port accesses (on a clock granularity) to help out the data-valid window between setup and hold constraints. 160MHz => 6.25ns clock period, and a 66MHz clock => 15ns period, so there might be some leeway there I could use to help out.
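The period arithmetic in that paragraph can be checked with a few lines of host-side C. The helper names are hypothetical, and treating one thread cycle as one instruction slot is a simplifying assumption.

```c
#include <assert.h>

/* Clock period in nanoseconds for a frequency given in MHz. */
static double period_ns(double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

/* How many thread cycles fit into one bus clock period. */
static double thread_cycles_per_bus_clock(double thread_mhz, double bus_mhz)
{
    assert(thread_mhz > 0.0 && bus_mhz > 0.0);
    return thread_mhz / bus_mhz;
}
```

With a 160MHz thread and a 66.67MHz bus clock this gives 6.25ns and roughly 15ns periods, so about 2.4 thread cycles per bus clock.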

This is all really helpful, thanks a lot :)

Joe · Post by **Joe** » Thu Jun 19, 2025 5:38 pm

No I mean you can use a clock and a data valid signal. The port will input or output data on the clock edge only if the data valid port is high. So it does the flow control for you.
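To make the behaviour concrete, here is a small host-side C model of the idea: one sample is captured per clock edge, but only on edges where the data-valid line is high. This is purely an illustrative simulation with hypothetical names, not xcore port configuration code, and it ignores setup/hold timing entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a strobed input: data[i] and valid[i] are the bus value and the
 * data-valid level at clock edge i. Samples are stored only when valid is
 * high. Returns the number of samples captured (at most out_cap). */
static size_t strobed_capture(const uint16_t *data, const uint8_t *valid,
                              size_t n_clocks, uint16_t *out, size_t out_cap)
{
    assert(data != NULL && valid != NULL && out != NULL);
    size_t captured = 0;
    for (size_t i = 0; i < n_clocks && captured < out_cap; i++) {
        if (valid[i]) {
            out[captured++] = data[i];
        }
    }
    return captured;
}
```

In this model the software never sees the idle clocks, which is the sense in which the strobe "does the flow control for you".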

My point was that if you're only using a few threads then the 600MHz TQ128 part should be fine, as you'll be getting 120MHz thread speed, which should be more than enough for the port IO anyway.

SpacedCowboy · Post by **SpacedCowboy** » Thu Jun 19, 2025 5:41 pm

Ah, ok that’s great :) Thanks :)

The FTDI chip does in fact have ‘Byte-Enable’ signals for both byte lanes. Both are usually asserted, but the last transfer may only have the low byte valid. Presumably the low-byte enable would work as a data-qualifier, if that would help out the port…
