Background
Here's the situation:
I'm trying to build an expansion box for an old Atari 8-bit micro. Atari designed one but never released it, and I'd like to fill that gap. The current design has an XMOS chip monitoring the expansion bus (basically a bring-out of the 6502's address/data/control lines) and transmitting any required traffic over a link (buffered via LVDS) to another XMOS chip in the enclosure where the PCI-style cards will sit. Whatever needs to be done is done there, and any results are sent back over the link so the 6502 can access them.
The host-side (not the enclosure) XMOS has an SDRAM so it can quickly respond to memory-requests from the 6502. On boot, peripherals in the enclosure can upload 6502 code to "their" area of the SDRAM, so there's no need for a round-trip (atari->xmos->link->xmos->peripheral->xmos->link->xmos->atari) when executing 6502 code provided by the peripheral.
The basic idea was that a peripheral (for example a MIDI card) might upload an interrupt handler to SDRAM at boot time. Then, as MIDI data comes in, the card sends it to SDRAM and triggers an interrupt on the 6502 to execute the code it uploaded, and the 6502 handles the MIDI data as if it were in local RAM. No big round trips: everything can be satisfied (from the point of view of the 6502) from the local XMOS.
To provide this interface, I'm taking over the expansion bus, which is also the cartridge slot, and someone asked me if I'd be providing a cartridge slot on the expansion box to compensate. This is where it gets tricky, because I can't just download the cartridge to SDRAM (some of them are bank-switched internally) so to do this, I *would* in fact have to provide the long round-trip within the timing budget...
Timing budget
A valid address is presented on the bus ~177ns after the clock goes low. The result of any read operation needs to be on the bus by 558ns after the clock goes low, i.e. at the falling edge of the next clock, when the result is latched by the 6502. This gives me an absolute maximum of 558 - 177 = 381ns to get the data and push it to the 6502's data bus.

- The local XMOS has to read the address. I'm guessing a timer on the clock, or change-from-last-value-event might be good enough here
- It then has to send the address over the link. I can encode the 8K address offset (13 bits) plus the "remote read" command into 2 bytes, so 10ns for the read, 10ns for the mask op, 10ns to add the command in the top 3 bits and 10ns to send it down the channel. Total of 40ns so far.
- I'm using a 2-wire link, so I get ~18MBytes/sec, or ~53ns/byte, so there's another 106ns for my two bytes of command/data.
- The enclosure XMOS at the other end has to read the data (10ns), mask off the top-3 bits to get the command (10ns), check for 'read memory' command (10ns) and push out to a local cartridge port (10ns). By physical wiring, I can make the bits of a 16-bit port go to the correct pins on the cartridge port to generate any control lines needed.
- Then we wait 70ns for the cartridge EEPROM to push the byte of data out to its bus
- The enclosure XMOS then has to read the byte (10ns) and push it down the channel for the link (10ns)
- We wait another 53ns for the byte to arrive
- Then we read the byte (10ns) and push it to the 6502's data bus (10ns).
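
To sanity-check the sums, here is a rough tally of the steps above in plain C. The numbers are just my estimates copied from the list; the struct and function names are made up for illustration. It leaves out the time to detect the address (the first bullet) and any LVDS/cable propagation delay, since I haven't estimated either.

```c
#include <stdio.h>

/* Per-step estimates in nanoseconds, copied from the list above.
 * These are my guesses, not measured figures. */
struct step {
    const char *what;
    int ns;
};

static const struct step budget[] = {
    {"host: read, mask, add command, send to channel",   40},
    {"link: 2 bytes outbound at ~53ns/byte",            106},
    {"enclosure: read, mask, check command, drive port", 40},
    {"cartridge EEPROM access time",                     70},
    {"enclosure: read byte, send to channel",            20},
    {"link: 1 byte back at ~53ns/byte",                  53},
    {"host: read byte, drive 6502 data bus",             20},
};

enum {
    ADDR_VALID_NS = 177, /* address valid after clock goes low  */
    LATCH_NS      = 558  /* data latched at next falling edge   */
};

/* Time available between valid address and the latch point. */
int window_ns(void)
{
    return LATCH_NS - ADDR_VALID_NS;
}

/* Sum of all the per-step estimates. */
int budget_total_ns(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof budget / sizeof budget[0]; i++)
        total += budget[i].ns;
    return total;
}

/* Print each step, the total, and the remaining slack. */
void print_budget(void)
{
    for (size_t i = 0; i < sizeof budget / sizeof budget[0]; i++)
        printf("%4dns  %s\n", budget[i].ns, budget[i].what);
    printf("total %dns, window %dns, slack %dns\n",
           budget_total_ns(), window_ns(),
           window_ns() - budget_total_ns());
}
```

By this tally the round trip comes to 349ns against the 381ns window, so there is only about 32ns of slack for everything not counted here.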
Assuming there aren't mistakes in the budget above (and if there are, please tell me :), am I likely to be able to get this level of performance using C, or is it going to need assembly to get there? Since it's basically read/write to ports/channels, I was hoping that C would do :)
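
To make the C question concrete, this is roughly the mask/shift work I have in mind for the 2-byte request, written as portable C rather than XC. The names and the command value are hypothetical (I haven't fixed the command encoding yet), and the port/channel I/O itself is left out because that part is XMOS-specific.

```c
#include <stdint.h>

/* Layout of the 16-bit request word (my assumption):
 *   bits 15..13  command (3 bits)
 *   bits 12..0   offset into the 8K cartridge window (13 bits) */
#define CMD_SHIFT       13
#define CMD_MASK        0x7u
#define ADDR_MASK       0x1FFFu
#define CMD_REMOTE_READ 0x1u    /* hypothetical command value */

/* Host side: mask the bus address down to 13 bits and put the
 * command in the top 3 bits. */
uint16_t pack_request(uint8_t cmd, uint16_t addr)
{
    return (uint16_t)(((cmd & CMD_MASK) << CMD_SHIFT) | (addr & ADDR_MASK));
}

/* Enclosure side: recover the command from the top 3 bits. */
uint8_t request_cmd(uint16_t word)
{
    return (uint8_t)((word >> CMD_SHIFT) & CMD_MASK);
}

/* Enclosure side: recover the 13-bit cartridge offset. */
uint16_t request_addr(uint16_t word)
{
    return (uint16_t)(word & ADDR_MASK);
}
```

For example, `pack_request(CMD_REMOTE_READ, 0x1ABC)` gives `0x3ABC`. My hope is that each of these boils down to a couple of instructions, but I haven't checked what the compiler actually generates.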
Are there any cool XMOS efficiencies I can take advantage of that I'm not accounting for above?
I wasn't really planning on providing a cartridge port on the expansion box. It was going to be attached locally, and in fact the design above was based around not having to manage the long path from 6502->expansion box->back again. But there are some instances where that's not ideal: the new 1088XEL motherboard is mini-ITX compatible, and people are placing them in H80 cases, which don't have an easy way to get to a cartridge if it's internal.
Any advice happily and gratefully received :)