Link round trip time

SpacedCowboy
Experienced Member
Posts: 67
Joined: Fri Aug 24, 2012 9:37 pm
Contact:

Post by SpacedCowboy »

This is to do with the interface board in the other thread...

Background

Here's the situation:

I'm trying to build an expansion box for an old Atari 8-bit micro, there was one designed but never released by Atari, and I'd like to fill that gap. The current design has an XMOS chip sitting monitoring the expansion bus (basically a bring-out of the chip's address/data/control lines), then transmitting any required traffic over a link (buffered via LVDS) to another XMOS chip sitting in the enclosure where the PCI-style cards will sit. Whatever needs to be done, is done, and any results are sent back over the link so the 6502 can access them.

The host-side (not the enclosure) XMOS has an SDRAM so it can quickly respond to memory-requests from the 6502. On boot, peripherals in the enclosure can upload 6502 code to "their" area of the SDRAM, so there's no need for a round-trip (atari->xmos->link->xmos->peripheral->xmos->link->xmos->atari) when executing 6502 code provided by the peripheral.

The basic idea was that a peripheral (for example a midi card), might upload an interrupt handler to SDRAM at boot time, then as (midi) data comes in, it sends it to SDRAM and triggers an interrupt on the 6502 to execute the code (that it uploaded), and the (midi) data is handled by the 6502 as if it were in local ram. No big round trips, everything can be satisfied (from the point of view of the 6502) from the local XMOS.

To provide this interface, I'm taking over the expansion bus, which is also the cartridge slot, and someone asked me if I'd be providing a cartridge slot on the expansion box to compensate. This is where it gets tricky, because I can't just download the cartridge to SDRAM (some of them are bank-switched internally) so to do this, I *would* in fact have to provide the long round-trip within the timing budget...

Timing budget

A valid address is presented on the bus ~177ns after the clock goes low. The result of any read operation needs to be on the bus by 558ns after the clock went low, i.e. at the falling edge of the next clock, when the result is latched by the 6502. This gives me an absolute maximum of 381ns (558 - 177) to get the data and push it onto the 6502's data bus.
Image
So in this time, for a remote cartridge (which provides memory) to be read, the following have to occur
  • The local XMOS has to read the address. I'm guessing a timer on the clock, or change-from-last-value-event might be good enough here
  • It then has to send the address over the link. I can encode the 8K address offset (13 bits) and the "remote read" command into 2 bytes, so 10ns for the read, 10ns for the mask op, 10ns to add the command to the top 3 bits and 10ns to send it down the channel. Total of 40ns so far.
  • I'm using a 2-wire link, so I get ~18MBytes/sec, or ~53ns/byte, so there's another 106ns for my two bytes of command/data.
  • The enclosure XMOS at the other end has to read the data (10ns), mask off the top-3 bits to get the command (10ns), check for 'read memory' command (10ns) and push out to a local cartridge port (10ns). By physical wiring, I can make the bits of a 16-bit port go to the correct pins on the cartridge port to generate any control lines needed.
  • Then we wait 70ns for the cartridge EEPROM to push the byte of data out to its bus
  • The enclosure XMOS then has to read the byte (10ns) and push it down the channel for the link (10ns)
  • We wait another 53ns for the byte to arrive
  • Then we read the byte (10ns) and push it to the 6502's data bus (10ns).
This gives me a total required time of 349ns which is less than 381ns, so it seems possibly feasible... but there's very little margin. I've not any experience with the links on an XMOS device, so I'm not sure if my budgets above are realistic. I'm assuming the clock is running at 100MHz per core, giving an XMOS cycle time of 10ns, and I'm assuming that each of the above is single-cycle.
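The budget above can be tallied mechanically, which makes it easier to re-run when a figure changes. The following is a minimal sketch in C; the step names and function names are illustrative, and every number is an estimate taken from the list above, not a measurement. Note that 18 MBytes/sec strictly works out to about 55.6ns per byte rather than 53ns, which would raise the total to roughly 357ns and shrink the margin accordingly.

```c
#include <stddef.h>

/* One step of the remote read round trip, with its estimated cost. */
struct step {
    const char *what;
    unsigned ns;
};

/* Figures are the estimates from the post (10 ns per XMOS operation,
 * ~53 ns per link byte, 70 ns EEPROM access). They are not measured. */
static const struct step round_trip[] = {
    { "local: read address, mask, add command, send",      40 },
    { "link: two command/address bytes",                  106 },
    { "remote: read, mask, check command, drive cart port", 40 },
    { "cartridge EEPROM access time",                      70 },
    { "remote: read data byte, send",                      20 },
    { "link: one data byte",                               53 },
    { "local: read byte, drive 6502 data bus",             20 },
};

/* Sum of all step costs in nanoseconds. */
unsigned budget_total_ns(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof round_trip / sizeof round_trip[0]; i++)
        total += round_trip[i].ns;
    return total;
}

/* Time available: address valid at 177 ns, data latched at 558 ns. */
unsigned budget_window_ns(void)
{
    return 558u - 177u;
}

/* Remaining margin in nanoseconds; negative means the budget is blown. */
int budget_margin_ns(void)
{
    return (int)budget_window_ns() - (int)budget_total_ns();
}
```

With the figures as posted, this gives a 349ns total against a 381ns window, so 32ns of margin before any jitter or multi-cycle operations are accounted for.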

Assuming there aren't mistakes in the budget above (and if so, please tell me :), am I likely to be able to get this level of performance using 'C', or is it going to need assembly to get there ? Since it's basically read/write to ports/channels, I was hoping that C would do :)
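For reference, the two-byte request described in the budget (a 3-bit command in the top bits, a 13-bit offset below it) needs only a shift, a mask and an OR in plain C. This is a sketch under assumptions: the command value `CMD_REMOTE_READ` and the function names are hypothetical, and nothing here shows how many xCORE cycles the compiled code would take.

```c
#include <stdint.h>

#define CMD_REMOTE_READ 0x1u    /* hypothetical 3-bit command code */
#define OFFSET_MASK     0x1FFFu /* low 13 bits: offset within the 8K window */

/* Build the 16-bit word sent over the link: command in the top 3 bits,
 * 13-bit offset (taken from the low bits of the bus address) below it. */
uint16_t pack_request(unsigned cmd, uint16_t bus_addr)
{
    return (uint16_t)(((cmd & 0x7u) << 13) | (bus_addr & OFFSET_MASK));
}

/* Receiver side: recover the command from the top 3 bits. */
unsigned request_cmd(uint16_t word)
{
    return (unsigned)(word >> 13);
}

/* Receiver side: recover the 13-bit offset. */
uint16_t request_offset(uint16_t word)
{
    return (uint16_t)(word & OFFSET_MASK);
}
```

For example, a read of bus address 0xA123 packs to 0x2123, which the far side splits back into command 1 and offset 0x0123.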

Are there any cool XMOS efficiencies I can take advantage of that I'm not accounting for above ?

I wasn't really planning on providing a cartridge port on the expansion box - it was going to be attached locally, and in fact the design above was based around not having to manage the long path from 6502->expansion box->back again. But there are some instances where that's not ideal (the new 1088XEL motherboard is mini-ITX compatible, and people are placing them in H80 cases, which don't have an easy way to get to a cartridge if it's internal).

Any advice happily and gratefully received :)
Last edited by SpacedCowboy on Fri May 04, 2018 4:03 pm, edited 1 time in total.


mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am
Contact:

Post by mon2 »

Hello Simon.

The attached XMOS document details the LVDS link benchmarks for 2 wire / 5 wire use.

From this document:

Image

Lost track of the details of your hardware design but do you have LVDS transceivers onboard your design? They are required for a reliable interface over copper. The applied delays for these links will define the speed but to get the speed, the quality of signal must be there.

Respectively, did we share a project we did (but not yet assembled; project # IC1071KB) using CAT6 cable and LVDS transceivers as a XMOS slice board? If not, we can send you the fileset. This design was intended to allow for cascading of multiple XMOS startKits using standard CAT6 cabling with autoselection of the wiring (ie. straight through or cross wired cabling are supported). The files may even have been posted here in the forum somewhere..

I recall the Atari 6502 CPU being clocked @ 1.79 MHz. What is the timing for the READ and WRITE in the posted diagram? That is, is your timing not limited to those 2 lines in the post?

Back in the day, did a few hardware designs for the Atari boxes including a memory expansion and also bank switched cartridge which was shadowed from the BASIC XL cartridge design.

https://archive.org/details/MACE_Journal_v5n2_Feb_1985
page 22

and
https://archive.org/details/MACE_Journal_v6n5_May_1986
* article made it to the front cover (did not know :)

Most of the cartridges were ROM based (no bank switching) but some carts (e.g. Donkey Kong) would attempt to write back to the same memory locations in case someone was attempting to clone the s/w in RAM. The workaround we came up with was to use one of the free memory locations to perform a write-only access to a flip flop (single bit data latch) to remove all write operations to the RAM board. This would then permit a standard RAM board to emulate just like ROM. That is how THE PILL worked; we called ours the DEADLOCK cartridge. PILL had a manual toggle switch; ours was all s/w triggered.
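The latch trick can be modelled in a few lines of C. This is only a behavioural sketch, not the DEADLOCK circuit: the 8K board size, the `LATCH_OFFSET` trigger location and all names are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define RAM_SIZE     0x2000u  /* assumed 8K RAM board */
#define LATCH_OFFSET 0x1FFFu  /* hypothetical "free" location wired to the latch */

struct ram_board {
    uint8_t mem[RAM_SIZE];
    bool write_protect;       /* the single-bit data latch */
};

/* A write either loads RAM, sets the latch, or is ignored once latched. */
void board_write(struct ram_board *b, uint16_t offset, uint8_t value)
{
    offset &= RAM_SIZE - 1;
    if (b->write_protect)
        return;                   /* latched: the board now behaves like ROM */
    if (offset == LATCH_OFFSET) {
        b->write_protect = true;  /* write-only access sets the flip flop */
        return;
    }
    b->mem[offset] = value;
}

/* Reads are never affected by the latch. */
uint8_t board_read(const struct ram_board *b, uint16_t offset)
{
    return b->mem[offset & (RAM_SIZE - 1)];
}
```

Once the loader has copied the image and touched the latch location, a cartridge that writes over its own address range no longer corrupts the copy.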
Attachments
XS1-L-Link-Performance-Design-Guidelines_2.0.pdf
(385.36 KiB) Downloaded 145 times

Post by SpacedCowboy »

mon2 wrote:Hello Simon.

The attached XMOS document details the LVDS link benchmarks for 2 wire / 5 wire use.

From this document:

Image
Hi there :)

Yep - that's where I got my 18MB/sec from. I will need data travelling in both directions, so I took the pessimistic figure of 18MB/sec rather than 20 MB/sec. I wasn't sure if it was only reduced to 18 when the link was saturated, or whether there was a penalty of 2MB/sec for bi-directional traffic, so I was playing safe :)
mon2 wrote: Lost track of the details of your hardware design but do you have LVDS transceivers onboard your design? They are required for a reliable interface over copper. The applied delays for these links will define the speed but to get the speed, the quality of signal must be there.

Respectively, did we share a project we did (but not yet assembled; project # IC1071KB) using CAT6 cable and LVDS transceivers as a XMOS slice board? If not, we can send you the fileset. This design was intended to allow for cascading of multiple XMOS startKits using standard CAT6 cabling with autoselection of the wiring (ie. straight through or cross wired cabling are supported). The files may even have been posted here in the forum somewhere..
Yep, you pointed me at it the last time I mentioned this - the design does have an LVDS driver on-board...
Image
mon2 wrote: I recall the Atari 6502 CPU being clocked @ 1.79 MHz. What is the timing for the READ and WRITE in the posted diagram? That is, is your timing not limited to those 2 lines in the post?
Yep, the figures in the text come from that diagram in the original post - it's all I can find from the hardware reference manuals for the atari 8-bits. The box does run at 1.79MHz (from which we get the bus-cycle time of 558ns in the diagram in the original post) - an address only becomes valid at ~177ns (although the RAM specified for replacement is generally 150ns, so maybe there's some slop in there), and the read-data is latched back into the 6502 on the next falling clock. Internal memory will produce a result after 486ns, but I don't think I have to stick to that - I'm fairly certain it's the clock-going-low that is the timing I'm required to meet.
mon2 wrote:
Back in the day, did a few hardware designs for the Atari boxes including a memory expansion and also bank switched cartridge which was shadowed from the BASIC XL cartridge design.

https://archive.org/details/MACE_Journal_v5n2_Feb_1985
page 22

and
https://archive.org/details/MACE_Journal_v6n5_May_1986
* article made it to the front cover (did not know :)

Most of the cartridges were ROM based (no bank switching) but some carts (e.g. Donkey Kong) would attempt to write back to the same memory locations in case someone was attempting to clone the s/w in RAM. The workaround we came up with was to use one of the free memory locations to perform a write-only access to a flip flop (single bit data latch) to remove all write operations to the RAM board. This would then permit a standard RAM board to emulate just like ROM. That is how THE PILL worked; we called ours the DEADLOCK cartridge. PILL had a manual toggle switch; ours was all s/w triggered.
Cool stuff - I didn't even know MACE existed (being 3,500 miles away at the time :) ) and to be fair I was more interested in *playing* donkey-kong at the time than cloning it :) I'll have to make some use of archive.org to see what else is in the MACE journals :)

I'm beginning to wonder if it might not be simpler to actually try and recognise the various types of cartridge and emulate their behaviour using the local-to-the-6502 SDRAM. I can control read/write access to it using the local XMOS, so it would be a matter of copying down the data and then behaving as if we were a cartridge. It wouldn't work for hardware-mapped cartridges (the USB or SD-card ones) but the expansion box could take care of that. I'll ask one of the emulator guys if/how he does it...
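As a rough illustration of what "behaving as if we were a cartridge" could mean for a bank-switched image held in SDRAM, here is a minimal sketch in C. The scheme is an assumption, not any particular cartridge type: a single 8K window at `WINDOW_BASE`, and a hypothetical `BANK_SELECT` register where the written value picks the bank. Real cartridge types differ and would each need their own rules.

```c
#include <stdint.h>
#include <stddef.h>

#define BANK_SIZE   0x2000u  /* 8K window, matching the 13-bit offset above */
#define WINDOW_BASE 0xA000u  /* assumed cartridge window in the 6502 map */
#define BANK_SELECT 0xD500u  /* hypothetical bank-select register address */

struct cart_image {
    const uint8_t *rom;      /* copy of the cartridge contents in SDRAM */
    size_t n_banks;          /* number of 8K banks in the image */
    size_t bank;             /* currently selected bank */
};

/* A write to the select register switches banks; other writes are ignored. */
void cart_write(struct cart_image *c, uint16_t addr, uint8_t value)
{
    if (addr == BANK_SELECT && c->n_banks > 0)
        c->bank = value % c->n_banks;
}

/* Returns the byte for a read inside the window, or -1 if the address
 * is not ours and the bus should be left alone. */
int cart_read(const struct cart_image *c, uint16_t addr)
{
    if (addr < WINDOW_BASE || addr >= WINDOW_BASE + BANK_SIZE)
        return -1;
    return c->rom[c->bank * BANK_SIZE + (addr - WINDOW_BASE)];
}
```

The point of the sketch is that every read is served locally from the image, so only the bank register state has to be tracked per cartridge type.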

Cheers
Simon

Post by mon2 »

Found these links related to the timing diagram for the Atari parallel bus design:

https://www.atarimagazines.com/index/in ... &mag=antic

https://www.youtube.com/watch?v=fAKWMrJzUZA

http://atariage.com/forums/topic/257488 ... ontroller/

Source code here for the STM32F4 used for the interface (good for a review of the timing details):

https://github.com/robinhedwards/UnoCart

Post by mon2 »

In reviewing the timing diagram and Earl Rice's article from Antic magazine, I think that:

a) the data that is written by the 6502 is stable @ 422 ns after the 6502 clock goes low

b) the data to be read by the 6502 should be stable @ 486 ns after the 6502 clock goes low

If practical, consider complying with these (starting) timing values to remain in the shown window of capture. Otherwise, there may be a risk because the actual end time of each window is not defined.

You could start with a small routine to perform an address decode of the 6502 bus with the XMOS CPU and send back predefined values from the XMOS code. Then do the same with a write and confirm that there is no data loss before moving on to continue this food chain over LVDS to a remote device, etc. Interesting project, effectively replacing FPGA / CPLD decode logic with s/w state machines. If the STM32F4 can perform this task (although that is for the Atari cartridge port), it should be possible to use the parallel bus of the Atari. Be very cautious of the clock cable extension length due to its low drive capability. On this note, I assume that you have level shifters for all signals to / from the Atari (5 volt swing), including the 6502 clock? Best to check what the actual 6502 clock swing voltage is. Most likely 5 volts.
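A first bring-up test along those lines might separate the decode logic from the port handling so it can be checked on a desk before touching the bus. The sketch below is plain C under assumptions: the 8K window at `WINDOW_BASE`, the predefined read pattern and all names are hypothetical, and the actual port reads and writes on the XMOS are left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define WINDOW_BASE 0xA000u  /* assumed start of the decoded window */
#define WINDOW_SIZE 0x2000u  /* assumed 8K window */

/* One sampled 6502 bus cycle. */
struct bus_cycle {
    uint16_t addr;
    bool is_read;
    uint8_t data;            /* filled in on reads, supplied on writes */
};

/* Record of writes seen, so a test program can check none were lost. */
struct write_log {
    unsigned count;
    uint16_t last_offset;
    uint8_t last_value;
};

/* Predefined, easily recognised read pattern: low byte XOR high byte. */
uint8_t predefined_value(uint16_t offset)
{
    return (uint8_t)((offset & 0xFFu) ^ (offset >> 8));
}

/* Returns true if the cycle falls in our window and was handled. */
bool decode_cycle(struct bus_cycle *c, struct write_log *log)
{
    if (c->addr < WINDOW_BASE || c->addr - WINDOW_BASE >= WINDOW_SIZE)
        return false;                     /* not ours: stay off the bus */
    uint16_t offset = (uint16_t)(c->addr - WINDOW_BASE);
    if (c->is_read) {
        c->data = predefined_value(offset);
    } else {
        log->count++;
        log->last_offset = offset;
        log->last_value = c->data;
    }
    return true;
}
```

A 6502 test loop that writes N bytes and then compares `count` against N gives a simple data-loss check before the link is added to the path.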

Post by SpacedCowboy »

Yes, those timings are actually how I got the numbers above.

Writes I'm not too bothered about because the busses are decoupled, so while the data is propagating to the far side, the local XMOS can be monitoring the bus again for a read. There's no return value for a write, so as long as it's in place for the next read-op (which is actually at least 3 clocks away because every non-page-0 instruction is at least 3 cycles), I'm good.

Reads have to return a value, and they have to do so within that 486 ns window, therein lies the problem :)

There actually is far more slack than I expected - I put a logic analyser on the bus, and (@50MHz) got:
Image
... where the white markers represent 80ns. So at least on this one motherboard, there's a lot more slack than "officially" reported.


Having said all that, I'm beginning to lean towards 'this approach is more work than it's worth'. Given that it's a personal project I have the liberty of saying "sod that" and moving to a different solution. I'm still planning on using an XMOS to do the signal decoding within the enclosure, but I think I'm just going to use a high-density cable and make the interface card just be a bus-transceiver/line-buffer to get the signals to their destination (the expansion box). At 1.79MHz I don't need to worry about LVDS or anything like that so a single-ended solution will do.

I've been looking at VHDCI cables, and a 68-way one (which actually gives me more options) is ~$26 at Monoprice. The PCB connectors are ~$4 each, so for $34 (which is less than my BOM for the interface card) I can have a far simpler solution that actually does more ...
  • I can pipe the audio line through, so the expansion box can source/sink audio as well. That wasn't going to be possible on the purely-digital approach. Not a huge win, but a win nonetheless.
  • I have 68 pins and I need 40 for the system-bus, so I can put the CIO signals on there as well (CIO, for 'central i/o', is a precursor to a USB-style interface. Designed by the same guy, it let Atari plug everything from a modem to a disk-drive into a single port). Then people can design either CIO or parallel-bus expansion cards. That's actually a pretty big win.
It's not as "cool" as the Xlink idea, and it won't look quite as nice with a big cable and longer connector, but it's perfectly acceptable.

Post by mon2 »

What length of the VHDCI68 cable were you thinking of using?

1.79 MHz of 5 volt / 3 volt signals over copper is risky on signal integrity. LVDS or RS422 style differential drivers should be considered to properly transfer data over copper. We have many designs using RS422 and RS485 transceivers; the latest ones support 20 Mbps of 3v3 RS422 traffic over copper without issues. TI has some nice parts to consider. Alternatively, go back to your XMOS SERDES concept of XMOS @ one side of the cable -> LVDS signals over copper -> XMOS on the other side of the cable to rebuild the traffic to your target interface.

We have used VHDCI68 connectors for many years on our low profile PCIe designs and surprised by the $4 cost - can you share which connector you have in mind? Personally attempted to negotiate a supplier in Taiwan during a past visit and they shot me down on my target of $3 USD in 10k lots. Stated our target was impossible to support. Since then, we do have a very good negotiated price with Tyco for the following connector and can say we pay < $ 5 USD. If you require a few, let us know. Digikey pricing is around $20 USD for this component.

The connector is delicate but built much better than some of the offshore clones of the same. We OEM to many clients and they can tell the quality difference of these connectors as compared to our competition. Even the screws for this part are kinky as well and sell at $ 1 USD each in volume from TE. Openly, we had the M2 thread screw cloned by a manufacturer in China due to the tremendous cost differences.

Tyco (TE) # 5796055-2

Image

Another possible idea is to forget the LVDS interface, although it should be suitable for this design, and consider one of the Avago HFBR (versatile link) or Firecomms equivalent fiber optic transmitter and receiver pairs. Then you would simply apply UART IP inside your XMOS ends and stream the data over the UART interface. However, now the bottleneck would be the UART upper speed. Have seen that XMOS is able to support 10 Mbps for the UART interface but of course faster is better.

https://www.firecomms.com/redlink

FT50MHNR
FM50MHNR

Post by SpacedCowboy »

I'm thinking 16" or possibly 36" for the cable. I'd go shorter if I could, but it's a matter of what's available. The 36" one is cheaper ($26); the 16" one is $29.

As far as signal integrity goes, I've ordered the cable and connector to try it and see. I actually ordered the $5.56 connector because the $3.64 one is on order, they don't have stock, but they seem to be pretty much identical, and I'm guessing the -CT means 'commercial temperature' whereas the one I ordered is the standard grade.

The straddle-mount suits me because I intend to have it panel-mounted and then have the PCB slot into the expansion socket on the motherboard. I'll be designing the size of the card to suit the exact dimensions of the case. We'll see how it goes, I might take you up on the connector offer, modulo how it goes with the cheap one :)

The original 1090XL expansion box used a 1m ribbon cable, and there wasn't any differential signaling there. I think, with short link lengths, and 5v signaling, it ought to be ok. At the end of the day, I could always fall back to this - I know it works, after all :) Bear in mind that the original parallel ATA bus was just a standard ribbon cable, and that cycled at ~120ns or 8.3 MHz, which is much faster than what I'm targeting. Single-ended SCSI went as high as 20MHz, although there were a *lot* of ground wires in those cables :)

I thought about using fiber-optics (hell, I even considered repurposing TOSlink, but TOSlink is too low bandwidth for reasonable prices - the optics are fine, but the transceivers are generally only specced as far as they need to go, and the higher they're specced the more expensive they get). The Avago ones also seem to be a bit too pricey for this project. The connectors above are pretty much the top end of what I'm willing to spend.

I've looked at FFC/FPC, SFF-8087, High-density D-sub, scsi-2 etc. etc. So far, the best price/performance seems to be VHDCI when you take into account the cost of connectors and cable as a whole. I'm trying to simultaneously optimize cost, footprint, and signal-integrity... We all know how that goes... :)