SDRAM on XGC/Xshell board or LS1 device

Post by **DrFingersSchaefer** » Mon Jan 25, 2010 6:25 am

OK

Helpful links: it's a bit old school, but consider this diagram. The main parts of interest are in the dotted box at the lower right-hand side :-

http://en.wikipedia.org/wiki/File:Intel_8085_arch.svg

The 8085 is the first microcontrol system I built, using wire wrap, with a calculator keyboard and LED display to program it in machine code. (The tutor at tech college gave us the design and a monitor program in EPROM.)

Basically, if you have a 32 bit port, you use bits 32, 31 & 30 as WR, RD and ALE respectively, leaving you with 29 address lines, of which the lower 16 bits are multiplexed with your 16 bit data bus.

You would use discrete logic chips for the latches (74hc????).

Using this information, you would be looking for devices that fit within a 29 bit address space and have a data bus of 16 bits.

For our 32 bit processor in this case you would do 2 fetches for a 32 bit word.

The sequence to program would run something like:

Write the address and assert ALE.
De-assert ALE.
Assert the RD or WR bit according to what you want to do.
Read/write the 16 bit data value from/to the RAM device.
De-assert whichever RD or WR signal you had asserted by toggling it.

Repeat above for second 16 bits of the 32 bit value you were working on.

Ta da, 29 bit address space, 16 bit data word from a 32 bit port.
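
For anyone who wants to see the sequence above as code, here is a minimal sketch in plain C that simulates it on a PC. Everything here is an assumption for illustration: the bit positions (zero-based 31, 30, 29 for WR, RD, ALE), active-high strobes, the names `port_out`, `port_in`, `bus_write16` and friends, and a simulated RAM that only decodes the 16 latched address lines. On real hardware `port_out`/`port_in` would be port accesses and the low 16 lines would have to be turned around for reads.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout (zero-based) for the post's "bits 32, 31 & 30". */
#define WR_BIT    (1u << 31)
#define RD_BIT    (1u << 30)
#define ALE_BIT   (1u << 29)
#define ADDR_MASK 0x1FFFFFFFu   /* 29 address lines */
#define HIGH_MASK 0x1FFF0000u   /* upper 13 address lines, not multiplexed */

/* Hypothetical simulation of the external latch and RAM. */
#define SIM_WORDS (1u << 16)    /* simulated RAM decodes only the 16 latched lines */
static uint16_t sim_ram[SIM_WORDS];
static uint16_t sim_latch;      /* stands in for 2 x 8-bit transparent latches */
static uint32_t sim_port;       /* last value driven on the 32 bit port */

/* Drive the port; the model reacts the way latch + RAM would. */
static void port_out(uint32_t value)
{
    if (value & ALE_BIT)        /* latch is transparent while ALE is high */
        sim_latch = (uint16_t)(value & 0xFFFFu);
    if (value & WR_BIT)         /* RAM stores the data lines while WR is asserted */
        sim_ram[sim_latch] = (uint16_t)(value & 0xFFFFu);
    sim_port = value;
}

/* Sample the shared data lines; the RAM only drives them while RD is asserted. */
static uint16_t port_in(void)
{
    return (sim_port & RD_BIT) ? sim_ram[sim_latch] : 0xFFFFu;
}

void bus_write16(uint32_t addr, uint16_t data)
{
    assert(addr <= ADDR_MASK);
    uint32_t high = addr & HIGH_MASK;
    port_out(addr | ALE_BIT);        /* write address, assert ALE */
    port_out(addr);                  /* de-assert ALE, latch holds low 16 bits */
    port_out(high | WR_BIT | data);  /* assert WR with data on the shared lines */
    port_out(high | data);           /* de-assert WR */
}

uint16_t bus_read16(uint32_t addr)
{
    assert(addr <= ADDR_MASK);
    uint32_t high = addr & HIGH_MASK;
    port_out(addr | ALE_BIT);        /* write address, assert ALE */
    port_out(addr);                  /* de-assert ALE */
    port_out(high | RD_BIT);         /* assert RD (real port: low 16 lines become inputs) */
    uint16_t data = port_in();       /* read the 16 bit value */
    port_out(high);                  /* de-assert RD */
    return data;
}

/* A 32 bit word takes two 16 bit transfers at consecutive word addresses. */
void bus_write32(uint32_t addr, uint32_t value)
{
    bus_write16(addr, (uint16_t)(value & 0xFFFFu));
    bus_write16(addr + 1, (uint16_t)(value >> 16));
}

uint32_t bus_read32(uint32_t addr)
{
    uint32_t lo = bus_read16(addr);
    uint32_t hi = bus_read16(addr + 1);
    return (hi << 16) | lo;
}
```

Storing the low half first is an arbitrary choice in this sketch; the only requirement is that reads and writes agree on the order.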

Hope this helps

You would be looking towards PSRAM purely for capacity/price and 16 bit data bus.

Post by **kster59** » Mon Jan 25, 2010 10:12 am

I was wrong about the port precedence previously but the SDRAM code is still not drag and drop to function on the LS1-128 as far as I can tell.

I read through the code pretty carefully for a few days but the port design is deeply buried in the code.

Consider the following:
on stdcore[1] : out port p_sdram_clkblk = XS1_PORT_1H; // This port must be unused!
on stdcore[1] : out buffered port:4 p_sdram_gate = XS1_PORT_1I; // This port must be unused!

From what I remember when I looked at it before, the code generates a dummy clock signal which acts as a clock in to another port. These two lines are not listed in the port map but you need to have them free or it wouldn't work.

Also, reading through the code:
set_port_strobed(p_sdram_addr0);
set_port_slave(p_sdram_cmd);
set_port_slave(p_sdram_dq);
set_clock_fall_delay(b_sdram_io, 9);
set_clock_rise_delay(b_sdram_io, 13);

and associated commands are not listed in either xcuser_en.pdf or portsXS1.pdf.

There is a short mention of these commands in:
C:\Program Files (x86)\XMOS\DesktopTools\9.9.2\doc\libs\html\index.html

but not really enough to fully grasp what's going on in the code.

Now there's the issue of the asm.S assembly file without any comments. It's not that long, and I guess if I were really devoted I could figure it out. I already know 8051 and x86 assembly, but the XMOS asm instructions are quite different and it'd be some effort to learn them.

I am an advanced C/C++ programmer but I spent enough time to realize this SDRAM thing on the L1 wasn't going to be trivial.

SDRAM is one of those things that you really don't want to know how it works. In an ARM, you just connect it and it appears as an address that you can write to. In an FPGA softcore like a Xilinx Microblaze or Altium TSK3000, you just pick a Wishbone SDRAM controller and drop it down in your design and it also just appears as extra memory.

If anyone gets it working, I'd love to check it out. It would simplify a design I'm doing by using a single L1 core instead of a Spartan 3e, which I use only for SDRAM control.

I really enjoy the ease I can program some special protocols and timings with the XMOS and Xc. It's much easier than writing it in VHDL but some of these commonly used low level protocols really should be rewritten into easily used object .Xc files to avoid reinventing the wheel.

Post by **TonyD** » Mon Jan 25, 2010 11:50 am

DrFingersSchaefer wrote: Basically, if you have a 32 bit port, you use bits 32, 31 & 30 as WR, RD and ALE respectively, leaving you with 29 address lines, of which the lower 16 bits are multiplexed with your 16 bit data bus.

You would use discrete logic chips for the latches (74hc????).

In my days using the 8051 micro we used a single 74LS373 to multiplex the address and data bus. So you would need two for a 16-bit bus unless there is a 16-bit equivalent of the 373.

Post by **DrFingersSchaefer** » Mon Jan 25, 2010 3:59 pm

Thanks for that TonyD,

I was writing that piece off the cuff at silly o'clock in the morning and couldn't for the life of me remember what the designation for the latch was.

Quite right, you will need two for 16 bit.

29 bit (1FFFFFFF) should give you an addressable space somewhere in the region of :-

536,870,912 16 bit words (addresses 0 to 536,870,911)

If you used the 29th bit as a lazy decode/chip select to two devices, that would give you two lumps (technical term) of half that figure.

In short, more than you would probably need for an embedded device.
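
To put numbers on that: 29 address lines reach 2^29 = 536,870,912 word addresses (the highest being 0x1FFFFFFF), which at 16 bits per word is 1 GiB. A small C sketch of the arithmetic and of the lazy decode follows. The function names are made up for illustration, and using the top address line directly as the device select is just the simplest possible assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct addresses reachable with `bits` address lines. */
uint64_t address_count(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* Hypothetical lazy decode: the top (29th) address line picks one of two devices. */
unsigned chip_select(uint32_t addr)
{
    return (addr >> 28) & 1u;
}

/* Word offset inside the selected device, using the remaining 28 lines. */
uint32_t device_offset(uint32_t addr)
{
    return addr & 0x0FFFFFFFu;
}
```

With this split, `address_count(28)` gives 268,435,456 words per device, i.e. two 512 MiB lumps.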

The real issue isn't actually with the port, or with dedicating at least one thread of the attached core as a hardware memory interface driver. Nor is it how one core or another gets data from the mme thread/core. It is how to run the paging within each core.

Generally speaking, we are talking here about paged memory. I don't think there is the support within the core hardware to generate the page faults/interrupts that are necessary to make paging appear transparent.

If there was a killer need to enable memory expansion and we wanted to do it with minimal changes to the silicon design as it exists, I feel adding this capability into the cores at hardware level would suffice in the short term.

I guess it is always desirable to put more memory on chip. (I would love to see the OTP replaced with a larger lump of FLASH and mapped into the core's memory space. It allows run in situ code to leverage the existing 64k RAM. Doing this, 64k of flash would likely be plenty.) But with the modifications above it should get us by for those odd needy cases that at the moment are likely to leave us needing to design in another processor and memory (with attendant costs and effort).

That is unless someone who is really clever can come up with a workable and transparent paging scheme ????

Post by **kster59** » Mon Jan 25, 2010 11:24 pm

I still don't know why the Xshell device shows the SDRAM schematic but the SDRAM code is nowhere implemented.

If the SDRAM driver code were available for the schematic with the Xshell, I'd build one immediately to try it out. Since it's not, I think the design risk is too great to justify the cost of developing a custom board, when there's a free Wishbone SDRAM FPGA IP core for $5 Xilinx Spartan 3 parts, which can act like a coprocessor to the XMOS and implement the free XLINK FPGA core that is available.

I would think XMOS would make use of the SDRAM on the XShell if there weren't software issues.

If anyone has SDRAM working on any of the L1 chips, I'd love to check it out too.

Post by **TonyD** » Tue Jan 26, 2010 10:41 am

DrFingersSchaefer wrote: Quite right you will need two for 16 bit.
....

If there was a killer need to enable memory expansion and we wanted to do it with minimal changes to the silicon design as it exists, I feel adding this capability into the cores at hardware level would suffice in the short term.

I guess it is always desirable to put more memory on chip. (I would love to see the OTP replaced with a larger lump of FLASH and mapped into the core's memory space. It allows run in situ code to leverage the existing 64k RAM. Doing this, 64k of flash would likely be plenty.) ...

I've had time to look it up, and the 16-bit version of the 373 is the 74LVC16373 or LVT16373, or pick your favourite logic family and it should be there.

If a design only requires a smallish RAM expansion, then there are various SPI based 32KB SRAMs, such as Microchip's 23K256 and ON Semi's N25S830HA, which could provide extra RAM space at a small cost.
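
As a rough illustration of how simple these parts are to drive, here is a sketch in C of the command header for the 23K256. To the best of my knowledge the datasheet gives READ as 0x03 and WRITE as 0x02, each followed by a 16 bit address sent high byte first, but do check the datasheet (including the operating mode set in the status register) before relying on it. The function name `sram_header` is made up, and the actual SPI transfer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SRAM_CMD_READ  0x03u    /* read instruction, per the 23K256 datasheet as I recall it */
#define SRAM_CMD_WRITE 0x02u    /* write instruction, same caveat */
#define SRAM_SIZE      0x8000u  /* 32 KB device */

/* Fill `hdr` with instruction + 16 bit address (high byte first).
 * Returns the header length; data bytes follow on the same chip select. */
size_t sram_header(uint8_t cmd, uint16_t addr, uint8_t hdr[3])
{
    assert(addr < SRAM_SIZE);
    hdr[0] = cmd;
    hdr[1] = (uint8_t)(addr >> 8);
    hdr[2] = (uint8_t)(addr & 0xFFu);
    return 3;
}
```

A read would then be: pull chip select low, clock out the three header bytes, clock in the data, and release chip select.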

I agree with you about the OTP. Replacing this with a decent sized 64K or 128K (or more) Flash memory would be ideal (for me ;) ). There was a discussion in this thread

Post by **Heater** » Tue Jan 26, 2010 11:11 am

I'm a bit out of touch with current memory technology, but isn't it the case that accessing FLASH is slower than accessing RAM? This would somewhat upset the deterministic timing of the xcores, as would having an external bus interface.

In that respect paging would seem to be a total no-no as we have then lost all determinism.

SPI RAMs are nice but of course much slower.

An external parallel RAM is what I'm looking at for getting a Motorola 6809 emulation running on xcores. Even just a small SRAM or PSRAM would do.

Having got that, and emulation aside, it would be great to be able to execute some kind of byte code from external RAM with a small virtual machine in the xcore. This would allow for those "large" programs that do not need to be speed or real-time critical.
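
To make the idea concrete, here is a toy sketch in C of a stack-based byte code interpreter that fetches every instruction through an external-RAM accessor. It is not any existing language or VM: the opcodes, `ext_read8`, `ext_load` and `vm_run` are all invented for illustration, and the "external RAM" is just an array standing in for whatever the memory thread would provide.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up opcodes for a toy stack machine. */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };

#define EXT_RAM_SIZE 256u
static uint8_t ext_ram[EXT_RAM_SIZE];   /* stand-in for the external RAM */

/* Hypothetical accessor; on real hardware this would talk to the memory thread. */
static uint8_t ext_read8(uint32_t addr)
{
    return ext_ram[addr % EXT_RAM_SIZE];
}

/* Copy a program into the simulated external RAM, starting at address 0. */
void ext_load(const uint8_t *code, uint32_t len)
{
    for (uint32_t i = 0; i < len && i < EXT_RAM_SIZE; i++)
        ext_ram[i] = code[i];
}

/* Run from `pc` until OP_HALT (or an unknown opcode); returns top of stack, 0 if empty. */
int32_t vm_run(uint32_t pc)
{
    int32_t stack[16];
    unsigned sp = 0;

    for (;;) {
        uint8_t op = ext_read8(pc++);
        switch (op) {
        case OP_PUSH:                    /* push the signed 8 bit immediate that follows */
            assert(sp < 16);
            stack[sp++] = (int8_t)ext_read8(pc++);
            break;
        case OP_ADD:                     /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                     /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
        default:
            return sp ? stack[sp - 1] : 0;
        }
    }
}
```

Only the interpreter loop and a small stack live in the xcore's 64K; the program itself stays in the external RAM, at the cost of one external fetch per byte.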

Anyone know of such a language/execution system that would fit in the 64K available in the xcore? And no, not Forth please. Something a bit more C or Pascal like.

Post by **kster59** » Tue Jan 26, 2010 6:14 pm

Implementation of a streaming channel to an FPGA with a Xlink core and a standard SDRAM MMU is probably the easiest/cheapest solution. Then writing to the SDRAM can simply be done by copying memory into the streaming channel.

This is assuming you can program in HDL, which I have noticed most people can't.

ARM SoCs with SDRAM MMUs are also very cheap (<$10). Writing an XLINK implementation on an ARM might be a good idea.

However, all this could be avoided by XMOS supplying improved SDRAM support :)

Post by **Heater** » Tue Jan 26, 2010 7:10 pm

There is something I don't quite understand there. If the idea is to provide RAM for an xcore by driving it from an FPGA on the end of an external link, then what is the point of the FPGA? Why not just use an xcore instead of an FPGA?

Streaming anything is not much help when what you really want to do is execute code from that RAM space.

Post by **jhrose** » Wed Jan 27, 2010 12:20 am

Hei,

Heater wrote: Why not just use an xcore instead of an FPGA?

I can think of several problems.
1/ The xcore i/o clock rate is practically limited to 100MHz. Yes, you can re-program the clocks and make it go faster (about 167MHz was suggested on XLinkers at http://www.xmoslinkers.org/forum/viewto ... 219&p=1409), but with effects on the CPU clock speed (PLL stuff). With a modern DDR SDRAM (which is likely to be manufactured for a few years) you want the ability to clock at around 400MHz. FPGAs can work at these faster speeds.
2/ A directly CPU-addressable EMIF would introduce interference between the threads (making them non-deterministic or non real-time) whenever more than one of them needs to make an external RAM access at the same time (assuming you have a shared EMIF bus). The ability to access non-addressable external RAM through an external device (FPGA) would allow you to design non-determinism into your system, but keeps it out of the xcore. (BTW, I actually don't think paging is non-deterministic, rather it has a calculable maximum time overhead (in the order of a few milli-seconds, or at best several hundred micro-seconds) which could be reliably used by non-real-time threads.)
3/ You can use part of an FPGA itself as memory (allocating BRAM cells) which allows a system designer to calculate the cost/benefit in various memory configurations for different applications. You do not then write this cost into the xcore, when for many applications it will not be needed.
One downside is, of course, power consumption of an FPGA.

A key idea of xcores is to have a small (low-power low-cost) core which can be scaled up in multi-core chips, and those chips linked together into larger networked systems. As xcore programmers we need to design software which can be distributed across multiple-cores, rather than write monolithic programs destined for a single core. So if we wrote more functions with channel interfaces and less with parameter passing (call-stack) interfaces we would start to get better use of core-memory. A problem we have as system/software designers is to understand the cost of using conventional SDRAM versus the cost of networked xcores, and I don't think this is answered yet. Or?
