How to parallelize SDRAM server

Thorsten
New User
Posts: 2
Joined: Mon Jul 14, 2014 8:36 pm


Post by Thorsten »

Dear All,
I am trying to use the SDRAM module for my own SDRAM server, working very closely from the SDRAM-slice project. The data to be read in arrives every 1.6 us and I sample it via the 32-bit port of the XS1-L8A-64-TQ128-C5. This works fine if I use the internal SRAM. However, that is unfortunately too small, so I decided to use the SDRAM as described in the SDRAM-slice documentation. I designed my own board and set up an xTIMEcomposer project based on the SDRAM module. I run my application, the SDRAM server and the fast UART module required for data exchange in parallel:
    par {
      sdram_server(sdram_c, ports);
      meas_TX(sdram_c,uart_c);
      uart_tx_fast(p_tx, uart_c, tics);
    }
The data is written to the SDRAM via the sdram_buffer_write function that comes with the module. Storing a single integer with
          sdram_buffer_write(c_server, bank, row, 1, writebuff);
          // Wait until idle, i.e. the SDRAM has completed the write.
          sdram_wait_until_idle(c_server, writebuff);
 
takes around 1.7 us (measured with an oscilloscope) and is thus too long. So I decided to collect 128 integers and write them in one step, which takes (with sdram_buffer_write or sdram_full_row_write) around 10 us, but of course only happens every 205 us, so there should be plenty of time. Unfortunately, the sdram_wait_until_idle function blocks the measurement thread for 10 us, so I miss some incoming data points, even if I integrate the functionality of this function directly into the write routine. This I cannot understand: in my opinion, running the threads in parallel should ensure that the write is done on a different core while the measurement thread continues working.
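
To make the timing concrete, my measurement task currently looks roughly like this (a simplified sketch only; the sampling call and the bank/row bookkeeping are placeholders, only the two SDRAM calls are the real module functions):

    void meas_TX(chanend c_server, chanend uart_c) {
      int writebuff[128];
      unsigned bank = 0, row = 0;
      while (1) {
        for (int i = 0; i < 128; i++) {
          // sample the 32-bit port every 1.6 us into writebuff[i]
        }
        sdram_buffer_write(c_server, bank, row, 128, writebuff);
        // blocks this thread for ~10 us, which is where I lose samples
        sdram_wait_until_idle(c_server, writebuff);
        row++; // placeholder address handling
      }
    }
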
My questions are:
 
1. How can I avoid the sdram_wait_until_idle function ruining the performance? I have tried a lot of things (e.g. moving the sdram_wait_until_idle call before the write command in the loop to reduce the waiting time, which improved the performance, but unfortunately not enough) and read through a lot of documentation, but after a week I am out of ideas of my own.
 
2. Can I speed up the performance by making c_server a streaming chanend? Streaming channels should be faster and do not need to be closed after use.
 
3. Is it possible to let the SDRAM server run on multiple cores? I have seen this in the benchmark program, but I am not sure whether it would help, especially since the performance in the benchmark tests does not really vary with the number of cores used.
 
I appreciate any kind of help and hope to hear from you soon.
Thorsten
 


mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

Hi Thorsten. I am also reviewing the same code, which is severely lacking in proper documentation; more on that later, once the review is done. I am attempting to port the code over to the startKIT.

Have you reviewed the app_sdram_demo project for the sliceKIT? It shows how to parallelize the same code. Also, sdram_wait_until_idle has to be placed after the WRITE or READ of the SDRAM to confirm the task has finished.

Have you attempted to write a smaller block of data, that is, fewer than 128 integers? Does that impact your project or timing study? What if you do not use the UART function? Does that improve your SDRAM benchmark?

The (poorly) documented code primes the SDRAM for page writes and then issues a BURST TERMINATE opcode to the SDRAM. The BURST TERMINATE is in the assembler part of the code. Most of the actual reading and writing is performed in assembler via the ".inc" source files.

 

Kumar

Redeye
XCore Addict
Posts: 131
Joined: Wed Aug 03, 2011 9:13 am

Post by Redeye »

Thorsten,

I have good news for you - it is definitely possible. I've just done a very similar thing, implementing an audio delay for my DSP project, which moves more data to the SDRAM and at a much faster rate than you need. There are a couple of different ways to achieve this. I've used a mixture of C and XC to share memory between threads and reduce the amount of data moving over channels, but you can probably do it in pure XC by splitting into two threads, something like this:

    void read_thread(chanend c_data)
    {
        int buffer[128];
        while (1)
        {
            for (int i = 0; i < 128; i++)
            { /* read a data point into buffer[i] */ }

            for (int i = 0; i < 128; i++)
            { c_data <: buffer[i]; }
        }
    }

    void write_thread(chanend c_data, chanend c_sdram_server)
    {
        int buffer[128];
        while (1)
        {
            for (int i = 0; i < 128; i++)
            { c_data :> buffer[i]; }

            /* write the whole buffer, e.g. sdram_buffer_write(...) */
            sdram_write();
            /* block here, in this thread, until the write has completed */
            sdram_wait_idle();
        }
    }
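
Wired up, the par from your first post would then look roughly like this (a sketch using the names above; you will need to pass uart_c into whichever thread still drives the UART):

    chan c_data;
    par {
      sdram_server(sdram_c, ports);
      read_thread(c_data);
      write_thread(c_data, sdram_c);
      uart_tx_fast(p_tx, uart_c, tics);
    }

That way only write_thread ever blocks on sdram_wait_idle, while read_thread keeps sampling.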

I think that does what you want and should be fast enough. If it's not fast enough, or if I've not understood your question right, let me know and I'll see if I can explain the shared memory method in C.

andrew
Experienced Member
Posts: 114
Joined: Fri Dec 11, 2009 10:22 am

Post by andrew »

I have been working on a solution to this: a new SDRAM server. It can be found on GitHub at

https://github.com/xcore/sc_sdram_burst ... xperiments

It has examples and docs but is not ready for general release. Feel free to give it a go. It supports:

* buffer read + write
* one or more clients
* asynchronous command decoupling with a command queue of length 8 for each client

and it can now go up to 62.5 MHz (or more with more MIPS).
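
To give a feel for the command decoupling, a client loop is sketched below. The function names here are made up purely to show the idea; the real interface is in the examples in the repository.

    void client(chanend c_sdram) {
      int buf[128];
      unsigned bank = 0, row = 0;
      while (1) {
        // fill buf with the next block of samples (placeholder)

        // hypothetical call: queue the write and return immediately;
        // up to 8 commands can be outstanding per client
        sdram_queue_write(c_sdram, bank, row, 128, buf);

        // ...keep sampling while the server drains the queue...

        // hypothetical call: only block when you actually need
        // the queued command to have completed
        sdram_complete(c_sdram);
      }
    }
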

If you do use it then please feel free to feed back your experiences.