SDRAM errors

Redeye
XCore Addict
Posts: 131
Joined: Wed Aug 03, 2011 9:13 am

Post by Redeye »

I have a board in production that uses the same ISSI SDRAM as the XA-SK-SDRAM slice card, with an identical schematic. About 85% of these boards work perfectly and reliably, but some have write/read errors that I'm having real trouble getting to the bottom of. Continuity between the processor and the SDRAM appears to be fine.

I've written a test function, based on the sc_sdram_burst testbench code, which shows several curious aspects of the problem:


#define TIMER_TICKS_PER_US PLATFORM_REFERENCE_MHZ
#define READ_DELAY 100

/* Encode the address into the data: bank in the low bits, then row, then word index. */
static unsigned makeTestWord(unsigned bank, unsigned row, unsigned word)
{
    return (bank + (row << (SDRAM_BANK_ADDRESS_BITS)) + (word << (SDRAM_BANK_ADDRESS_BITS+SDRAM_ROW_ADDRESS_BITS)));
}

static void address_test3(chanend c_server)
{
    unsigned buffer[SDRAM_ROW_WORDS];
    unsigned num_rows = SDRAM_ROW_COUNT;
    timer T;
    int time;

    for (unsigned bank = 0; bank < SDRAM_BANK_COUNT; bank++)
    {
        for (unsigned row = 0; row < num_rows; row++)
        {
            for (unsigned word = 0; word < SDRAM_ROW_WORDS; word++)
            {
                buffer[word] = makeTestWord(bank, row, word);
            }
            sdram_buffer_write(c_server, bank, row, 0, SDRAM_ROW_WORDS, buffer);
            sdram_wait_until_idle(c_server, buffer);

            T :> time;
            T when timerafter(time + (READ_DELAY * TIMER_TICKS_PER_US)) :> time;

            sdram_buffer_read(c_server, bank, row, 0, SDRAM_ROW_WORDS, buffer);
            sdram_wait_until_idle(c_server, buffer);

            for (unsigned word = 0; word < SDRAM_ROW_WORDS; word++)
            {
                if(makeTestWord(bank, row, word) != buffer[word])
                {
                    printstr("Failed address_test3 at bank 0x");
                    printhex(bank);
                    printstr(" row 0x");
                    printhex(row);
                    printstr(" word 0x");
                    printhex(word);
                    printstr(" - should be 0x");
                    printhex(makeTestWord(bank, row, word));
                    printstr(" , read 0x");
                    printhexln(buffer[word]);
                    break;
                }
            }
        }
    }
}

On most boards, this code runs with no errors. On a faulty board I get output similar to:

Failed address_test3 at bank 0x0 row 0x7E0 word 0x24 - should be 0x91F80 , read 0x91780
Failed address_test3 at bank 0x0 row 0xE27 word 0x56 - should be 0x15B89C , read 0x15B09C
Failed address_test3 at bank 0x2 row 0xFA4 word 0x0 - should be 0x3E92 , read 0x3692
Failed address_test3 at bank 0x3 row 0xFB0 word 0x2A - should be 0xABEC3 , read 0xAB6C3

Things I've discovered through experimentation with this function:

1. The longer I make READ_DELAY, the more errors I get. With READ_DELAY at 0 I get no errors, even on a faulty board.

2. The errors don't happen at the same addresses every time I run the application.

3. The errors always happen at a row with bit 0x0200 set, and the error in the data is always a missing 0x0800 bit.
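
Point (3) can be checked directly against the logged failures. The following is a minimal sketch in plain C, not part of the original test code: it assumes SDRAM_BANK_ADDRESS_BITS is 2 and SDRAM_ROW_ADDRESS_BITS is 12 (values inferred from the log, not stated anywhere in the post), and the helper names are hypothetical. It XORs each expected word with the word read back and maps the differing data bits onto the row bits that generated them.

```c
#include <stddef.h>

/* Assumed geometry, inferred from the log rather than stated in the post. */
#define BANK_BITS 2u
#define ROW_BITS  12u

/* Same encoding as makeTestWord() in the test function. */
static unsigned make_test_word(unsigned bank, unsigned row, unsigned word)
{
    return bank + (row << BANK_BITS) + (word << (BANK_BITS + ROW_BITS));
}

/* One line of the failure log. */
struct failure {
    unsigned bank, row, word, expected, actual;
};

/* The four failures copied from the log output. */
static const struct failure failures[] = {
    { 0x0, 0x7E0, 0x24, 0x091F80, 0x091780 },
    { 0x0, 0xE27, 0x56, 0x15B89C, 0x15B09C },
    { 0x2, 0xFA4, 0x00, 0x003E92, 0x003692 },
    { 0x3, 0xFB0, 0x2A, 0x0ABEC3, 0x0AB6C3 },
};
#define FAILURE_COUNT (sizeof failures / sizeof failures[0])

/* Bits that differ between the expected and the read-back word. */
static unsigned flipped_bits(const struct failure *f)
{
    return f->expected ^ f->actual;
}

/* Map a mask of data bits back onto the row-address bits that produced them. */
static unsigned data_mask_to_row_mask(unsigned data_mask)
{
    return (data_mask >> BANK_BITS) & ((1u << ROW_BITS) - 1u);
}

/* OR together the flipped bits of every logged failure. */
static unsigned all_flipped_bits(void)
{
    unsigned mask = 0;
    for (size_t i = 0; i < FAILURE_COUNT; i++)
        mask |= flipped_bits(&failures[i]);
    return mask;
}

/* Returns 1 if every flipped bit was written as 1 and read back as 0. */
static int only_dropped_ones(void)
{
    for (size_t i = 0; i < FAILURE_COUNT; i++)
        if ((flipped_bits(&failures[i]) & failures[i].actual) != 0)
            return 0;
    return 1;
}
```

For the four logged failures the combined mask is 0x800, which maps back to row bit 0x200, and in every case the bit was written as 1 and read back as 0. Under the assumed geometry the two halves of point (3) therefore describe the same bit: this pattern only sets data bit 11 on rows where row bit 9 is set, so it cannot show a failure on any other row.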

But now I'm not quite sure where to go next. Point (1) above suggests to me that this is a refresh problem, but if that's the case then I don't understand why it would vary between boards.
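
One possible way to narrow this down, sketched here in plain C under the same assumed geometry (2 bank bits, 12 row bits; both are assumptions, as are the helper names), is to repeat the test with the bitwise complement of the pattern. In the original pattern data bit 0x0800 is 1 only on rows with row bit 0x0200 set; in the inverted pattern it is 1 only on rows with that row bit clear.

```c
/* Assumed geometry, inferred from the log rather than stated in the post. */
#define BANK_BITS 2u
#define ROW_BITS  12u

/* Data bit that goes missing in the logged failures. */
#define SUSPECT_DATA_BIT 0x800u

/* Same encoding as makeTestWord() in the test function. */
static unsigned make_test_word(unsigned bank, unsigned row, unsigned word)
{
    return bank + (row << BANK_BITS) + (word << (BANK_BITS + ROW_BITS));
}

/* Complement pattern: every data bit is the inverse of the original one. */
static unsigned make_inverted_test_word(unsigned bank, unsigned row, unsigned word)
{
    return ~make_test_word(bank, row, word);
}

/* Returns 1 if the suspect data bit is written as 1 in this pattern word. */
static int suspect_bit_written_high(unsigned pattern_word)
{
    return (pattern_word & SUSPECT_DATA_BIT) != 0;
}
```

If the failures move to rows with bit 0x0200 clear when the inverted pattern is used, the fault follows the data bit (a stored 1 being lost) rather than the row address. If they stay on rows with bit 0x0200 set, the row address itself is implicated.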

Any suggestions of what to try next would be gratefully received as this one is baffling me.

Post by Redeye »

The part I have used is the IS42S16400J-7TL. This is the updated version of the IS42S16400F-7TL, the now-EOL part used on the XA-SK-SDRAM. As far as I can see from the datasheets, there are no performance or timing differences between the two parts.

I don't have a slicekit to test the code with, but my test code is just a slightly modified version of the standard testbench code. It isn't doing anything complicated or any performance testing: it just writes to a row, waits for a short delay, then reads the data back to check that it's the same. On about 85% of my boards it's rock solid, and I've got dozens of these boards out in the field working fine, but on the remaining 15% it isn't. I don't really understand why, as there's no obvious hardware fault, and I'm not sure what to try next given the above test results.