lib_spi transfer delays

kyzyl · Post by **kyzyl** » Wed Aug 03, 2016 10:48 pm

Hi all. I'm attempting to use lib_spi to talk to a serial ADC (ADS8363 [1]). For now, I'm just running on the simulator. I am trying to run the ADC as close to its peak rate of 1 MSPS as I can. This means running the ADC at its maximum clock rate of 20 MHz, and to achieve maximum throughput at this frequency the data needs to be read out in 20-bit words back-to-back. Initially I set out to use buffered ports to manually generate exactly the right waveforms, very similar to the internals of lib_spi, but I didn't have much success getting it to work just right, so I moved to lib_spi. Since lib_spi only supports 8- and 32-bit transfers, I had thought of using three 8-bit transfers or one 32-bit transfer per 20-bit word, which will result in 4 or 12 cycles of dead time, respectively. The resulting readout rates would be acceptable to me (i.e. I don't need the peak 20-bit back-to-back throughput). To avoid external oscillators or mucking with the reference clock (for now), I am simulating with a serial clock of 25 MHz, which forces me to use the async master mode (even though I think a synchronous interface would be sufficient).
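As a rough check of the framing arithmetic above, here is a minimal C sketch. The function names are illustrative and not part of lib_spi, and the calculation assumes frames run back-to-back with no gaps other than the padding bits.

```c
#include <assert.h>
#include <stdint.h>

/* Samples per second when each sample occupies `clocks_per_sample`
 * SCLK cycles and frames run back-to-back with no other gaps. */
static uint32_t samples_per_second(uint32_t sclk_hz, uint32_t clocks_per_sample)
{
    assert(clocks_per_sample > 0);
    return sclk_hz / clocks_per_sample;
}

/* Idle SCLK cycles when a 20-bit conversion is padded out to a wider frame. */
static uint32_t dead_cycles(uint32_t frame_bits)
{
    assert(frame_bits >= 20);
    return frame_bits - 20u;
}
```

At 20 MHz this gives 1,000,000 samples/s for a 20-bit frame, 833,333 for 24 bits (three 8-bit transfers) and 625,000 for a single 32-bit transfer. At the 25 MHz simulation clock a 32-bit frame gives 781,250 samples/s.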

My problem: Even though the transfers appear to work correctly, I get long waiting periods at the beginning and end of each transfer, and in between transfers. After the chip select goes low, there appears to be a 32-cycle dead time (buffer clearing or something?), then my transfer, then another slightly longer dead time, and then the chip select goes high. Then there is the inter-transfer delay. However, this period appears to be bottlenecked by something other than end_transaction, and sits at about 3.4 us. Where are these delays coming from? Simulation waveforms attached (from top to bottom: CS, SCLK, MOSI, MISO).
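To put rough numbers on the waveform described above, here is a small C sketch that converts SCLK cycles to time and a total period to a transaction rate. The helper names are hypothetical. The trailing dead time is only described as "slightly longer" than 32 cycles, so treating it as exactly 32 cycles is an assumption that gives a lower bound on the period, and the 3.4 us gap is assumed to sit on top of the other three intervals.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of `cycles` SCLK periods, in nanoseconds. */
static uint64_t cycles_to_ns(uint32_t cycles, uint32_t sclk_hz)
{
    assert(sclk_hz > 0);
    return ((uint64_t)cycles * 1000000000ull) / sclk_hz;
}

/* Transactions per second for a given total period in nanoseconds. */
static uint32_t rate_from_period_ns(uint64_t period_ns)
{
    assert(period_ns > 0);
    return (uint32_t)(1000000000ull / period_ns);
}
```

At 25 MHz, 32 cycles last 1280 ns. Under the assumptions above, lead-in (1280 ns) + transfer (1280 ns) + trailing dead time (at least 1280 ns) + gap (3400 ns) gives a period of at least 7240 ns, so at most roughly 138,000 transactions per second.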

Here is the code. The simulation is just outputting a 32-bit test pattern for verification purposes. For the moment the control signals of the ADC aren't being simulated; it just outputs a 32-bit word synchronous to the SCLK. Likewise, the MOSI transfer data are just a recognizable test pattern.

Code:

void slave_simulation()
{
    configure_clock_src(clk_1, p_clk_in_sim);
    configure_out_port(p_data_1, clk_1,0);
    configure_out_port(p_data_2, clk_1,0);
    configure_in_port(p_trigger, clk_1);
    start_clock(clk_1);

    while(1)
    {
        p_data_1 <: 0x000AAAAA;
    }
}

void app(client spi_master_async_if spi)
{
    uint32_t outdata[1];
    uint32_t indata[1];
    uint32_t * movable buf_in = indata;
    uint32_t * movable buf_out = outdata;

    buf_out[0] = 0xcccccccc;
    spi.begin_transaction(0, 25000, SPI_MODE_3);
    spi.init_transfer_array_32(move(buf_in), move(buf_out), 1);

    while(1){
        select{
            case spi.transfer_complete():
                spi.retrieve_transfer_buffers_32(buf_in, buf_out);
                spi.end_transaction(10);

                spi.begin_transaction(0, 25000, SPI_MODE_0);
                spi.init_transfer_array_32(move(buf_in), move(buf_out), 1);
                break;
        }
    }
    _exit(0);
}

int main()
{
    spi_master_async_if i_spi[1];
    par
    {
        on tile[0]: spi_master_async(i_spi, 1, p_sclk, p_mosi, p_miso, p_ss, 1, clk0, clk1);
        on tile[0]: slave_simulation();
        on tile[0]: app(i_spi[0]);
    }
    return 0;
}

Side note: Is there any decent way to run a continuous stream of transfers indefinitely, inside a single transaction? The only device on this SPI bus is the ADC, and it requires that CS stay low to maintain high throughput anyhow, so it would be much better to open a transaction and then perform transfers indefinitely, rather than opening and closing many transactions. According to the state machines in the docs, this is undefined behavior. It does work in synchronous mode, but throws a runtime exception in asynchronous mode. I assume this has to do with synchronous transfers not using movable pointers. If there's a much better way to generate the timing diagrams in the datasheet without all this, then by all means let me know. Thanks!

[1] http://www.ti.com/lit/ds/symlink/ads7263.pdf

xsamc · Post by **xsamc** » Thu Aug 04, 2016 11:59 am

Hi kyzyl,

The delays you're seeing at the start and end of each transfer are likely to be related to the use of the interface calls in the lib_spi API. The interface-based API provides a great mechanism for safely sharing a SPI bus between many tasks running in parallel, but this does sometimes introduce overheads (getting/releasing locks etc.) which currently may not be optimised away when they're not required.

It sounds like performance is your primary focus, so you might prefer to use a function-based SPI library. I put one together as a side project, and while it isn't an official XMOS library and doesn't have documentation or a set of regression tests, it is hopefully enough to get you going in the right direction: see https://github.com/samchesney/lib_spi_fast.

I've had it running successfully at 25MHz on hardware.

Cheers,
Sam

kyzyl · Post by **kyzyl** » Fri Aug 05, 2016 8:47 pm

Hi Sam,

Thanks for the reply. I took your code for a spin, and it seems like it might be working. Unfortunately I can't use it on the simulator, because for some reason I get preposterously long delays between CS going low and the transfer beginning, like 1-5 seconds (could it be non-native zip/unzip instructions?). When I run in hardware it seems to run fast, but I don't have a logic analyzer handy so it's difficult to tell. XSCOPE doesn't appear to be very helpful as a logic analyzer. I used the test in the repository as a basis for my code. (Note: there is a bug in that code. The prototype for spi_fast does not match the call; it's missing the 4th argument, spi_direction_t.)

Code:

spi_fast_ports spi_ports = {
    on tile[0] : XS1_PORT_1C, // sclk
    on tile[0] : XS1_PORT_1A, // miso
    on tile[0] : XS1_PORT_1D, // mosi
    on tile[0] : XS1_PORT_1B, //cs
    0,
    on tile[0] : XS1_CLKBLK_3,
    1,
    0,
    1
};

spi_direction_t dir = SPI_READ_WRITE;
#define BUFLEN 1

void app()
{
    spi_fast_init(spi_ports);
    char buf[BUFLEN];
    unsigned int sa_miso, sa_mosi, sa_cs, sa_sclk;

    while(1){
        buf[0] = 0xAA;

        spi_fast(BUFLEN, buf, spi_ports, dir);
        printhexln(bitrev(buf[0]) >> 24);
    }

}

int main()
{

    par
    {
      on tile[0]: app();
    }
    return 0;
}

I'm curious, though, how hard would/should it be to manually implement a timing diagram like Figure 34 (in the datasheet I linked) using buffered ports? Naively I thought it should be a piece of cake with an XMOS device, but on my startKIT I've yet to be able to get it working properly. There always seems to be some off-by-one bit shift, or the timing between signals drifts or is skewed. I assume this is just because I'm new to XC and don't quite understand the idiomatic way to do things yet. To me it seems like the 'bit banged' approach would be better considering the 20-bit word, back-to-back continuous 20/40 MHz operation. How would you do it?
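One part of the 20-bit back-to-back problem is independent of port timing: if a 32-bit buffered port captures the continuous bit stream, the 20-bit words have to be unpacked from it afterwards. A minimal C sketch of that step follows. It is only an illustration: `unpack20` is a hypothetical helper, and it assumes each captured 32-bit word has already been bit-reversed so that bit 31 of `words[0]` is the first bit received (the ADC shifts data out MSB first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the n-th 20-bit sample from a packed MSB-first bit stream
 * stored as 32-bit words (assumed: bit 31 of words[0] arrived first). */
static uint32_t unpack20(const uint32_t *words, size_t n)
{
    assert(words != NULL);
    size_t bit = n * 20u;                 /* stream offset of the sample's MSB */
    size_t w = bit / 32u;                 /* word that holds that MSB */
    unsigned off = (unsigned)(bit % 32u); /* bits to skip inside that word */

    /* Join two adjacent words so a sample that straddles a word boundary
     * becomes contiguous. The next word is read only when it is needed. */
    uint64_t pair = (uint64_t)words[w] << 32;
    if (off > 12u) {
        pair |= words[w + 1];
    }
    return (uint32_t)(pair >> (44u - off)) & 0xFFFFFu;
}
```

For example, the two samples 0xABCDE and 0x12345 sent back-to-back occupy 40 bits and would be captured as the words 0xABCDE123 and 0x45000000. Five 32-bit words hold exactly eight 20-bit samples, so unpacking in groups of five words keeps the stream aligned.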

xsamc · Post by **xsamc** » Mon Aug 08, 2016 10:50 am

Hi kyzyl,

I'm not sure what would cause the delay you're seeing on the simulator at the start of the transfer. However, the library does add a 10 microsecond pause before, and a 5 microsecond pause after, CS is raised at the end of the transaction. These delays improved the reliability of the device I was using the library with, and may not be needed, or may need adjusting for your ADC.

Thanks for pointing out the bug in the test app btw, I've pushed a fix.

It should be easy to implement a SPI master using the ports, provided your device requires the "right" SPI mode and is connected with 1-bit ports. Here's a small example which makes full use of the port logic to implement SPI. It's untested, though, as I don't have a board with CS on a 1-bit port to hand:

Code:

// Super fast prototype, only works with CS on a 1-bit port
void spi_fast_init(spi_fast_ports &p) {
    stop_clock(p.cb);
    configure_clock_ref(p.cb, 1);
    configure_port_clock_output(p.clk, p.cb);
    set_port_inv(p.clk);
    configure_in_port(p.miso, p.cb);
    configure_out_port_strobed_master(p.mosi, p.cs, p.cb, 0);
    start_clock(p.cb);
}

void spi_fast(unsigned bytes, char buffer[], spi_fast_ports &p) {
    p.mosi <: buffer[0];
    for(unsigned i=1;i<bytes;i++){
        p.mosi <: buffer[i];
        p.miso :> buffer[i-1];
    }
    p.miso :> buffer[bytes-1];
}

Cheers,
Sam
