Overhead of select and interface notifications.

RedDave
Experienced Member
Posts: 77
Joined: Fri Oct 05, 2018 4:26 pm

Post by RedDave »

I am developing some code to read a serialised bit stream over an LVDS interface. The minimum clock rate is 10MHz, giving me 100ns to read in each bit. I have an eXplorerKit for development, running at 500MHz.
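As a rough check on that 100 ns budget, the sketch below counts how many instruction issue slots one logical core gets per incoming bit. It assumes the xCORE-200 scheduling rule that a logical core issues at most once every five core clocks, dropping to once every N clocks when N > 5 logical cores are active. `slots_per_bit` and its parameters are illustrative names, not part of any XMOS library.

```c
#include <assert.h>

/* Issue slots one logical core gets per incoming bit.
 * ASSUMPTION: a core issues at most once every 5 core clocks, or once
 * every N clocks when N > 5 logical cores are active on the tile. */
static unsigned slots_per_bit(unsigned core_mhz, unsigned bit_mhz,
                              unsigned active_cores)
{
    assert(bit_mhz > 0);
    unsigned divisor = active_cores > 5 ? active_cores : 5;
    return core_mhz / (divisor * bit_mhz);  /* rounds down */
}
```

Under that assumption, `slots_per_bit(500, 10, 5)` gives 10 and `slots_per_bit(500, 10, 8)` gives 6, so a few port outputs plus an interface transaction per sample could plausibly use up the whole budget.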

There is a FRAME line which goes low when the device starts transmitting data, and then goes high after 4 clock cycles. The data sent is 32 bits and clocked by LCLKIN. In this example data are transmitted on SD01 and SD03; one of them is inverted, presumably due to a wiring error.

Image

LCLKIN is generated on the XMOS and I am using it to clock in the data into a buffer.

All this basically works, as evidenced by SD03 being echoed in a lagged form onto p_debug2 when p_debug2 <: sd03; is uncommented.

My problem is this: I have a simple interface to get the data out of the task. It does not yet have a client (hence a warning in main). If I uncomment the notifier i_tdc.data_ready(); or the get_data case, then the whole thing stops working. This implies that there is significant overhead in the notification call to data_ready(), which I thought was supposed to be a very efficient means of pinging an outside task. It also implies that there is an overhead in having a second case in my select, even when that case is never hit.

Am I missing something?
Is there a better method for getting the data out of this task in a timely manner that does not seize up this task?

Code: Select all

typedef interface tdc_if
{
    [[clears_notification]] int get_data();
    [[notification]] slave void data_ready();
} tdc_if;

on tile[0] : in buffered port:4 p_comms0 = XS1_PORT_4C;
#define FRAME1  (comms0 & 0x01)
#define SD03  (comms0 & 0x04)

on tile[0] : out port p_debug = XS1_PORT_1E;
on tile[0] : out port p_debug2 = XS1_PORT_1F;

#define REF_WIDTH   (12)
#define STOP_WIDTH  (20)
#define COMMS_WIDTH (REF_WIDTH + STOP_WIDTH)

void tdc_task(server tdc_if i_tdc,
            client spi_master_async_if spi,
            out port p_refclk, clock clk_refclk,
            out port p_lclkin, clock clk_lclkin,
            in port p_lclkout)
{
...
    configure_clock_rate (clk_lclkin , 100 , 10);
    configure_port_clock_output (p_lclkin , clk_lclkin);

    configure_in_port(p_comms0, clk_lclkin);

    start_clock (clk_lclkin);

    int comms_index = -1;
    int comms0;
    int comms_input = 0; // initialised so the first frame does not copy an indeterminate value into data
    int count = 100;
    int sd03;
    int data;

    while(TRUE)
    {
        select
        {
            case p_comms0 :> comms0:
                sd03 = (SD03 ? 1 : 0);
                p_debug2 <: 1;
//                p_debug2 <: sd03;
                if (comms_index == -1) // Waiting for frame
                {
                    if (FRAME1 == 0)
                    {
                        p_debug <: 1;
                        data = comms_input;
//                        i_tdc.data_ready();
                        comms_input = (sd03 ? (1<<(COMMS_WIDTH-1)) : 0);
                        comms_index = COMMS_WIDTH-2;
                        p_debug <: 0;
                    }
                }
                else
                {
                    comms_input |= (sd03 ? (1<<comms_index) : 0);
                    comms_index--;
                }
                p_debug2 <: 0;
                break;
 /*           case i_tdc.get_data() -> int x:
                x = data;
                break;*/
        }
    }
}
Attachments
scope_3.png


CousinItt
Respected Member
Posts: 360
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

I don't think there's a problem with timing overhead - or not yet anyway. If you don't have a client for the interface the server will lock up. Interface functions are not just send-and-forget operations - each one is a transaction between server and client.

In principle the notification should not involve the client, but maybe the client has to set up the switch fabric so that the notification is available to it, or something like that, so that without the interface being declared on the client side the server will be blocked by calling data_ready(). Maybe someone else here can clarify that point.

Try writing a very basic client that just responds to the ready event and then calls get_data() and chucks the data away. Your server should then operate with data_ready() and get_data() not commented out. If you want to evaluate the timing overhead, you can bracket the calls in the server task with test pin set/reset operations. They should be quick enough. If they aren't, you have the option of using a streaming channel instead.

Post by RedDave »

I've added a test client task (below). With this wired in, I still get the same effect. All works correctly on the 'scope, but if I add a get_data() case to the select then the output trace just loses sync. That case is never being called, since it still fails in this way with the i_tdc.data_ready() call commented out.

Code: Select all

void tdc_test_client(client tdc_if i_tdc)
{
    int i = 100000 - 100;
    printf("tdc_test_client\n");

    while(TRUE)
    {
        select
        {
            case i_tdc.data_ready():
                int x = i_tdc.get_data();
                if ((i++) % 100000 == 0)
                {
                    printf("[%08X]\n", x);
                }
                break;
        }
    }
}

Post by CousinItt »

I meant something like

Code: Select all

void tdc_test_client(client tdc_if i_tdc)
{
    while(TRUE)
    {
        select
        {
            case i_tdc.data_ready():
                break;
        }

        i_tdc.get_data();
    }
}
... the printf could hold up the client for a while, especially if you're not using xscope.

I noticed you're using a very short buffer for the port, which won't give you much time for data exchange. This might help, but you'll have to rejig some code (e.g. in the client) to cope with more than one sample coming in at a time.

Code: Select all

on tile[0] : in buffered port:32 p_comms0 = XS1_PORT_4C;

Post by RedDave »

The printf is only running once per second (or two), so if things were running correctly I would expect the 'scope to show good results and then 'skip' during each printf.

With using a buffered port...
The case p_comms0 :> comms0: block would be hit every 4 clock cycles. Correct?
And would contain 4 port readings, with the first (oldest) in the most significant nybble down to the last (most recent) in the least significant nybble (bits 3-0). Correct?

I will try that now. I tried that previously, but other things have changed since.

I am currently trying volatile unsafe shared memory to get the data out, but that is giving similar problems.

Post by CousinItt »

If you're using a four-bit port and a four-bit buffer, you will only have a buffer depth of one four-bit sample. Increasing the buffer size to 32 bits will give you a depth of 8 samples. If you don't need to use a four-bit port, using a one-bit port will allow you a buffer depth of 32 samples, increasing the time available for transfer out of your receiving task.
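The buffer-depth arithmetic above can be sketched in plain C. The helper names are hypothetical, and the 100 ns sample period is taken from the 10 MHz clock in the original post.

```c
#include <assert.h>

/* Samples held in one transfer word of a buffered port. */
static unsigned samples_per_word(unsigned transfer_bits, unsigned port_bits)
{
    assert(port_bits > 0 && transfer_bits % port_bits == 0);
    return transfer_bits / port_bits;
}

/* Time between successive input events, in ns, for a given sample period. */
static unsigned ns_per_word(unsigned transfer_bits, unsigned port_bits,
                            unsigned sample_period_ns)
{
    return samples_per_word(transfer_bits, port_bits) * sample_period_ns;
}
```

At 100 ns per sample, a four-bit port with a four-bit buffer has to be serviced every 100 ns, a four-bit port with a 32-bit buffer every 800 ns, and a one-bit port with a 32-bit buffer every 3200 ns.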

Post by CousinItt »

Have you seen the document "XS1 Ports: use and specification"?

Post by RedDave »

My maths failed: 32/4 = 8, not 4.
I need to clock in multiple data, currently two, which will increase later.

But am I correct in saying that the most significant nybble will contain the earliest data?

Post by CousinItt »

The uppermost bits are the ones most recently clocked in. See that doc.
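A minimal host-side sketch of what that ordering means when unpacking one 32-bit word from a four-bit buffered port. It assumes the oldest sample sits in bits 3..0 and the newest in bits 31..28, as stated in the reply above. `unpack_samples` and `sd03_bits` are illustrative names, and `SD03_MASK` reuses the bit position of the `SD03` macro in the original post.

```c
#include <assert.h>
#include <stdint.h>

#define SD03_MASK 0x4u  /* same bit position as the SD03 macro in the original post */

/* Split one 32-bit word from a 4-bit buffered input port into 8 samples,
 * oldest first. ASSUMPTION: oldest sample in bits 3..0, newest in 31..28. */
static void unpack_samples(uint32_t word, uint8_t samples[8])
{
    assert(samples != 0);
    for (int i = 0; i < 8; i++) {
        samples[i] = (uint8_t)((word >> (4 * i)) & 0xFu);
    }
}

/* Collect the SD03 bit of each sample; the oldest sample ends up in the
 * top bit of the result, matching the MSB-first assembly in the task. */
static uint8_t sd03_bits(uint32_t word)
{
    uint8_t samples[8];
    uint8_t bits = 0;
    unpack_samples(word, samples);
    for (int i = 0; i < 8; i++) {
        bits = (uint8_t)((bits << 1) | ((samples[i] & SD03_MASK) ? 1u : 0u));
    }
    return bits;
}
```

For the word 0x87654321 the oldest sample is 0x1 and the newest is 0x8, and the SD03 bits read oldest-first as 00011110.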

Post by RedDave »

I had missed the use and spec doc. I've read 4 others.

-

I am coming to a conclusion. I have optimisation -O2 turned on. When I am not running i_tdc.data_ready() and not copying out into volatile memory, I am doing nothing with the data that I generate from the incoming stream, so it gets optimised out. So the code that I thought was running in the necessary time period actually isn't. When I add an observer, it can no longer be optimised out and it all fails.
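The effect described here can be illustrated in plain C. If nothing observes the assembled word, an optimising compiler is free to drop the work that produced it, while a store to a volatile object has to be kept. This is a sketch under that general C rule, not a reproduction of the xC task. `frame_sink`, `assemble_frame` and `receive_frame` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define COMMS_WIDTH 32  /* REF_WIDTH + STOP_WIDTH from the original post */

/* A volatile object counts as observable: stores to it may not be removed. */
static volatile uint32_t frame_sink;

/* Assemble COMMS_WIDTH one-bit samples, first sample into the top bit,
 * mirroring the comms_input logic in the original task. */
static uint32_t assemble_frame(const uint8_t bits[COMMS_WIDTH])
{
    assert(bits != 0);
    uint32_t word = 0;
    for (int i = 0; i < COMMS_WIDTH; i++) {
        if (bits[i]) {
            word |= (uint32_t)1 << (COMMS_WIDTH - 1 - i);
        }
    }
    return word;
}

/* The store to frame_sink is the "observer": without it, a caller that
 * ignores the result may let the optimiser remove the loop entirely. */
static void receive_frame(const uint8_t bits[COMMS_WIDTH])
{
    frame_sink = assemble_frame(bits);
}
```

Timing measured with the store in place is the figure that matters, since it includes the work the real consumer will force the compiler to keep.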