Port output: "dead" time during transmit? Topic is solved

DemoniacMilk
XCore Addict
Posts: 191
Joined: Tue Jul 05, 2016 2:19 pm

Port output: "dead" time during transmit?

Post by DemoniacMilk »

While benchmarking a module I wrote, I ran into something I cannot explain.

What the module does:
  • data packet processing, and afterwards (in the same thread)
  • output of the packet onto ports (two data signals) at an 8.3 MHz port clock (setup sketched below)
  • 1500 bytes per packet in the benchmark
Measuring the time with a timer shows that
  • data processing takes about 1.3 ms,
  • data output takes about 8.6 ms.
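
For context, here is a minimal sketch of the port and clock setup this implies (XC). Only the port names, the buffer widths and the 8.33 MHz rate come from the code and comments below; the pin assignments, the clock divider and the helper function are my assumptions.

Code: Select all

    #include <xs1.h>

    // Assumed setup sketch, not the actual code from the module:
    out buffered port:8  portOutfsync = XS1_PORT_1A;  // fsync on a 1-bit pin, 8-bit transfer buffer
    out buffered port:32 portOutData  = XS1_PORT_4A;  // data on a 4-bit port (2 lines used), 32-bit buffer
    clock clkPort = XS1_CLKBLK_1;

    void setupPorts(void) {
        configure_clock_ref(clkPort, 6);              // 100 MHz reference / (2*6) = 8.33 MHz port clock
        configure_out_port(portOutfsync, clkPort, 0);
        configure_out_port(portOutData, clkPort, 0);
        start_clock(clkPort);
    }
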
Code for data output and time measurement:

Code: Select all

        
    #if BENCHMARK
    benchmarkTimer :> uiDurProc;
    #endif

    // output the first data set
    portOutfsync <: 0 @ uiTimestamp;                // 8 bit buffered, do a dummy output, time stamp the output
    portOutfsync @ uiTimestamp + 8 <: uiFSync;      // .. and then output fsync and
    portOutData  @ uiTimestamp + 9 <: uiaZipped[0]; // .. data, one clock cycle delayed, 32 bit buffered, 

    for (i = 1; i < offset / PORT_DATA_LINES - 2; i++){
        uiFSync ^= 0x1;
        portOutfsync <: uiFSync;        // this will block until the previous fsync has been transmitted
        portOutData <: uiaZipped[i];    // this will block until the previous data  has been transmitted
    }
    portOutfsync <: 0;                  // end of transmission, don't send another fsync!
    portOutData <: uiaZipped[i];

    #if BENCHMARK
    benchmarkTimer :> uiDurSent;
    printf("send:%u us\n", (uiDurSent-uiDurProc)/100);
    #endif
This prints:
...
send:8584 us
send:8584 us
send:8584 us
...
I don't understand why the data output takes this long.
In theory, at an 8.3 MHz clock, outputting 1500 bytes on 2 parallel lines (or rather: 1500 bytes plus 1500 times 0x00 on a 4-bit port) should take
(1500 bytes * 8 bits/byte) / (8,333,333 Hz * 2 bits/clock) = 720 µs
Measuring the length of a packet with a scope confirms this: packet start to packet end takes 720 µs in total.
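
As a quick sanity check of that arithmetic (plain C, the constants are simply the numbers from above):

Code: Select all

    #include <stdio.h>

    int main(void) {
        const double portClockHz  = 8333333.0;    // 8.33 MHz port clock
        const double bitsPerClock = 2.0;          // two data lines driven per port clock
        const double packetBits   = 1500.0 * 8.0; // 1500 bytes per packet

        // expected time on the wire, in microseconds
        double txUs = packetBits / (portClockHz * bitsPerClock) * 1e6;
        printf("expected transmit time: %.0f us\n", txUs);   // prints 720 us
        return 0;
    }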

The function for sending a packet is called periodically, with a delay of 600 timer ticks (6 µs) between one call returning and the next one being made.
So when BENCHMARK is 0 (i.e. no printf) I'd expect a packet to be sent every 2 ms (1.3 ms of processing + 0.7 ms for sending). However, the scope shows a packet being sent every 9.9 ms (which matches 1.3 ms + 8.6 ms).

I am severely confused. Where does the difference between the measured 8.6 ms and the expected 0.7 ms come from?


EDIT:
I measured the time needed to run through the for loop, and changed the loop to run just once:

Code: Select all

    // output the first data set
    portOutfsync <: 0 @ uiTimestamp;                // do a dummy output, time stamp the output
    portOutfsync @ uiTimestamp + 8 <: uiFSync;      // .. and then output fsync and
    portOutData  @ uiTimestamp + 9 <: uiaZipped[0]; // .. data, data one clock cycle delayed

    #if BENCHMARK
    benchmarkTimer :> uiDurProc;
    #endif

    //for (i = 1; i < offset / PORT_DATA_LINES - 2; i++){
    for (i = 1; i < 2; i++){
        uiFSync ^= 0x1;
        portOutfsync <: uiFSync;        // this will block until the previous fsync has been transmitted
        portOutData <: uiaZipped[i];    // this will block until the previous data  has been transmitted
    }
    #if BENCHMARK
    benchmarkTimer :> uiDurSent;
    printf("send:%u us\n", (uiDurSent-uiDurProc)/100);
    #endif
    portOutfsync <: 0;                  // end of transmission, don't send another fsync!
    portOutData <: uiaZipped[i];
The output is:
..
send:7864 us
send:7864 us
..
So running the loop a single time takes 7.8 ms?!
I output data right before and right after the for loop (if !BENCHMARK) and there are no gaps in the transmitted data. How can the for loop need 7.8 ms, while the whole transmit completes within tens of microseconds with no time gaps!? Replacing the for loop with a while loop had no effect.



Post by DemoniacMilk »

I was able to fix this by changing

Code: Select all

    // output the first data set
    portOutfsync <: 0 @ uiTimestamp;                // do a dummy output, time stamp the output
    portOutfsync @ uiTimestamp + 8 <: uiFSync;      // .. and then output fsync and
    portOutData @ uiTimestamp + 9 <: uiaZipped[0]; // .. data, data one clock cycle delayed
to

Code: Select all

    // output the first data set
    portOutfsync <: 0 @ uiTimestamp;                // do a dummy output, time stamp the output
    portOutfsync <: uiFSync;      // .. and then output fsync and
    portOutData @ uiTimestamp + 9 <: uiaZipped[0]; // .. data, data one clock cycle delayed
or to

Code: Select all

    // output the first data set
    portOutfsync <: 0 @ uiTimestamp;                // do a dummy output, time stamp the output
    portOutfsync @ uiTimestamp + 9 <: uiFSync;      // .. and then output fsync and
    portOutData @ uiTimestamp + 10 <: uiaZipped[0]; // .. data, data one clock cycle delayed
The reasoning behind this can be found in this topic.
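
My understanding of why the original version stalled (this is my reading of the linked topic, not something stated here): the port counter is 16 bits wide, and the timed output at uiTimestamp + 8 asks for a port time that has already passed by the time the first (dummy) 8-bit output has completed. A timed output to a time in the past does not fail; the port simply waits until the counter wraps all the way around to that value again. At an 8.33 MHz port clock that wrap takes about 7.86 ms, which matches the measured 7864 µs:

Code: Select all

    #include <stdio.h>

    int main(void) {
        const double portClockHz = 8333333.0;  // 8.33 MHz port clock
        const double counterWrap = 65536.0;    // 16-bit port counter

        // time for the port counter to come back around to a just-missed timestamp
        double wrapUs = counterWrap / portClockHz * 1e6;
        printf("full counter wrap: %.0f us\n", wrapUs);   // prints 7864 us
        return 0;
    }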