xtcp 6.0.0 performance



Postby tobias.xcore200 » Thu Feb 01, 2018 12:54 pm

Hi there again,

Today I have a question about the performance of the xtcp lwIP stack.
As I already wrote in another thread, I am currently writing my bachelor thesis and thus working with the XMOS TCP/IP stack to integrate different SSL/TLS libraries.
One part of the thesis is a performance evaluation of those SSL/TLS libraries, so I am currently benchmarking them.
To be precise, right now I am benchmarking the raw lwIP stack, i.e. plain TCP without any encryption.

My benchmark works as follows:
The XMOS board (eXplorerKIT) acts as the client; the server is hosted by a Python script on my laptop (gigabit Ethernet home network, so no negative performance impact from the network itself).
My PC and my laptop reach data rates of 250 Mbit/s between each other, so the script does not limit the XMOS performance.
The first step is to connect the XMOS to the laptop's TCP server; the client (XMOS) then sends messages of different lengths (64, 128, 256, 512, 1024 bytes).
Just to clarify: each message length is sent 1024 times, so we have a total of 5120 sent messages.
After the messages are sent, the Python script calculates the elapsed time, the bytes received and the resulting Mbit/s.
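For reference, the measuring side can be sketched like this (a hypothetical minimal version of such a Python script; the function names, port and the 64 KiB receive buffer are my own choices, not the actual benchmark code):

```python
import socket
import time

def mbit_per_s(total_bytes, seconds):
    """Convert a byte count and a duration into Mbit/s."""
    return total_bytes * 8 / seconds / 1e6

def run_server(host="0.0.0.0", port=5000, expected_bytes=1024 * 512):
    """Accept one client and time how long it takes to receive expected_bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            received = 0
            start = time.monotonic()
            while received < expected_bytes:
                chunk = conn.recv(65536)
                if not chunk:          # client closed the connection early
                    break
                received += len(chunk)
            elapsed = time.monotonic() - start
    return received, mbit_per_s(received, elapsed)
```

Note that such a server only measures bytes on the stream, not message boundaries, which is consistent with the per-size totals reported below.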

The results are as follows (data length, Mbit/s):
64 bytes: 0.7177576103839429
128 bytes: 1.3773611413365363
256 bytes: 2.8633973350361006
512 bytes: 5.490116417918642
1024 bytes: 0.20186308525218072

So, as you can see, the results start out quite ok: we see a roughly linear increase in data rate as the packet size grows, so the stack scales properly and it is the number of messages that limits the data rate.
But with a message length of 1024 bytes the performance is suddenly so poor I can't explain it. To examine this problem a little further, I ran some more tests, and this is what I got:
535 bytes: 5.774755936146872
536 bytes: 0.11135052619139878

So, when sending messages with a length of 535 bytes everything is fine, but at 536 bytes or more the data rate is so bad I don't even want to look at it.

I also tested the stack with the maximum message size of 1472 bytes (XTCP_CLIENT_BUF_SIZE in xtcp.h) and the result is even stranger:
1472 bytes: 15.100920105627655

To find out what's going on I captured the benchmark traffic with Wireshark, and this is what it showed:
(see the attached image at this point)

So, as you might guess, my laptop has the one IP and the XMOS has the other.
What surprises me is that the message is split into parts (see Len=1460 and Len=12, which again makes 1472 bytes in total). I don't know why the messages are split (even more surprising is that I got one message of 2920 bytes), but that's ok for me. What is more surprising is that the performance is extremely good compared to the other message sizes. Well, I expected this result, as 512 * 3 is only a little more than the 1472 I used, but why is the performance for message sizes from 536 to 14xx bytes so bad?
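The split itself follows from the TCP maximum segment size on standard Ethernet: a 1500-byte MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header leaves 1460 bytes of payload per segment, so one 1472-byte send becomes segments of 1460 and 12 bytes (and the 2920-byte "message" is presumably just two full segments coalesced in the capture). A quick sanity check:

```python
# Why a 1472-byte send shows up as Len=1460 + Len=12 in Wireshark.
ETHERNET_MTU = 1500   # maximum IP packet size on standard Ethernet
IPV4_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20       # TCP header without options

MSS = ETHERNET_MTU - IPV4_HEADER - TCP_HEADER  # TCP payload per segment

message = 1472
segments = [MSS, message - MSS]  # first segment is full, the rest spills over
```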

I expected linear growth in data rate with growing message length, but the stack is not behaving the way I expected.
Does anyone have an idea why I am seeing this behavior, or how to investigate it further?

Any kind of help is appreciated.

Postby tobias.xcore200 » Wed Feb 07, 2018 9:30 pm

As no one was able to help me out, I figured it out myself by reading the ethernet and lwIP library code for some hours and doing some testing.

Actually this is not an XMOS error; it's
a) a design issue in the XMOS TCP client I created, and
b) a TCP protocol 'feature'.

To a):
My code looks like this:


    i_xtcp.connect(p, tcpServer, XTCP_PROTOCOL_TCP);

    int i = 0;

    while (i < 1024) {
        select {
            case i_xtcp.packet_ready():
                i_xtcp.get_packet(tmp, rx_buffer, LEN, data_len);
                switch (tmp.event) {
                case XTCP_IFUP:
                    printf("IFUP ");
                    i_xtcp.connect(p, tcpServer, XTCP_PROTOCOL_TCP);
                    break;
                case XTCP_NEW_CONNECTION:
                    printf("Connection established.\n");
                    i_xtcp.send(tmp, tx_buffer, LEN);
                    break;
                case XTCP_SENT_DATA:
                    i_xtcp.send(tmp, tx_buffer, LEN);
                    i++;  // count one completed send (elided in my original snippet)
                    break;
                default:
                    break;
                }
                break;
        }
    }

Please pay attention to what happens after something is sent:
i_xtcp.send() -> i_xtcp.packet_ready() -> i_xtcp.get_packet() -> i_xtcp.send() -> ...
This means the code always waits for a new event before continuing. The event waited for here is triggered by the ACK from the server, so the code always waits for an ACK before sending another packet.
Keep this in mind for b):

https://tools.ietf.org/html/rfc1122#page-96 says:

A TCP SHOULD implement a delayed ACK, but an ACK should not
be excessively delayed; in particular, the delay MUST be
less than 0.5 seconds, and in a stream of full-sized
segments there SHOULD be an ACK for at least every second
segment.

So, setting up a TCP server on Windows and executing the code above with LEN = 512 (remember, my other post said I got about 5 Mbit/s...) ended up at something like 0.02 Mbit/s.
So, what happened?
Read this: https://support.microsoft.com/en-us/hel ... th-winsock
Little quote:
When a Microsoft TCP stack receives a data packet, a 200-ms delay timer goes off.

Wait, what? B-But, Microsoft, pls. Yes. Wireshark showed me that when a packet from the XMOS was received, the Microsoft stack ACKed it exactly 200 ms later. So we get a throughput of 5 packets per second! Wow, that's bad.
The Microsoft article I linked also lists some exceptions you might like: when a packet has a size of about 1460+ bytes, it is ACKed immediately.
The same applies to the Linux kernel stack, though there even stranger things happen which I won't investigate further.
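The numbers line up exactly with this stop-and-wait-plus-delayed-ACK model: one 512-byte message per 200 ms ACK interval gives about 0.02 Mbit/s, which is just what the benchmark measured. A quick back-of-the-envelope check:

```python
# Stop-and-wait sender + 200 ms delayed ACK: one message per ACK interval.
DELAYED_ACK_S = 0.2                   # Microsoft's delayed-ACK timer
msgs_per_second = 1 / DELAYED_ACK_S   # 5 messages per second

def stop_and_wait_mbit_per_s(msg_len_bytes):
    """Throughput when every send waits for a (delayed) ACK."""
    return msg_len_bytes * 8 * msgs_per_second / 1e6

throughput = stop_and_wait_mbit_per_s(512)  # matches the measured ~0.02 Mbit/s
```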

The solution: send more packets and don't wait for the ACK every time you send something, OR only send packets that contain more than 1461 bytes of data ;)
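The general idea of the first fix, keeping several messages in flight instead of pausing for an acknowledgement after each one, can be sketched in Python over loopback (my own illustration of the principle, not lib_xtcp code; all names are made up):

```python
import socket
import threading

def recv_exact(conn, n):
    """Receive exactly n bytes from conn (or fewer if the peer closes)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def pipelined_send(host, port, msg, count):
    """Send `count` copies of msg back-to-back, never waiting for an
    application-level acknowledgement in between."""
    with socket.create_connection((host, port)) as sock:
        for _ in range(count):
            sock.sendall(msg)  # keep the pipe full; TCP ACKs arrive asynchronously

def demo():
    """Loopback demo: a pipelined client against a simple receiving server."""
    received = {}
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        with conn:
            received["data"] = recv_exact(conn, 10 * 512)

    t = threading.Thread(target=server)
    t.start()
    pipelined_send("127.0.0.1", port, b"x" * 512, 10)
    t.join()
    srv.close()
    return len(received["data"])
```

Because the receiver's ACKs then cover several segments at once, the 200 ms delayed-ACK timer no longer gates the send rate.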

Greetings to everyone.

Postby ozel » Mon Feb 19, 2018 12:42 pm

Hi Tobias,

Very interesting findings! It's always amazing to discover these little implementation-specific details that no one really seems to know about.
I'm currently fighting with xtcp as well, trying to achieve the highest possible UDP output rate from an xCORE-200 eXplorer board. For that reason I switched to the non-official 7.0 beta version, which I found through a pending pull request in the XMOS repo: https://github.com/jh14778/lib_xtcp
It contains a large number of improvements, including some changes to the current API. It's not working perfectly for UDP with the lwIP stack in my case yet, but that might just be my application code.

For maximum performance, I'm considering digging out my own old G4-chips-era code that assembles a complete UDP packet and hands it directly to the MII/MAC channels with as few instructions as possible (no connection handling at all, just fire and forget).
It's really nice to see all the effort still going into lib_xtcp; unfortunately it seems it's still not 100% there yet for high-rate transfers, especially with rather small packets.
I hope Peter or Jake join in and enlighten us a bit about the planned future and inherent limits of lib_xtcp, regarding the new lwIP stack etc.
For anyone interested in Ethernet besides AVB/TSN on an xCORE-200, a high-performance TCP/UDP stack would be quite useful, I suppose (and kind of needed to attract developers away from FPGAs and towards the XE216 chips... IMHO).

Tobias, did you profile UDP performance as well in your case? What are your final figures for TCP throughput with small/large packets?
Cheers, Oliver

Postby akp » Wed Mar 14, 2018 5:28 pm

Also note my finding with respect to the lwIP error ERR_RST not closing the connection: http://www.xcore.com/viewtopic.php?f=47&t=6494
