Size of C++ functions too big

Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Hi all,

I want to call some C++ functions from xc, but I can't get a build small enough to fit on my StartKit once the required stack size is taken into account. The C++ functions are far larger in the binary than xc functions of equal or higher computational complexity. Here is a screenshot of the function table from the tools in xTIMEcomposer:


[Image: screenshot of the function table]

_Sfft_0 is the xc implementation of the FFT from the XMOS library. It certainly has higher complexity than pow or cbal, so what is going on? I am building in xTIMEcomposer with:

XCC_FLAGS = -O3 -report -fxscope
XCC_CPP_FLAGS = -std=c++11

Thank you for your help!


mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

Maybe not the answer you wish to hear, but can you adjust the stack size?

http://www.xcore.com/viewtopic.php?t=1204
Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Thanks for your reply, mon2. The stack size is not the main problem, as the code alone takes up 50 kB of the available 64 kB. I need to hold a matrix of 512x6 complex 32-bit integers (24576 bytes), so there is no way to solve this without compiling the C++ files so that they take up less space.
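The memory budget described above can be checked at compile time. The following is only a sketch: the `Complex32` type and all constant names are hypothetical, each matrix element is assumed to be a pair of 32-bit integers (real and imaginary part), and 1 kB is assumed to mean 1024 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical element type: one complex sample made of two 32-bit integers.
struct Complex32 {
    std::int32_t re;
    std::int32_t im;
};

// Matrix dimensions quoted in the post.
constexpr std::size_t kRows = 512;
constexpr std::size_t kCols = 6;
constexpr std::size_t kMatrixBytes = kRows * kCols * sizeof(Complex32);

// Figures quoted in the post, assuming 1 kB = 1024 bytes.
constexpr std::size_t kTileRamBytes = 64 * 1024;
constexpr std::size_t kCodeBytes = 50 * 1024;
constexpr std::size_t kFreeBytes = kTileRamBytes - kCodeBytes;

// How much the code size would have to shrink for the matrix alone to fit
// (stack and other data are not counted here).
constexpr std::size_t kShortfallBytes = kMatrixBytes - kFreeBytes;

static_assert(sizeof(Complex32) == 8, "two 32-bit fields, no padding expected");
static_assert(kMatrixBytes == 24576, "matches the matrix size quoted in the post");
static_assert(kFreeBytes < kMatrixBytes, "the matrix does not fit next to 50 kB of code");
```

Under these assumptions, 14336 bytes remain after the code, so the build is about 10 kB too large before any stack is accounted for.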
mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

Another idea is to attach an external SRAM device to the StartKit. One single-chip option is the device linked below, a 64k x 8 SRAM with an SPI interface. The SPI interface will slow down reads and writes to the external device, and I am not sure how that will affect your project, but it is a viable solution. Pending any compiler optimization, here is one suitable SPI SRAM device for your consideration:

https://www.digikey.ca/product-detail/e ... ND/3543089

To use this device, you will need to add the XMOS SPI master IP and then call its R/W routines to access the external SPI SRAM and expand your data storage.

There are other similar devices, but this one states that it supports both the standard SPI interface (slower, but uses fewer GPIO pins) and QSPI mode (which requires different IP, also available for XMOS). QSPI mode gives higher throughput for R/W transactions, since 4 data pins transfer 4 bits per clock cycle, but it consumes a few more interface pins.
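To put rough numbers on the SPI versus QSPI difference, here is a minimal sketch that counts the clock cycles needed for the data phase of a transfer. The function name is hypothetical, and command, address, and dummy cycles are deliberately ignored, so real transfers will take longer.

```cpp
#include <cstdint>

// Clock cycles needed to move `bytes` of payload when `dataLines` bits are
// transferred per clock (1 for standard SPI, 4 for QSPI).
// Assumption: only the data phase is counted, no command or address overhead.
constexpr std::uint32_t payloadClocks(std::uint32_t bytes, std::uint32_t dataLines) {
    return bytes * 8u / dataLines;
}

// Example: the 24576-byte matrix mentioned earlier in the thread.
constexpr std::uint32_t kSpiClocks = payloadClocks(24576u, 1u);
constexpr std::uint32_t kQspiClocks = payloadClocks(24576u, 4u);

static_assert(kSpiClocks == 4u * kQspiClocks, "QSPI data phase is four times shorter");
```

Dividing these counts by the SPI clock frequency gives a lower bound on the time to stream the whole matrix once.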
Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

I am already using the XMOS SDRAM Slice (16 address/data pins), which is really fast. But since I need the full 512x6 matrix at any given time, it won't help with my problem, unless there is a way to store currently unused parts of the code in the SDRAM. I have no idea how to achieve that, though.
mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

> I am already using the XMOS SDRAM Slice (16 address/data pins) which is really fast.
Are you using this SDRAM Slice with your StartKit? From our last review, which may now be dated, the XMOS SDRAM Slice was not compatible with the StartKit. If so, did you rework the original XMOS SDRAM code to make it work with the StartKit?

Could you not access the external storage and R/W each matrix location via the SPI routines? Perhaps I am missing the concept, but for example:

read[0] -> will perform the SPI serial routine fetch to the external SPI SRAM and return to caller
write[0] with xx value -> will perform the SPI serial routine to write to the external SPI SRAM

and so on..

Keep your executable code inside the StartKit Tile but store your matrix data inside the external SRAM. The penalty will be the SPI routine size and the time to perform the R/W with the external memory.
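The read/write idea above could be wrapped in a small accessor so that the rest of the code does not care where the matrix lives. This is a host-side sketch only: `FakeSpiSram`, `ExternalMatrix`, and `Complex32` are hypothetical names, and the fake device simply copies bytes in ordinary RAM where real code would call the SPI master routines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the external SPI SRAM. On hardware, read() and
// write() would issue SPI transactions instead of copying local memory.
class FakeSpiSram {
public:
    explicit FakeSpiSram(std::size_t bytes) : mem_(bytes, 0) {}

    void read(std::size_t addr, void* dst, std::size_t len) const {
        assert(addr + len <= mem_.size());
        std::memcpy(dst, mem_.data() + addr, len);
    }

    void write(std::size_t addr, const void* src, std::size_t len) {
        assert(addr + len <= mem_.size());
        std::memcpy(mem_.data() + addr, src, len);
    }

private:
    std::vector<std::uint8_t> mem_;
};

// Hypothetical element type: two 32-bit integers (real and imaginary part).
struct Complex32 {
    std::int32_t re;
    std::int32_t im;
};

// Row-major 512x6 matrix whose elements are stored in the external device.
class ExternalMatrix {
public:
    static constexpr std::size_t kRows = 512;
    static constexpr std::size_t kCols = 6;

    explicit ExternalMatrix(FakeSpiSram& ram, std::size_t base = 0)
        : ram_(ram), base_(base) {}

    // Fetch one element from external memory.
    Complex32 get(std::size_t row, std::size_t col) const {
        Complex32 value;
        ram_.read(offset(row, col), &value, sizeof value);
        return value;
    }

    // Store one element to external memory.
    void set(std::size_t row, std::size_t col, Complex32 value) {
        ram_.write(offset(row, col), &value, sizeof value);
    }

private:
    // Byte address of element (row, col) relative to the start of the device.
    std::size_t offset(std::size_t row, std::size_t col) const {
        assert(row < kRows && col < kCols);
        return base_ + (row * kCols + col) * sizeof(Complex32);
    }

    FakeSpiSram& ram_;
    std::size_t base_;
};
```

Every `get` or `set` costs one external transaction, which is the time penalty mentioned above; reading or writing a whole row per transaction would reduce the per-element overhead.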
Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Yes, I do. It is not fully compatible: I had to adapt the lib_sdram files for the StartKit, and I can only use half of the memory. But 4 MB is still enough for what I am doing. I tried to adapt it so that I can use the full memory, but I run into timing issues when writing data across 6 different ports combined into a 16-bit bus. Transmitting the address works with 16 bits, but transmitting data only works with 8 bits. There should be a way to make it work properly at lower data rates.

Accessing the matrix over SPI would be possible, but it would slow down my routines so much that it would break the real-time functionality.

I also just found this document:

https://www.xmos.com/download/private/a ... rc0.a).pdf

I will try this first and give you feedback!
robertxmos
XCore Addict
Posts: 169
Joined: Fri Oct 23, 2015 10:23 am

Post by robertxmos »

Hi Bambus
I would also recommend building with -Os in general.
The -O3 flag also turns on -mdual-issue, which does not do a good job at present and fills the issue slots with too many nops.

If you really want -O3, you can explicitly turn off dual issue with -mno-dual-issue.
For functions that are time critical, you can add the function attribute (XC, C, C++):
[[dual_issue]] void dual(void) {...}
There is a matching [[single_issue]] attribute too.
robert
Bambus
Member++
Posts: 27
Joined: Tue Feb 28, 2017 12:52 pm

Post by Bambus »

Hi Robert
I didn't know that dual issue was possible on XS1 chips. I changed -O3 to -Os and the .text section shrank by almost 20K! Thank you for that! Now I only need to trim ~1K from the stack, use overlays, and hope that I don't run into problems at runtime.
robertxmos
XCore Addict
Posts: 169
Joined: Fri Oct 23, 2015 10:23 am

Post by robertxmos »

Hi Bambus,
> I didn't know that dual issue was possible on XS1 chips.
It ain't - not paying enough attention :-)
robert