Running function in multiple cores (time penalty)

All technical discussions and projects around startKIT
rebb
Member
Posts: 9
Joined: Thu Jun 25, 2015 12:14 pm

Post by rebb »

Hello!

I received my startKIT a while ago and am keen to start working on multicore stuff.
I have a function written in C which I'd like to run on multiple cores at the same time, just for testing/timing purposes.

I check for a button press, start a timer, run my function four times inside par{}, and then read the end time.
As expected, the run time is the same whether I run it on one core or on four cores at the same time. But if I go above four cores, I get a time penalty.

Is this the performance I should expect, or am I doing something stupid here?

You can find the XC file here: https://github.com/rebbTRSi/chipSounds/ ... rc/main.xc
and the whole repo here: https://github.com/rebbTRSi/chipSounds


richard
Respected Member
Posts: 318
Joined: Tue Dec 15, 2009 12:46 am

Post by richard »

From the Product Overview -> Logical Cores section of the datasheet for an XS1 device (https://www.xmos.com/support/silicon/datasheets):
The tile has 8 active logical cores, which issue instructions down a shared four-stage pipeline. Instructions from the active cores are issued round-robin. If up to four logical cores are active, each core is allocated a quarter of the processing cycles. If more than four logical cores are active, each core is allocated at least 1/n cycles (for n cores).
As a result, the per-core performance is constant if you are using between 1 and 4 cores on a tile, and it decreases as you go beyond 4 cores.

All XS1 devices have a 4-stage pipeline. xCORE 200 devices have a 5-stage pipeline, so if you were to run your example on an xCORE 200 device you would find that the per-core performance is constant between 1 and 5 cores and decreases as you go beyond 5 cores.
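
To make the rule above concrete, here is a minimal sketch in plain C (not XC, and not taken from the datasheet) that models the share of tile cycles one logical core receives. The function names `per_core_share` and `print_share_table` are hypothetical, and the model assumes every active core is ready to issue on each of its slots; the datasheet only guarantees "at least 1/n", so treat the result as a lower bound.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: fraction of the tile's cycles that one logical core
 * is allocated, under the round-robin rule quoted above.
 *   pipeline_depth: 4 for XS1, 5 for xCORE 200 (figures from this thread)
 *   active_cores:   number of active logical cores on the tile (>= 1)
 * Assumes all active cores are always ready to issue. */
double per_core_share(int pipeline_depth, int active_cores)
{
    assert(pipeline_depth >= 1);
    assert(active_cores >= 1);

    /* Up to pipeline_depth cores: each gets 1/pipeline_depth.
     * Beyond that: each gets 1/active_cores. */
    int divisor = active_cores > pipeline_depth ? active_cores : pipeline_depth;
    return 1.0 / divisor;
}

/* Hypothetical helper: print the modelled share for 1..8 active cores. */
void print_share_table(int pipeline_depth)
{
    for (int n = 1; n <= 8; n++) {
        printf("%d active core(s): %.3f of tile cycles each\n",
               n, per_core_share(pipeline_depth, n));
    }
}
```

Calling `print_share_table(4)` from a `main` would show the share staying flat up to four cores and shrinking after that, which matches the timing rebb observed.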
Bianco
XCore Expert
Posts: 754
Joined: Thu Dec 10, 2009 6:56 pm

Post by Bianco »

Hi,

This is expected behaviour.
The cores on a tile are logical cores, not physical ones.
On each processor cycle, the tile can start executing an instruction from a different core.

A core on a tile gets 1/4th of tile processing power when there are up to 4 cores active.
When more than 4 cores are active, each core gets 1/nth of tile processing power.

So with just 1 core it is not possible to utilize the full tile processing power.
You need at least 4 of them. When you have more than 4 cores that are CPU bound, you will see that the tasks will take longer to execute.
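
As a rough illustration of the point above, the following plain C sketch estimates how much of an XS1 tile is in use and how much longer each task takes as more CPU-bound cores are added. The names `XS1_PIPELINE_DEPTH`, `tile_utilization` and `relative_task_time` are made up for this example, and the model is idealized: it assumes identical, purely CPU-bound tasks that never wait on I/O.

```c
#include <assert.h>

/* Assumption from this thread: XS1 tiles have a 4-stage pipeline. */
#define XS1_PIPELINE_DEPTH 4

/* Hypothetical helper: fraction of the tile's issue slots in use when
 * `cpu_bound_cores` cores are all busy. One core can use only 1/4 of
 * the tile; four or more cores saturate it. */
double tile_utilization(int cpu_bound_cores)
{
    assert(cpu_bound_cores >= 1);
    if (cpu_bound_cores >= XS1_PIPELINE_DEPTH) {
        return 1.0;
    }
    return (double)cpu_bound_cores / XS1_PIPELINE_DEPTH;
}

/* Hypothetical helper: run time of one task relative to running it alone.
 * 1.0 means no penalty; 2.0 means the task takes twice as long. */
double relative_task_time(int cpu_bound_cores)
{
    assert(cpu_bound_cores >= 1);
    if (cpu_bound_cores <= XS1_PIPELINE_DEPTH) {
        return 1.0;
    }
    return (double)cpu_bound_cores / XS1_PIPELINE_DEPTH;
}
```

Under these assumptions, eight CPU-bound cores would each take about twice as long as a single core running alone, while the tile as a whole still completes the same amount of work per cycle as it does with four.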
rebb
Member
Posts: 9
Joined: Thu Jun 25, 2015 12:14 pm

Post by rebb »

Thanks for the fast replies! All is good, back to xTIMEcomposer then.