XMP-64 performance experiments

Jamie
Experienced Member
Posts: 99
Joined: Mon Dec 14, 2009 1:01 pm

Post by Jamie »

Version: 1
Status: Complete
License: BSD
Download: http://github.com/jameshanlon/xmp64-per ... xperiments

This project presents all the source code used to obtain the performance measurements for the XMP-64 device. For a detailed explanation of metrics and methods used, take a look at the document itself.
The source code package for this project consists of three executable programs:

A 'ping' program which measures message latency between pairs of nodes
A 'barrier' program which measures the time taken for all nodes to complete a barrier synchronisation
A 'traffic' program which measures message latency under different traffic permutations

'Ping' and 'barrier' are both quite simple and can be run as is, but 'traffic' is a bit larger and, owing to various changes, is not a particularly polished piece of code!
Perhaps most useful is the 'common' directory, which contains more general code, including:

timing functions written in assembly used in
global clock synchronisation over the hypercube
general hypercube functions such as min and max
a simple pipe communication structure
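
To illustrate what a hypercube "min" or "max" function typically does, here is a minimal sketch in Python. It is not the code from the 'common' directory (which targets the XMP-64 itself); it only simulates the usual dimension-by-dimension exchange, and the name `hypercube_reduce` and the node numbering are assumptions made for the example.

```python
def hypercube_reduce(values, op):
    """Simulate an all-reduce over a hypercube.

    values[i] is the local value held by node i; the node count must be
    a power of two. Returns the list of values held by each node after
    the reduction (every entry is the global result).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("node count must be a power of two")
    current = list(values)
    bit = 1
    while bit < n:
        # In each step, node i combines its value with that of the
        # neighbour whose index differs in exactly one bit (i XOR bit).
        current = [op(current[i], current[i ^ bit]) for i in range(n)]
        bit <<= 1
    return current


# 16 nodes, as in an order-4 hypercube; the local values are arbitrary.
local_values = [(7 * i + 3) % 16 for i in range(16)]
global_min = hypercube_reduce(local_values, min)
global_max = hypercube_reduce(local_values, max)
```

With 16 nodes the loop runs four times, one step per hypercube dimension, after which every node holds the same result.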



lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

I was reading the documentation. I had earlier understood that the G-chips could interconnect with 4 other G-chips, but now I understand that each core on the BGA512 G-chip can interconnect with 4 different cores, resulting in up to 16 links/chip, meaning that up to 160 pins/balls can be used for intercommunication!? (If you don't boot from SPI flash.)
I didn't understand the documentation of the XMP-64. How many of the maximum-speed (1.6 Gbps) links does it use? Doesn't every chip on the board communicate in 4D (directly with 4 other chips, using 4 links to each other chip)?
Probably not the most confused programmer anymore on the XCORE forum.
Jamie
Experienced Member
Posts: 99
Joined: Mon Dec 14, 2009 1:01 pm

Post by Jamie »

Each core is connected to 4 other cores with 5b links. Then, as I understand it (from the xsystem manual), with a bit spacing of >= 2 you can achieve 800 Mbits/s. I guess if the bit spacing were 1, you would get the quoted 1.6 Gbits/s.
dave
Member++
Posts: 31
Joined: Thu Dec 10, 2009 10:11 pm

Post by dave »

Data is transmitted on the 5-wire links using non-return-to-zero 1-out-of-5 coding. A single transition on one of the 5 wires encodes more than 2 bits - it takes just four transitions to send a token. This coding is explained in section 2.3 of the XS1 architecture document. The highest transmission rate on any pin is 100 MHz, or 200 million transitions/second - this allows a single 5-wire link to carry data tokens at 400 Mbits/second. Actually, it is really a 10-wire link, carrying 400 Mbits/second simultaneously in both directions.
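
The arithmetic behind the 400 Mbits/second figure can be written out as a small Python sketch. The constant names are made up for the example, and the 8 data bits per token is an assumption taken from the figures quoted above (200 million transitions/second, four transitions per token, 400 Mbits/second).

```python
PIN_FREQUENCY_HZ = 100e6      # highest transmission rate on any pin
TRANSITIONS_PER_CYCLE = 2     # 100 MHz gives 200 million transitions/second
TRANSITIONS_PER_TOKEN = 4     # "just four transitions to send a token"
DATA_BITS_PER_TOKEN = 8       # assumption: a data token carries 8 bits


def link_data_rate_bps():
    """Data rate of one 5-wire link in one direction, in bits/second."""
    transitions_per_second = PIN_FREQUENCY_HZ * TRANSITIONS_PER_CYCLE
    tokens_per_second = transitions_per_second / TRANSITIONS_PER_TOKEN
    return tokens_per_second * DATA_BITS_PER_TOKEN


rate = link_data_rate_bps()
print(f"{rate / 1e6:.0f} Mbits/second per direction")
```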

The G4 chips have 16 of these 10-wire links connected to a single switch; this switch also has four internal links connected to each of the four XCores (16 internal links in total) - it is a 32-way fully connected switch.

In the XMP-64, the 16 chips are connected in an order-4 hypercube. Each G4 is directly connected to four other G4s. As there are 16 links available to do this, four links are used on each of these direct connections. The result is that the data rate on each connection is 4 * 400 Mbits/second - or 1.6 Gbits/second - in each direction.
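
The topology and the 1.6 Gbits/second figure can be checked with a short Python sketch. The binary chip numbering used here (neighbours differ in one bit) is the standard way to label a hypercube and is an assumption for illustration; it is not necessarily how the chips are numbered on the board.

```python
DIMENSIONS = 4                 # order-4 hypercube
NUM_CHIPS = 2 ** DIMENSIONS    # 16 G4 chips
LINKS_PER_CHIP = 16            # external 10-wire links per G4
LINK_RATE_BPS = 400e6          # per link, per direction


def neighbours(chip):
    """Chips directly connected to `chip`: flip one address bit per dimension."""
    return [chip ^ (1 << d) for d in range(DIMENSIONS)]


# The 16 external links are shared equally between the 4 direct connections.
links_per_connection = LINKS_PER_CHIP // DIMENSIONS
connection_rate_bps = links_per_connection * LINK_RATE_BPS

for chip in range(NUM_CHIPS):
    print(chip, neighbours(chip))
print(f"{connection_rate_bps / 1e9:.1f} Gbits/second per connection, per direction")
```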