Boot and initialisation
From XCore Exchange
This page briefly describes the boot and initialisation process for a single chip or a network of XS1 processors, and how each part of that process may be replaced.
The boot and initialisation code is generated automatically by the mapper and included in the final multi-core program binary. It works in three phases, and any of them can be disabled so that a different method can be used instead. Each phase is described below.
Network bringup
Network bringup is only necessary in systems with externally linked chips, such as the XMP-64. It establishes the node labelling and sets up the Xlinks and routing tables. When booting from JTAG, this phase is performed by loading and executing a separate binary (Loadable 1) for each core in the system. Once this has finished executing, the second program binary for each core (Loadable 2) is loaded and executed. When booting from flash, the automatically generated second-stage loader performs both network bringup and loading. For more details, see the XE file format section of the Tools Developer Guide.
The network bringup phase can be excluded from a multi-core binary either by using xobjdump to manually construct the binary from ELF images for each core, or by extracting it, again using xobjdump.
C/XC runtime startup
The runtime startup initialises the environment for C and XC programs to run, which includes setting up pointers, threads and channels. This is included in the program binary and is executed before the main function in the program. The inclusion of this can be disabled with the '-nostdlib' compiler flag. When this is disabled, the ELF symbol '_start' must be defined, as execution will start from that address.
Synchronisation
Channels, possibly over Xlinks, are used to ensure that all cores are ready and in a known state, and some additional cleanup is performed. The top-level channel ends are allocated and their values communicated between cores. Then, on each core, the correct top-level functions are called. This phase can be disabled by using the '--nochaninit' mapper flag.
Synchronisation is performed using a spare register in the switch (register number 3) of node 0. This is referred to as the scratch register. On power-up the scratch register has the value 0. When core 0 of node 0 is ready to synchronise, it writes 1 to the scratch register and then polls it, waiting for it to reach num_cores.
Each of the other cores is assigned a unique number in the range 1..num_cores-1 (let's call it boot_id) in the linker-generated code. At startup each core polls the scratch register waiting for it to reach boot_id. When it does, it sets it to boot_id+1.
For the scratch register to reach num_cores, every core must have detected its own number and written the next one. All cores are therefore known to have reached a known point in the boot.
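The scratch-register handshake described above can be modelled with a small simulation. The following is only a sketch in Python, not the code the tools generate: the names ScratchRegister, core0, other_core and simulate_boot_sync are hypothetical, each core is modelled as a generator that yields once per poll, and polling is assumed to be an equality test on the register value. On real hardware the register lives in the switch of node 0 and is accessed by each core directly.

```python
class ScratchRegister:
    """Stand-in for switch register 3 on node 0 (hypothetical model)."""
    def __init__(self):
        self.value = 0  # power-up value


def core0(reg, num_cores, log):
    # Node 0, core 0: write 1, then poll until the value reaches num_cores.
    reg.value = 1
    log.append(0)
    while reg.value != num_cores:
        yield  # one unsuccessful poll


def other_core(reg, boot_id, log):
    # Any other core: poll until the value equals boot_id, then write boot_id + 1.
    while reg.value != boot_id:
        yield  # one unsuccessful poll
    reg.value = boot_id + 1
    log.append(boot_id)


def simulate_boot_sync(num_cores, schedule=None):
    """Run the handshake; return (final register value, order of writes)."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    reg = ScratchRegister()
    log = []
    cores = {0: core0(reg, num_cores, log)}
    for boot_id in range(1, num_cores):
        cores[boot_id] = other_core(reg, boot_id, log)
    # Order in which the cores get to poll in each round (assumed, arbitrary).
    order = list(schedule) if schedule is not None else sorted(cores, reverse=True)
    if sorted(order) != list(range(num_cores)):
        raise ValueError("schedule must list every core exactly once")
    while cores:
        for core_id in order:
            gen = cores.get(core_id)
            if gen is None:
                continue  # this core has already finished
            try:
                next(gen)
            except StopIteration:
                del cores[core_id]
    return reg.value, log
```

Whatever order the cores are polled in, the writes happen in boot_id order, and core 0 only stops polling once every other core has written its value. For example, simulate_boot_sync(4) returns (4, [0, 1, 2, 3]).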
