soft-reset sequence: for board of two XS1-L16-128?

jerryXCORE
Experienced Member
Posts: 65
Joined: Tue Apr 30, 2013 10:41 pm

Post by jerryXCORE »

I can soft-reset a sliceKit (one XS1-L16-128) successfully with this code:

Code:

    unsigned int pllVal, tile_id;
    tile_id= get_local_tile_id();
    read_sswitch_reg(tile_id, 6, pllVal);

    if (tile_id == get_tile_id(stdcore[1])){ //if running from tile 1 then reset tile 0 first
          write_sswitch_reg_no_ack(get_tile_id(tile[0]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[1]), 6, pllVal);
    }
    else{ //otherwise we're running on tile 0, so reset tile 1 first
          write_sswitch_reg_no_ack(get_tile_id(tile[1]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[0]), 6, pllVal);
    }
But I failed to soft-reset my own board (two XS1-L16-128) with this code:

Code:

    unsigned int pllVal, tile_id;
    tile_id= get_local_tile_id();
    read_sswitch_reg(tile_id, 6, pllVal);

    if (tile_id == get_tile_id(stdcore[1])){ //if running from tile 1 then reset tile 0 first
          write_sswitch_reg_no_ack(get_tile_id(tile[0]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[1]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[2]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[3]), 6, pllVal);
    }
    else{ //otherwise we're running on tile 0, so reset tile 1 first
          write_sswitch_reg_no_ack(get_tile_id(tile[3]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[2]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[1]), 6, pllVal);
          write_sswitch_reg_no_ack(get_tile_id(tile[0]), 6, pllVal);
    }
Any hints? Thanks!
hkr87
Member
Posts: 12
Joined: Tue Oct 13, 2015 11:36 am

Post by hkr87 »

It will depend on the connections between the xcores.

You will need four cases, one for each tile the reset can be issued from, and you will need to know how the tiles are interconnected. Suppose they are connected 0 <-> 1 <-> 2 <-> 3. Then from 0 your reset order is 3, 2, 1, 0; from 1: 3, 2, 0, 1; from 2: 0, 1, 3, 2; from 3: 0, 1, 2, 3.
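
The ordering rule described above can be sketched on a host machine in plain C. This is only an illustration, assuming a linear chain whose nodes are numbered by position; the function name `chain_reset_order` is hypothetical and not part of any XMOS library. Each side of the chain is reset from its far end inwards, the longer side first, and the issuing node last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill `order` with the reset sequence for a linear
 * chain of `n` nodes (0 <-> 1 <-> ... <-> n-1) when the reset is issued
 * from node `self`. Each side of the chain is reset from its far end
 * inwards, the longer side first, and `self` goes last. */
static void chain_reset_order(unsigned self, unsigned n, unsigned order[])
{
    size_t k = 0;

    assert(n > 0 && self < n);

    unsigned left = self;           /* number of nodes below self */
    unsigned right = n - 1 - self;  /* number of nodes above self */

    if (right >= left) {
        for (unsigned i = n - 1; i > self; i--) order[k++] = i;
        for (unsigned i = 0; i < self; i++)     order[k++] = i;
    } else {
        for (unsigned i = 0; i < self; i++)     order[k++] = i;
        for (unsigned i = n - 1; i > self; i--) order[k++] = i;
    }
    order[k] = self;                /* the issuing node resets itself last */
}
```

For a four-node chain this reproduces the four sequences listed above. A board with a different topology would need its own ordering.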
jerryXCORE
Experienced Member
Posts: 65
Joined: Tue Apr 30, 2013 10:41 pm

Post by jerryXCORE »

Let me explain my situation more clearly:
* I am writing code so that the firmware can upgrade itself online by:
--- Adding a web-server task to tile[0] for uploading the file and initiating the soft-reset commands.
--- (There are of course other tasks running on tile[0], [1], [2], [3].)
* Yes, the connection of the tiles is: 0<>1<>2<>3
* Before calling soft-reset, my program writes the new code to FLASH.
--- Which means: I want every tile to reboot from FLASH.

* So when I issue a soft-reset to tile[3], does tile[3] read the new code from FLASH automatically?
hkr87
Member
Posts: 12
Joined: Tue Oct 13, 2015 11:36 am

Post by hkr87 »

When a tile resets, it boots from whatever its boot pins are set to; on your setup that is probably FLASH for tile 0 and XLINK for tiles 1, 2, and 3. You need to reset all four quickly enough, and in the right order, so that the boot from link does not come up while links are still active. Normally this is guaranteed because a single reset line takes them all out of reset simultaneously.

The reset order depends on where you are in the network. Every node you reset will disable the switch on that node, removing any path through it. Hence, you reset the furthest node first, and work inwards. There shouldn't be any other traffic on the network, because that may be terminated in mid-flight, and that may block the path of the next reset message.
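
One way to sanity-check a proposed order before trying it on hardware is to model this rule on a host machine. The following is a minimal C sketch under stated assumptions: a linear chain with nodes numbered by position, no other traffic, and a hypothetical function name `chain_order_is_valid`. It only models path availability, not timing.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical model of a linear chain 0 <-> 1 <-> ... <-> n-1.
 * Returns true if every reset message sent from `self`, in the given
 * order, still has a path of live switches to its target, and `self`
 * is the last node to be reset. */
static bool chain_order_is_valid(unsigned self, unsigned n,
                                 const unsigned order[])
{
    bool alive[MAX_NODES];

    if (n == 0 || n > MAX_NODES || self >= n) return false;
    for (unsigned i = 0; i < n; i++) alive[i] = true;

    for (unsigned k = 0; k < n; k++) {
        unsigned target = order[k];

        if (target >= n || !alive[target]) return false;
        /* Once the sender resets itself, nothing more can be sent. */
        if (target == self && k != n - 1) return false;

        /* Every switch between sender and target must still be up. */
        unsigned lo = target < self ? target : self;
        unsigned hi = target < self ? self : target;
        for (unsigned i = lo; i <= hi; i++)
            if (!alive[i]) return false;

        alive[target] = false;  /* resetting a node disables its switch */
    }
    return true;
}
```

Under this model, the order 0, 1, 2, 3 issued from tile 1 is rejected, because tile 1 resets itself second and never sends the remaining writes, while 3, 2, 0, 1 from tile 1 is accepted.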

A sure-fire method is to connect one of the ports to the reset line, if it is open collector. Drive a 0 on it when you want to apply the reset. Declare it as a port (without in or out) to stop it from driving a zero on boot. It may have a pull-down on it, but not strong enough to pull against any external pull-up.
jerryXCORE
Experienced Member
Posts: 65
Joined: Tue Apr 30, 2013 10:41 pm

Post by jerryXCORE »

Thanks for the detailed explanation. But the solution is actually quite the opposite: you want to be "slow enough" rather than "quick enough". For example:

Code:

     write_sswitch_reg_no_ack(get_tile_id(tile[3]), 6, pllVal);   delay_ticks(delay); 
     write_sswitch_reg_no_ack(get_tile_id(tile[2]), 6, pllVal);   delay_ticks(delay);
     write_sswitch_reg_no_ack(get_tile_id(tile[1]), 6, pllVal);   delay_ticks(delay);
     write_sswitch_reg_no_ack(get_tile_id(tile[0]), 6, pllVal);
On my own board, my experiments show that any delay works: I tried delay_ticks(0) and delay_ticks(100000), and both work well.

If you connect two sliceKits, my experiments show that the delay should be >= 80 ticks.
Last edited by jerryXCORE on Thu Oct 22, 2015 3:25 pm, edited 1 time in total.
infiniteimprobability
Verified
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

This is an interesting discussion..

We had some experience of something similar, where we observed an unreliable reset in a multi-chip system (it only worked about 75% of the time, so not a good customer experience).

Firstly, the absolutely safe way of doing it is indeed using the RST line. This is nice and clean and will be the most robust but uses extra HW, albeit an I/O and some simple components.

Doing it all by writing the PLL register needs careful tuning, and needs to be done in a kind of reverse spanning tree order. As hkr87 said, do it too quickly and you will cut off packets in flight to the furthest nodes, so they won’t be reset. Too slow, and links can come up connected to an out-of-sync node or to floating lines, possibly causing link lockup. So this is definitely a case for Goldilocks engineering: getting it just right.

We found that a delay of about a microsecond was enough time to let packets get there, but ensure that the end node wasn’t too far ahead after reboot. This is related to the PLL settle time.
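
To compare the delay figures quoted in this thread: the wait_us helper in the code further down multiplies microseconds by 100, which corresponds to a 100 MHz reference timer (10 ns per tick). Assuming delay_ticks counts the same reference ticks (an assumption worth checking against your tools documentation), the conversion can be sketched in plain C as below. The names `REF_TICKS_PER_US`, `us_to_ticks` and `ticks_to_ns` are hypothetical.

```c
/* Assumed reference timer rate: 100 MHz, i.e. one tick every 10 ns. */
#define REF_TICKS_PER_US 100u

/* Convert a delay in microseconds to reference timer ticks. */
static unsigned us_to_ticks(unsigned us)
{
    return us * REF_TICKS_PER_US;
}

/* Convert reference timer ticks to nanoseconds. */
static unsigned ticks_to_ns(unsigned ticks)
{
    return ticks * (1000u / REF_TICKS_PER_US);
}
```

On that assumption, the 80-tick minimum reported earlier for two sliceKits is 800 ns, which is close to the "about a microsecond" figure.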

However to complicate things, depending on the network and usage, there may not be a switch path available immediately, which could throw timings due to blocking. You should ideally ensure in the application that a path is always available, however unwinding all of the channel communications to ensure this may be quite fiddly if the paths are heavily used.

To work around this, we split the write_sswitch_no_ack function into a pair of primitives: open and write/close. The open blocks until the path is available and reserves it thereafter. Once this is done, the write/close can be used, safe in the knowledge that the message will get through quickly and that the delay timings will be observed.

Attached is the code to do this in a 1 x L16 and 1 x L8 system.

Code:

#define _chkct(a,b) {__asm__ __volatile__ ("chkct res[%0], %1": : "r" (a) , "r" (b));}
#define _outct(a,b) {__asm__ __volatile__ ("outct res[%0], %1": : "r" (a) , "r" (b));}
#define _outuchar(a,b) {__asm__ __volatile__ ("outt res[%0], %1": : "r" (a) , "r" (b));}
#define _outuint(a,b) {__asm__ __volatile__ ("out res[%0], %1": : "r" (a) , "r" (b));}
#define _inuint_byref(a,b) {__asm__ __volatile__ ("in %0, res[%1]": "=r" (b) : "r" (a));}
#define _sync(a) {__asm__ __volatile__ ("syncr res[%0]": : "r" (a));}

static inline unsigned _getchanend(unsigned otherside) {
  unsigned cend;
  __asm__ __volatile__ ("getr %0, 2"
                       : "=r" (cend));
  __asm__ __volatile__ ("setd res[%0], %1"
                       : : "r" (cend), "r" (otherside) );
  return cend;
}


static inline void _freechanend(unsigned cend) {
__asm__ __volatile__ ("freer res[%0]"
                       : /* no output */
                       : "r" (cend));
}

void wait_us(int us)
{
    int time_now;
    timer t;
    t :> time_now;
    t when timerafter(time_now + us * 100) :> void;
}



{unsigned, unsigned} _write_sswitch_no_ack_open(unsigned tile_id){
    unsigned c_write, dst_addr;
    dst_addr = XS1_RES_TYPE_CONFIG | (XS1_CT_SSCTRL << XS1_CHAN_ID_CHANNUM_SHIFT) | (tile_id << XS1_CHAN_ID_PROCESSOR_SHIFT);
    c_write = _getchanend(dst_addr);


    //Start of packet with command token to request write to switch. Reserves route
    _outct(c_write, XS1_CT_WRITEC);

    return {c_write, dst_addr};
}

void _write_sswitch_no_ack_write_close(unsigned c_write, unsigned dst_addr, unsigned reg, unsigned data){

    unsigned rtn_addr;

    //Calculate return address, dest node tile_id, channel end 0xff (will get junked by device so stops ack getting through)    
    rtn_addr = dst_addr >> 8;
    rtn_addr |= 0xff;

    //Send return address (3 bytes)
    _outuchar(c_write, rtn_addr >> 16);
    _outuchar(c_write, rtn_addr >> 8);
    _outuchar(c_write, rtn_addr);

    //Send top 8b of reg
    _outuchar(c_write, reg >> 8);

    //Send bottom 8b of reg
    _outuchar(c_write, reg);

    //Send data
    _outuint(c_write, data);

    //Send end of packet
    _outct(c_write, XS1_CT_END);

    //Free channel end
    _freechanend(c_write);
}

////////////////////////////////////////////////////
///RESET FUNCTION//////////////////////////////////
///MUST BE CALLED ON TILE 0///////////////////////
/////////////////////////////////////////////////

            unsigned c_write, dst_addr, pll_val, id[3];

            //Free channel end here if needed

            id[0] = get_tile_id(tile[0]);
            id[1] = get_tile_id(tile[1]);
            id[2] = get_tile_id(tile[2]);

            read_sswitch_reg(id[2], 6, pll_val);                     //read the pll register
            {c_write, dst_addr} = _write_sswitch_no_ack_open(id[2]); //Open channels to tile2 sswitch first
        
            wait_us(10000); 
    
            _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val); //write reset val to tile 2
            wait_us(1);   

            read_sswitch_reg(id[1], 6, pll_val);
            write_sswitch_reg(id[1], 6, pll_val);                             //Reset tile 1

            wait_us(1); 

            read_sswitch_reg(id[0], 6, pll_val);
            write_sswitch_reg(id[0], 6, pll_val);                             //Reset tile 0
Redeye
XCore Addict
Posts: 131
Joined: Wed Aug 03, 2011 9:13 am

Post by Redeye »

This discussion is really helpful and has prompted me to revisit software rebooting of my 7-tile device, which I've never managed to get working. I have 7 tiles, made up of an L8 and 3 x L16, like this:

L8<->L16<->L16<->L16

The tiles are tile[0] to tile[6] from left to right. The system boots from flash on tile[1]. I've always assumed that I'd need to initiate a software reset from tile[1] so that it's the last switch to reset. Is this correct?

Using the very useful hints in this thread, I've modified my reboot function to the following:

Code:

     unsigned c_write, dst_addr, pll_val, id[7];

     id[0] = get_tile_id(tile[0]);
     id[1] = get_tile_id(tile[1]);
     id[2] = get_tile_id(tile[2]);
     id[3] = get_tile_id(tile[3]);
     id[4] = get_tile_id(tile[4]);
     id[5] = get_tile_id(tile[5]);
     id[6] = get_tile_id(tile[6]);

     read_sswitch_reg(id[0], 6, pll_val);
     {c_write, dst_addr} = _write_sswitch_no_ack_open(id[0]);
     wait_us(10000);
     _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val);
     wait_us(5);

     read_sswitch_reg(id[6], 6, pll_val);
     {c_write, dst_addr} = _write_sswitch_no_ack_open(id[6]);
     wait_us(10000);
     _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val);
     wait_us(5);

     read_sswitch_reg(id[5], 6, pll_val);
     {c_write, dst_addr} = _write_sswitch_no_ack_open(id[5]);
     wait_us(10000);
     _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val);
     wait_us(5);

     read_sswitch_reg(id[4], 6, pll_val);
     {c_write, dst_addr} = _write_sswitch_no_ack_open(id[4]);
     wait_us(10000);
     _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val);
     wait_us(5);

     read_sswitch_reg(id[3], 6, pll_val);
     {c_write, dst_addr} = _write_sswitch_no_ack_open(id[3]);
     wait_us(10000);
     _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val);
     wait_us(5);

     read_sswitch_reg(id[2], 6, pll_val);
     {c_write, dst_addr} = _write_sswitch_no_ack_open(id[2]);
     wait_us(10000);
     _write_sswitch_no_ack_write_close(c_write, dst_addr, 6, pll_val);
     wait_us(5);

     read_sswitch_reg(id[1], 6, pll_val);
     write_sswitch_reg(id[1], 6, pll_val);   
But this still isn't working. It looks like it gets past resetting tile[6] (the Ethernet PHY is on that tile and it is getting reset), but something's not working after that point.

Can anyone see anything that'll obviously not work in this code, or any ideas to debug where it's getting stuck?
jerryXCORE
Experienced Member
Posts: 65
Joined: Tue Apr 30, 2013 10:41 pm

Post by jerryXCORE »

Not sure why you use such complicated code. Maybe try the following simpler code, similar to mine.
Since your flash is on tile[1], run the following code from tile[1]:

Code:

    unsigned int pllVal, tile_id, delay; //Run this code from tile[1]:
    tile_id= get_local_tile_id();
    read_sswitch_reg(tile_id, 6, pllVal);
    delay = 100;
    write_sswitch_reg_no_ack(get_tile_id(tile[6]), 6, pllVal); delay_ticks(delay);
    write_sswitch_reg_no_ack(get_tile_id(tile[5]), 6, pllVal); delay_ticks(delay);
    write_sswitch_reg_no_ack(get_tile_id(tile[4]), 6, pllVal); delay_ticks(delay);   
    write_sswitch_reg_no_ack(get_tile_id(tile[3]), 6, pllVal); delay_ticks(delay);
    write_sswitch_reg_no_ack(get_tile_id(tile[2]), 6, pllVal); delay_ticks(delay);
    write_sswitch_reg_no_ack(get_tile_id(tile[0]), 6, pllVal); delay_ticks(delay);
    write_sswitch_reg_no_ack(get_tile_id(tile[1]), 6, pllVal);
Please try different values of "delay", and let us know the result.
Redeye
XCore Addict
Posts: 131
Joined: Wed Aug 03, 2011 9:13 am

Post by Redeye »

Thanks jerryXCORE - that was the code I was using before (without delays) which didn't work. I've tried putting your delays in but that doesn't make it work either.

Further experiments reveal that my reboot code above works sometimes which suggests that maybe something is blocking part of the reboot sequence, but if the timing is right it runs properly. The problem is going to be finding out what's blocking it. Obviously there's quite a lot of comms going on between tiles and maybe something is hogging a link (probably waiting for something on a tile which has already been reset) which is stopping one or more of the tiles getting reset.
jerryXCORE
Experienced Member
Posts: 65
Joined: Tue Apr 30, 2013 10:41 pm

Post by jerryXCORE »

I see. I don't know what code is running on your tiles. Maybe you can test with simple code that has no inter-tile communication.

For example, write code that only flashes LEDs on every tile, then see what happens.