Problem with reboot during DFU

bearcat · Post by **bearcat** » Mon May 23, 2011 4:50 am

I'm working with the DFU code on a two-core L1 system. Where the code performs a software reboot, it doesn't appear to reboot the cores. I'm using the standard XMOS code (reboot.c and write_sswitch_reg_blind.s), which writes to the switch of each core:


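```c
#include <xs1.h>  /* read_sswitch_reg(), get_core_id() */

/* Defined in write_sswitch_reg_blind.s (prototype assumed here):
 * writes a switch register without waiting for an acknowledgement. */
void write_sswitch_reg_blind(unsigned core_id, unsigned reg, unsigned data);

/* Reboots XMOS device by writing to the PLL config register */
void device_reboot(void)
{
    unsigned int pllVal;
    unsigned int core_id = get_core_id();
    read_sswitch_reg(core_id, 6, &pllVal);                 /* reg 6: PLL config */
    write_sswitch_reg_blind(core_id ^ 0x8000, 6, pllVal);  /* reset the other L1 */
    write_sswitch_reg_blind(core_id, 6, pllVal);           /* then this L1 */
}
```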

I tried XDE Run, XDE Debug and Load to flash, and all behave the same. During an XDE debug session, a few seconds after the reboot command was received, I hit pause: each core was sitting at the instruction that should reboot the cores. The screen dump is attached.

There is a single 2-bit XLink between the cores.

Any ideas? If there is an active channel between the cores, does this block the write to reboot?

Woody · Post by **Woody** » Mon May 23, 2011 10:56 am

Any write to the PLL register in the sswitch will reset the whole of the node that the sswitch is on. (Each node has a single sswitch.) If you are using L devices, then there is an sswitch (and a PLL register) for each XCore.

I am assuming that write_sswitch_reg_blind() writes to the sswitch and waits for a response. The code looks right to me: an active channel (via an XMOS link) is required to send the reset command to the remote XCore. Once that command is sent, there is no need to maintain an open channel.

I expect that the IDE may lose track of what the system is doing at that point since it has reset. So what you see in the gui may not actually be what is happening on the XCore.

m_y · Post by **m_y** » Mon May 23, 2011 11:51 am

Resetting an N-L1 system is tricky.

I'm guessing from the name of the function write_sswitch_reg_blind() that you're not waiting for responses to the switch register writes. Is this correct? If so, the second (local) reset may be received and acted on by the local switch before the message to the remote switch has been completely transmitted. In that case the second L1 will never be reset, because it never receives the PLL register write command. Worse, its switch may become unresponsive if it receives only a partial message.

It's no good waiting for a response either. One never gets responses to PLL programming messages, because the switch is reset and thus can't send a response message.

The trick is to reset the remote L1, wait until you know the message has gone out then reset locally. How do you know when this is? Good question! Depends on the link speed, mostly.
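To make "depends on the link speed" concrete, the wait can be estimated as message length times per-token wire time, padded by a safety factor. This is a back-of-the-envelope sketch only: drain_time_ns() and ns_to_ref_ticks() are hypothetical helpers, and the token count of the PLL-write message and the time per token on your XLink are assumptions you must take from your own link configuration; nothing in this thread gives them.

```c
#include <stdint.h>

/* Rough estimate of how long the remote PLL-write message needs to leave the
 * local node. message_tokens and ns_per_token are assumptions: take them from
 * your own message format and XLink timing configuration. safety_factor covers
 * switch buffering that this simple model ignores. */
static uint64_t drain_time_ns(uint32_t message_tokens, uint32_t ns_per_token,
                              uint32_t safety_factor)
{
    return (uint64_t)message_tokens * ns_per_token * safety_factor;
}

/* XS1 timers count the (default) 100 MHz reference clock: 10 ns per tick. */
static uint32_t ns_to_ref_ticks(uint64_t ns)
{
    return (uint32_t)((ns + 9u) / 10u);   /* round up to whole ticks */
}
```

For example, with a purely illustrative 16-token message at 200 ns per token and a safety factor of 4, drain_time_ns(16, 200, 4) is 12,800 ns, or 1,280 reference-clock ticks, so under those assumptions a wait of around 1 ms leaves a wide margin.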

segher · Post by **segher** » Mon May 23, 2011 3:44 pm

m_y wrote: Resetting an N-L1 system is tricky.

Yeah :-)

Two other things that could be wrong: the code assumes it is run on node id 0, and that the other core has node id 0x8000.

m_y wrote: It's no good waiting for a response either. One never gets responses to PLL programming messages, because the switch is reset and thus can't send a response message.

The trick is to reset the remote L1, wait until you know the message has gone out then reset locally. How do you know when this is? Good question! Depends on the link speed, mostly.

For a two-L1 system:

You can check when the message has left the local processor switch; what remains is checking when the message has left the local system switch. If you know exactly how the system switch does buffering, you could send some more data via the local system switch to ensure the PLL write is sent.

Or you could just wait 1ms or so :-P
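The "reset remote, wait, reset local" order can be sketched in C as below. This is a hedged, hypothetical sketch, not XMOS code: reboot_hooks, reboot_pair(), DRAIN_TICKS and the sim_* functions are illustrative names, and the hooks stand in for read_sswitch_reg(), write_sswitch_reg_blind() and a timer wait so that the ordering can be exercised on a host PC. The 1 ms figure is the rule of thumb from this post, not a measured value.

```c
#include <stdint.h>

/* Hypothetical hardware hooks. On an XS1 part these would wrap
 * read_sswitch_reg(), write_sswitch_reg_blind() and a timer wait. */
typedef struct {
    int  (*read_reg)(unsigned node, unsigned reg, unsigned *value);
    void (*write_reg_blind)(unsigned node, unsigned reg, unsigned value);
    void (*wait_ticks)(uint32_t ticks);
} reboot_hooks;

#define PLL_CTL_REG  6u       /* PLL config register, as in device_reboot() */
#define DRAIN_TICKS  100000u  /* ~1 ms at a 100 MHz reference clock */

/* Reset the remote L1 first, let its PLL write drain out of the local
 * switches, then reset the local L1. Like device_reboot(), this writes the
 * local PLL value to both nodes. */
static void reboot_pair(const reboot_hooks *hw, unsigned local_node,
                        unsigned remote_node)
{
    unsigned pll = 0;
    hw->read_reg(local_node, PLL_CTL_REG, &pll);
    hw->write_reg_blind(remote_node, PLL_CTL_REG, pll);
    hw->wait_ticks(DRAIN_TICKS);
    hw->write_reg_blind(local_node, PLL_CTL_REG, pll);  /* on hardware: resets this node */
}

/* Off-target simulation that records the order of writes and waits. */
enum { EV_WRITE, EV_WAIT };
typedef struct { int kind; unsigned node; uint32_t arg; } sim_event;
static sim_event sim_log[8];
static int sim_count;

static void sim_record(int kind, unsigned node, uint32_t arg)
{
    if (sim_count < 8)
        sim_log[sim_count++] = (sim_event){ kind, node, arg };
}
static int sim_read(unsigned node, unsigned reg, unsigned *value)
{
    (void)node; (void)reg;
    *value = 0x1234u;  /* arbitrary PLL value for the simulation */
    return 1;
}
static void sim_write(unsigned node, unsigned reg, unsigned value)
{
    (void)reg;
    sim_record(EV_WRITE, node, value);
}
static void sim_wait(uint32_t ticks) { sim_record(EV_WAIT, 0u, ticks); }

static const reboot_hooks sim_hooks = { sim_read, sim_write, sim_wait };
```

Calling reboot_pair(&sim_hooks, 0x0000, 0x8000) records a write to node 0x8000, a 100000-tick wait, then a write to node 0. Passing the remote node ID in explicitly, rather than computing core_id ^ 0x8000, also sidesteps the node-ID assumption mentioned above.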

The situation is much more complex when you have more than two nodes. If you
do all PLL writes from a single core, you have to make sure the network (including
routing) is still connected after every node reset. Another option is to do the reset
via a higher level protocol, or to wire up stuff so that one core can pull all cores'
RESET lines.
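For more than two nodes, one way to keep every not-yet-reset node reachable is to do a breadth-first search from the core issuing the resets and reset nodes farthest-first, resetting the issuing core last: the nodes still waiting always form a prefix of the BFS order, so they stay connected to the root through the BFS tree. This is a hypothetical sketch (reset_order(), MAX_NODES and the adjacency-matrix representation are illustrative, not XMOS APIs), and it assumes messages are routed along that shortest-hop tree; your actual routing tables may differ.

```c
#include <stddef.h>

#define MAX_NODES 16

/* Fill order[] with node indices in the order their PLL resets should be
 * sent from node `root`. adj[i][j] != 0 means nodes i and j share a link.
 * Nodes are reset farthest-first along a BFS tree; root comes last.
 * Returns n on success, or 0 if the input is invalid or some node is
 * unreachable from root. */
static size_t reset_order(size_t n, unsigned char adj[MAX_NODES][MAX_NODES],
                          size_t root, size_t order[MAX_NODES])
{
    size_t queue[MAX_NODES];
    unsigned char seen[MAX_NODES] = {0};
    size_t head = 0, tail = 0;

    if (n == 0 || n > MAX_NODES || root >= n)
        return 0;
    queue[tail++] = root;
    seen[root] = 1;
    while (head < tail) {                 /* breadth-first search from root */
        size_t u = queue[head++];
        for (size_t v = 0; v < n; v++) {
            if (adj[u][v] && !seen[v]) {
                seen[v] = 1;
                queue[tail++] = v;
            }
        }
    }
    if (tail != n)
        return 0;                         /* network not connected */
    for (size_t i = 0; i < n; i++)        /* reverse BFS order: farthest first */
        order[i] = queue[n - 1 - i];
    return n;
}
```

For links 0-1, 1-2 and 0-3 with root 0, the order is 2, 3, 1, 0. Each PLL write would still need time to drain before the next one is sent, as with the two-L1 case.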

bearcat · Post by **bearcat** » Mon May 23, 2011 4:07 pm

Somehow another post I had got lost??

I had been writing the reboot command on each core separately: I send a command from core 0 to core 1 to reboot, wait a few ms, then each core reboots itself with the above code.

So in fact I do need to close the active channel to be able to write to the switch; otherwise each core is blocked at the out instruction shown above.

After I closed the channel, only one core actually reboots: whichever core happens to execute the reboot first. The other core gets blocked in the read_sswitch_reg routine.

Hadn't thought about sending the reboot command remotely. Any examples already coded for this?

In my case, I have the ability to drive the reset lines via software. I will code that up as being an easier solution.

bearcat · Post by **bearcat** » Thu May 26, 2011 7:02 am

Thanks for everyone's help.

External reset worked. DFU is now working fine.
