Debugging SPI Boot

bearcat
Respected Member
Posts: 283
Joined: Fri Mar 19, 2010 4:49 am

Post by bearcat »

Ok. Added the hooks to my 12.2.0 slim/dfu code to go back to an earlier design I have been using for a while. This board is known to work just fine (on 11.2.2).

On 12.2.0, it runs via JTAG, but now does not boot from Flash either. This design has two L1-64 pin tiles. It had been booting just fine with code from 11.2.2 for a while, including upgrade images.

So... what changed from 11.2.2 to 11.11.1 / 12.2.0 regarding the boot code (probably a lot)?

As a long shot, I am going to try compiling and running from 32-bit XP to see whether that changes anything and cures the debugger instability on 12.2.0...



Post by bearcat »

Well, the XTAG2 won't work in a Windows VMware instance, so no testing there.

I pulled out a WinXP32SP3 machine, installed the tools, and tested. No difference: on 12.2.0 the slim/dfu code does not boot from flash, but it runs from JTAG.

(The debugger still crashes all the time on WinXP32SP3 for me, but that's another issue)

My hardware is not THAT far off the reference in what matters for boot.

A couple of questions in that area:
1 - Is there a maximum time the chip can be held in reset at power-up? I am holding reset for about 5 ms after power is applied. I found no spec on this; typically reset can be held indefinitely. 5 ms + 1.5 ms for the 1.0V rail is less than the 10 ms ramp time listed for 1.0V, which is all I could find in the docs. I have tried different timings (mostly longer), with no difference.
2 - Power supplies appear to be as specified. 3.3V ramps up in about 400 us, and 1.0V starts ramping 1.5 ms later, taking about 100 us. No glitches.
3 - Attaching during boot shows the tiles are TRYING to boot SPI -> XLINK as specified, and the code is doing what you would expect: Tile0 is trying to send the code to Tile1, and Tile1 is in a loop receiving from the channel and computing a CRC32 on it. I see traffic on the links. I have not put a logic analyzer on the XLINK to see what the packets are. The links work without errors in the application.
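The timing in items 1 and 2 can be laid out as a simple timeline. This is a minimal sketch using the numbers quoted above; it assumes the 1.5 ms delay for the 1.0V rail is measured from when power is applied, and the Rail class and reset_margin_ms helper are hypothetical names. It only checks that reset is released after both rails have finished ramping; it says nothing about a maximum reset-hold time, which the thread could not find specified.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    name: str
    start_ms: float  # when the rail starts ramping, relative to power applied (assumed reference point)
    ramp_ms: float   # measured ramp duration

    @property
    def stable_ms(self) -> float:
        return self.start_ms + self.ramp_ms


def reset_margin_ms(rails, reset_release_ms):
    """Time from the last rail becoming stable to reset release.

    A negative result would mean reset is released while a rail is still ramping.
    """
    return reset_release_ms - max(r.stable_ms for r in rails)


# Figures from the post: 3.3V ramps in ~400 us; 1.0V starts ~1.5 ms later
# (assumed: 1.5 ms after power is applied) and ramps in ~100 us.
rails = [Rail("3V3", 0.0, 0.4), Rail("1V0", 1.5, 0.1)]
RESET_RELEASE_MS = 5.0  # reset held ~5 ms after power applied

for r in rails:
    print(f"{r.name} stable at {r.stable_ms:.1f} ms")
print(f"reset released {reset_margin_ms(rails, RESET_RELEASE_MS):.1f} ms after the last rail is stable")
```

If the 1.5 ms is instead measured from the end of the 3.3V ramp, the 1.0V rail settles at about 2.0 ms and the margin is about 3.0 ms; either way reset is released well after both rails are up.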

The system boots under 11.2.2, so it works. Via JTAG, the application runs and runs without issues, and when booted from flash with an 11.2.2 image it also runs for hours and hours.

I would have to believe all this has been well tested with numerous designs by now (?). So how are the routing tables messed up?

Running out of ideas here....

Post by bearcat »

Well tried 13beta. Still doesn't boot.

How do I specify a routing ID in the XN file for 13beta?
richard
Respected Member
Posts: 318
Joined: Tue Dec 15, 2009 12:46 am

Post by richard »

bearcat wrote: Well tried 13beta. Still doesn't boot.
How do I specify a routing ID in the XN file for 13beta?
There isn't any documentation on this yet. I've taken the .xn you posted and added routing information, see:

https://gist.github.com/rlsosborne/6935441

Could you let it try to boot from flash, then attach via gdb from the command line and run the xlreg script from this repository:

https://github.com/xcore/xgdb_scripts

This will dump out the link state, which should give some idea of how far it has got in the boot. Could you also run xrun --dumpstate from the command line? It would be interesting to see where the different tiles are.

Post by bearcat »

Thanks for the help on this. I've got my application ported to 12.2.0 successfully, and porting to 13 shouldn't be a problem as soon as this is worked out...

It didn't boot.

This is with the modification to the XN file for the routing IDs in V13.

Here's the xlreg output:

GNU gdb (XGDB) 13.0.0beta1 (build 8151)
Copyright (C) 2007 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i686-pc-mingw32 --target=xcore-elf".
For bug reporting instructions, please see: http://www.xmos.com/support.

(gdb) attach
0x0001f20a in ?? ()
(gdb) source xlreg.txt
(gdb) xlreg

Thread 1 (tile[0] core[0]):

Thread 2 (tile[1] core[0]):
Tile 0 (Tap 2)
 SSwitch
 Node id 0xffff
 PLL 0x3e70c
 BootMode : 0x001c
 Dirs: ffffffffffffffff
 Link 0  siu:F diu:T junk:F net:0 srctargetid:0 srctargettype:0 2w 6/6 d:15 snd:F rec:F
 Link 1  not enabled
 Link 2  not enabled
 Link 3  not enabled
 Link 4  not enabled
 Link 5  not enabled
 Link 6  not enabled
 Link 7  not enabled
 PLink 0 siu:T diu:F junk:F net:0 srctargetid:0 srctargettype:1
 PLink 1 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PLink 2 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PLink 3 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PSwitch
 PLink 0 siu:F diu:T junk:F net:0 srctargetid:0 srctargettype:3
 PLink 1 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
 PLink 2 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
 PLink 3 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
Tile 1 (Tap 0)
 SSwitch
 Node id 0x0000
 PLL 0xfe431
 BootMode : 0x0018
 Dirs: 0000000000000000
 Link 0  not enabled
 Link 1  not enabled
 Link 2  not enabled
 Link 3  siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0 2w 1422/199 d:0 snd:F rec:F
 Link 4  siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0 2w 1422/199 d:0 snd:F rec:F
 Link 5  siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0 2w 1422/199 d:0 snd:F rec:F
 Link 6  siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0 2w 1422/199 d:0 snd:F rec:F
 Link 7  siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0 2w 1422/199 d:0 snd:F rec:F
 PLink 0 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PLink 1 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PLink 2 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PLink 3 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0
 PSwitch
 PLink 0 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
 PLink 1 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
 PLink 2 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
 PLink 3 siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:3
(gdb) quit
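Reading an xlreg dump by eye is error-prone with eight links per tile. A small parser can pull out just the enabled links. This is a hypothetical helper written against the output format shown above (enabled_links and the tuple layout are illustrative names, and the meaning of the "a/b" delay pair and "d:" field is taken only from the labels the script prints). Applied to the full dump, it reports a single enabled link on tile 0 (link 0, 2-wire, 6/6, d:15) and links 3 to 7 on tile 1 (2-wire, 1422/199, d:0).

```python
import re

# One enabled-link line as printed by xlreg, e.g.
#  Link 0  siu:F diu:T junk:F net:0 srctargetid:0 srctargettype:0 2w 6/6 d:15 snd:F rec:F
# Field meanings follow the script's labels; check xlreg's source before relying on them.
LINK_RE = re.compile(
    r"^\s*Link\s+(?P<idx>\d+)\s+siu:[TF]\s+diu:[TF]\s+junk:[TF]\s.*?"
    r"(?P<width>\d)w\s+(?P<a>\d+)/(?P<b>\d+)\s+d:(?P<d>\d+)"
)
TILE_RE = re.compile(r"^Tile\s+(\d+)")


def enabled_links(dump):
    """Map tile number -> list of (link, width, delay_a, delay_b, d) for enabled links.

    "not enabled" links and PLink lines are skipped because they do not match LINK_RE.
    """
    links = {}
    tile = None
    for line in dump.splitlines():
        t = TILE_RE.match(line)
        if t:
            tile = int(t.group(1))
            links[tile] = []
            continue
        m = LINK_RE.match(line)
        if m and tile is not None:
            links[tile].append(tuple(int(m.group(k)) for k in ("idx", "width", "a", "b", "d")))
    return links


SAMPLE = """\
Tile 0 (Tap 2)
 Link 0  siu:F diu:T junk:F net:0 srctargetid:0 srctargettype:0 2w 6/6 d:15 snd:F rec:F
 Link 1  not enabled
 PLink 0 siu:T diu:F junk:F net:0 srctargetid:0 srctargettype:1
Tile 1 (Tap 0)
 Link 3  siu:F diu:F junk:F net:0 srctargetid:0 srctargettype:0 2w 1422/199 d:0 snd:F rec:F
"""

for tile, entries in enabled_links(SAMPLE).items():
    print(f"tile {tile}: {entries}")
```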
Here's the dumpstate:

***** Active Cores *****
  2  tile[1] core[0]  0xffffc0a6 in ?? ()
* 1  tile[0] core[0]  0x0001f20a in ?? ()

Thread 2 (tile[1] core[0]):

***** Call Stack *****
#0  0xffffc0a6 in ?? ()

***** Disassembly *****
0xffffc0a6:	in (2r)         r1, res[r0] *
0xffffc0a8:	setd (r2r)      res[r0], r1
0xffffc0aa:	in (2r)         r2, res[r0] *
0xffffc0ac:	ldw (lru6)      r4, dp[0x7]
0xffffc0b0:	mkmsk (rus)     r5, 0x20

***** Registers *****
r0             0x2	2
r1             0x1	1
r2             0x87	135
r3             0x10000	65536
r4             0x80063d8e	-2147074674
r5             0x0	0
r6             0x0	0
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x3	3
r11            0x1	1
cp             0x0	0
dp             0xffffc344	-15548
sp             0x0	0
lr             0xffffc09c	-16228
pc             0xffffc0a6	-16218
sr             0x40	64
spc            0x0	0
ssr            0x0	0
et             0x0	0
ed             0x0	0
sed            0x0	0
kep            0xffffc300	-15616
ksp            0x0	0

Thread 1 (tile[0] core[0]):

***** Call Stack *****
#0  0x0001f20a in ?? ()
#1  0x0001f1be in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

***** Disassembly *****
0x1f20a:	out (r2r)       res[r0], r11 *
0x1f20c:	add (2rus)      r1, r1, 0x4
0x1f20e:	sub (2rus)      r2, r2, 0x1
0x1f210:	bt (ru6)        r2, -0x5
0x1f212:	ldw (2rus)      r9, r4[0x4]

***** Registers *****
r0             0xffff0102	-65278
r1             0x1fa58	129624
r2             0xe	14
r3             0xffff0102	-65278
r4             0x1fa24	129572
r5             0x18	24
r6             0x88	136
r7             0x0	0
r8             0x0	0
r9             0x16	22
r10            0x1	1
r11            0x9af27f7	162473975
cp             0x1fa0c	129548
dp             0x1fa0c	129548
sp             0x1feec	130796
lr             0x1f1be	127422
pc             0x1f20a	127498
sr             0x40	64
spc            0x0	0
ssr            0x0	0
et             0x0	0
ed             0x0	0
sed            0x0	0
kep            0x10080	65664
ksp            0x0	0
I'll stare at the xlreg data for a bit...
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

Hi again... You're not very lucky, are you? :-P

Core #1 has run the rom code and received and run its
first image, which reprograms the PLL (to 499.99872MHz),
which caused a reboot (as all writes to the PLL do). Now
core #1 is again in the rom code, waiting for data. Its sswitch
hasn't seen any data yet.
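This reading depends on where each tile's program counter sits and which instruction it is stuck on. In the dump, tile[1] is blocked on an "in" at 0xffffc0a6 and tile[0] is executing an "out" loop at 0x1f20a. The sketch below classifies an address by region; the address ranges are assumptions inferred from this dump (tile[1]'s pc, dp and kep are all at 0xffffc000 or above, while tile[0]'s sp and kep fall in 0x10000-0x1ffff), not taken from a datasheet.

```python
# Assumed memory map, inferred from the dumpstate output in this thread:
ROM_BASE = 0xFFFFC000                 # boot ROM region (assumed)
RAM_BASE, RAM_END = 0x10000, 0x20000  # 64 KB on-chip RAM window (assumed)


def region(pc: int) -> str:
    """Classify an address as boot ROM, RAM, or unknown under the assumed map."""
    if pc >= ROM_BASE:
        return "boot ROM"
    if RAM_BASE <= pc < RAM_END:
        return "RAM"
    return "unknown"


# Program counters from the dumpstate output above.
pcs = {"tile[0]": 0x0001F20A, "tile[1]": 0xFFFFC0A6}
for tile, pc in pcs.items():
    print(f"{tile}: pc=0x{pc:08x} -> {region(pc)}")
```

Under these assumptions, tile[1] is still in the ROM loader waiting for data, while tile[0] is running loaded code that is pushing data out.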

Core #0 is sending data, and I don't see anything wrong with
how it is programmed. Two ideas:

1) Maybe 6/6 is a bit too fast. Try 400/200 or so; that's
easier to capture as well;
2) What could perhaps be going wrong is core #0 sending
the data before core #1 is ready for it; maybe the PLL took
a while to lock -- the 4069/50 ratio it uses is perhaps a bit
extreme; you could try running core #1 at e.g. 491.52MHz
or something else that's a nicer ratio of your oscillator.
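The ratio arithmetic in idea 2 can be checked directly. The thread does not state the oscillator frequency; this sketch assumes a 6.144 MHz effective PLL reference, inferred because it reproduces the figures in the post (4069/50 gives 499.99872 MHz and 80/1 gives 491.52 MHz). It uses exact fractions and a plain ref * mult / div model, ignoring the real PLL's divider and VCO range limits; core_mhz is a hypothetical name.

```python
from fractions import Fraction

# Assumption: effective PLL reference of 6.144 MHz, inferred from the
# figures in this thread rather than stated in it.
REF_MHZ = Fraction(6144, 1000)


def core_mhz(mult: int, div: int, ref: Fraction = REF_MHZ) -> Fraction:
    """Output frequency for a simple ref * mult / div model of the PLL."""
    return ref * mult / div


for mult, div in [(4069, 50), (80, 1)]:
    f = core_mhz(mult, div)
    shortfall = (1 - f / 500) * 100
    print(f"{mult}/{div}: {float(f):.5f} MHz, {float(shortfall):.2f}% below 500 MHz")
```

The roughly 1.7% shortfall at 491.52 MHz is in line with the "about 2%" loss reported later in the thread for an even integer multiplier.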

Post by bearcat »

Segher, you never cease to amaze me. That was not what I got out of that data.

I changed the system frequencies to an even multiplier, and it now boots from flash in v12!!!!

I can handle the 2% loss in speed. I did not change the link frequencies, as I need at least that speed for the application.

I guess they reduced some timings, or sped up the code, somewhere after 11.2.2. Then again, a wee bit more time in the boot code might not be a bad thing.

This exercise did force me to move to the latest tools, though, and to learn how to make .S files :-)

I don't think I would have ever found that one. Thanks for everyone's help and quick responses!!!

Post by bearcat »

Well, after a little more testing... I need more testing.

May still have some issues.

After some more testing...

1 - The routing IDs were not needed.
2 - The PLL frequencies needed to be modified. So far I am using even integer multipliers, which work, at a 2% penalty. This also gets rid of those pesky XN1137 warnings. I guess 11.2.2 had some more delay built in.
3 - The XLINK delays needed to be increased a little as well. Delay 10 works; delay 8 or less does not. I probably need to slow the links down even further, then speed them up in the application. (I am unclear on the reasons for this. Maybe the initial frequencies are too slow to read reliably.)

I have tested loading a factory image together with an upgrade image, and both boot as expected with version 13beta1. Tested with 8 tiles so far; hopefully 10 won't change anything. I plan on continuing with version 13.

So... I think this problem is solved! Thanks all.

Post by segher »

bearcat wrote: 2 - The PLL frequencies needed to be modified. So far I am using even integer multipliers, which work, at a 2% penalty.
You can probably use some other ratio, e.g. 244/3 gives you
499.712MHz, and is still a reasonably low multiplier.
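A ratio like 244/3 can be found mechanically: for each small divider, take the largest multiplier that stays at or below 500 MHz and keep the best result. This sketch assumes a 6.144 MHz effective PLL reference (inferred from the 499.712 MHz figure, not stated in the thread), models the PLL as ref * mult / div, and ignores the real divider and VCO range limits; best_ratio is a hypothetical helper.

```python
from fractions import Fraction

REF_MHZ = Fraction(6144, 1000)  # assumed effective reference, inferred from this thread
TARGET_MHZ = 500


def best_ratio(max_div: int, ref: Fraction = REF_MHZ, target: int = TARGET_MHZ):
    """Return (mult, div, freq) with div <= max_div giving the highest freq <= target.

    On ties the smaller divider (and hence smaller multiplier) is kept.
    """
    best = None
    for div in range(1, max_div + 1):
        mult = int(target * div / ref)  # floor: largest mult with ref * mult / div <= target
        freq = ref * mult / div
        if best is None or freq > best[2]:
            best = (mult, div, freq)
    return best


mult, div, freq = best_ratio(4)
print(f"{mult}/{div} -> {float(freq):.3f} MHz")
```

Capping the divider is what keeps the multiplier small; allowing larger dividers gets closer to 500 MHz only through ratios like 4069/50.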
bearcat wrote: I guess 11.2.2 had some more delay built in.
Maybe by accident, though :-)
bearcat wrote: 3 - The XLINK delays needed to be increased a little as well. Delay 10 works; delay 8 or less does not. I probably need to slow the links down even further, then speed them up in the application. (I am unclear on the reasons for this. Maybe the initial frequencies are too slow to read reliably.)
The receiving side of the link does not use the delay values as
far as I know. Maybe you have signal integrity issues? You have
a connector in the link, which never helps; and do you have source
termination? Use an oscilloscope to check whether the signal looks good...
bearcat wrote: So... I think this problem is solved! Thanks all.
You mean you have an acceptable workaround ;-) But I bet
Richard will want to solve it for everyone once and for all :-)

Post by bearcat »

The links actually go through an SI galvanic isolator rated at 150 MHz. There is no connector, only about 1 inch of PCB between parts, but the isolator does limit the frequency. Signals look reasonable on a scope, but the eye is closing at a delay of 6 or 8. A while back I ran a test sending and receiving a known value wide open on the channel, and it went a day with no errors. The application also uses the channel at speed and has had lots of testing. That's why I wondered about link speed.

On the subject of termination: at 1 inch, I shouldn't need it. Looking at the part's specs, its outputs appear to be rated at 4 mA (or maybe 8 mA on some outputs); the documentation is not complete on this. What is the rating for the XLINK outputs? At 4 mA, it seems to me the output impedance is already close to what is desired, without needing termination resistors.

Do you have any routines to change the link speed from an application? I'm not sure whether you just go ahead and write the bits in registers 0x80-0x87 of the node configuration. I am also unclear as to which switch is which XLINK.
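As a starting point for the register question, here is a hypothetical helper that only does the bit packing for a link control word. The field layout (11-bit intra- and inter-token delays at bits 0 and 11, a 5-wire "wide" bit at 30, an enable bit at 31) is an assumption based on XMOS register definitions as best understood, not something confirmed in this thread, so check it against the datasheet for your part. It also does not establish which of the two numbers xlreg prints corresponds to which delay. Writing the value to the switch (for example with write_sswitch_reg from xs1.h, if your tools version provides it) and any link re-initialization are left out.

```python
# Hypothetical helper for composing an XLINK control word (node config
# registers 0x80-0x87, one per link). ALL field positions below are
# assumptions and must be checked against the device datasheet before use.
INTRA_SHIFT, INTER_SHIFT, DELAY_BITS = 0, 11, 11  # assumed token-delay fields
WIDE_BIT, ENABLE_BIT = 30, 31                     # assumed 5-wire and enable bits
DELAY_MASK = (1 << DELAY_BITS) - 1


def xlink_ctrl(intra: int, inter: int, wide: bool = False, enable: bool = True) -> int:
    """Pack delays and flags into a control word under the assumed layout."""
    if not (0 <= intra <= DELAY_MASK and 0 <= inter <= DELAY_MASK):
        raise ValueError("delay does not fit in the assumed field width")
    word = (intra << INTRA_SHIFT) | (inter << INTER_SHIFT)
    if wide:
        word |= 1 << WIDE_BIT
    if enable:
        word |= 1 << ENABLE_BIT
    return word


def delays(word: int):
    """Unpack (intra, inter) delays from a control word under the assumed layout."""
    return (word >> INTRA_SHIFT) & DELAY_MASK, (word >> INTER_SHIFT) & DELAY_MASK


w = xlink_ctrl(intra=10, inter=10)
print(hex(w), delays(w))
```

Keeping pack and unpack together makes it easy to read the current register value, change only the delay fields, and write it back, rather than overwriting flags you did not mean to touch.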