Strange debugger behavior anyone?

Technical questions regarding the XTC tools and programming with XMOS.
TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

I've definitely seen issues in the debugger when the target XCORE isn't happy. For example, when debugging late one night I saw a register set to 0x80200 being saved to the stack as 0x80100! That turned out to be a flaky clock oscillator.

So my recent woes seem fleeting in that I don't know if it's my hardware or the debugger!

It would be very nice if there were some kind of testing options in the tools to validate communication with the target. Maybe it could just read/write memory and verify all commands work. Maybe it could upload a test program that runs and spits out an "all ok".

In any case, the issue I had was random "illegal opcodes" appearing in my code. I thought I had it narrowed down but just as soon as I did, it went away. Everything started working. The issue would not happen in the simulator, however.

So imagine my delight in trying to track this down, and running into a host of other issues:

- Code seemingly out of sync even with a rebuild/clean, stepping on comments!
- Memory window not updating
- Assembly window putting the XC statements in the wrong place.

It's hard to know what is a flaky debugger, ME, or an unhappy target!

Anyone have experiences they care to share in this area?
JasonWhiteman
Active Member
Posts: 63
Joined: Mon Jul 15, 2013 11:39 pm

Post by JasonWhiteman »

I have started my first debug session - so unfortunately, I do not have expert advice on the debugger as a seasoned debug road warrior would have. However, sometimes it's good to have feedback from other privates in the trenches.

From my experience thus far, I would have to agree with your characterization of the environment as "flaky". For me, that has ranged from my first DOA sliceKIT, to disconnects in the documentation, to tutorial code that does not function as expected, to a crashing xgdb session.

Although as a novice, my instinct would be to blame the short between the keyboard and GND - the experience has not been one of confidence building as I try to climb through the ranks to hopefully an expert level. "Hope" is employed as business (i.e. management) may trump any aspirations to spend too much time trying to determine where to point fingers. I'm also hopeful that a vibrant/active support community will help in the navigation department.

Design philosophy aside - if you wish to see more details regarding a debug session that does not quite match what you've described, with the exception of the "flaky" characterization, then take a look at my thread covering the issue: http://www.xcore.com/forum/viewtopic.php?f=26&t=2274

Regards,
Jason Whiteman
XMatt
XCore Addict
Posts: 147
Joined: Tue Feb 23, 2010 6:55 pm

Post by XMatt »

TjBordelon wrote: It would be very nice if there were some kind of testing options in the tools to validate communication with the target. Maybe it could just read/write memory and verify all commands work. Maybe it could upload a test program that runs and spits out an "all ok".
By default, the tools verify that any application downloaded to the device via the XTAG appears in memory as expected, and they report an error during download if this is not the case. This check was put in to catch issues with JTAG connectivity and problems with the xCORE not being powered or clocked correctly. If the tools do not report an error on program download, then there should be no issues with programs being incorrect in memory.

If you want to test that memory can be accessed correctly, xgdb provides a fairly simple scripting language that would allow you to write a memory test if you feel that is required.
TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

XMatt wrote:The tools by default verify that any application downloaded to the device via the XTAG appears in memory as is expected. It reports an error during download if this is not the case.
This is good to know. I think I am closer to isolating my specific issue, which seems to be hard to pin down due to the fact that even after cleaning/rebuilding I am running stale code on occasion.

I have an open ticket where a very simple code block isn't hitting a breakpoint, nor is the core running the code. It fails to work in the simulator and the debugger so I finally caught a good test case:

Code: Select all

#include <platform.h>
#include <xs1.h>
#include <xclib.h>

on tile[2] : out port RADIO_DATA = XS1_PORT_8C;

int main()
{
	par {
		on tile[2]:
		{
//			for(int i=0; i<10;i++)
//				RADIO_DATA <: i;

			while(1)
			{
				RADIO_DATA <: 1;
				RADIO_DATA <: 0;
			}
		}
	}

	return 0;
}
I can literally sprinkle breakpoints everywhere and none are ever hit.

I can uncomment the for loop and sometimes see a BP hit there, sometimes not. On one occasion it stepped on the comments! Clean/rebuild all doesn't seem to fix it.

This is obviously a cut down example I had to isolate, but my original issue was that my project was acting very flaky with a host of issues that were quite confusing.

I had memory corruption, traps, invalid opcodes, and even after commenting out lots of stuff it seemed to continue to run into functions that were commented out. More frustrating was the fact that the memory window wasn't updating all the time and the assembly window had the source lines jumping around.

When this many issues happen, obviously it is of no help to start complaining about all of them so I'm starting to try to isolate them.
JasonWhiteman
Active Member
Posts: 63
Joined: Mon Jul 15, 2013 11:39 pm

Post by JasonWhiteman »

Your observation about the debugger pointing to the "wrong places" does resonate with me: when I used the debugger and asked it to "run to code", it ended up in a different spot than where I told it to run.

Therefore, my experience is not completely different.

Regards,
Jason Whiteman
TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

Jason-- saw your issue in that post and I think we do see similar issues.

XMOS support confirms an issue that is to be fixed in the next release where breakpoints don't always hit when dropped on code in a multicore main().

I have also isolated a test case and sent it to them: I built a project that called a few assembly instructions, then replaced those calls with a simple tight loop in main() and rebuilt with a CLEAN, and wouldn't you know... none of my breakpoints in main() get hit. I hit pause and OLD CODE IS RUNNING!

I thought I was losing my mind, so I'm very happy I was able to catch the tools red-handed. I am still willing to give the tools the benefit of the doubt. With so many PC configurations to support, maybe I am the oddball with the bogus version of java (another reason to hate java :)
TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

OK- The exact issue I'm hitting is a synchronization issue between what shows up in the disassembly window and what is actually on the chip.

Is anyone out there having this issue? The first time I hit this I had a bug in my code causing a trap. The disassembly was all over the place, showing old code that was commented out, illegal opcodes, and the wrong instructions. When I get in this state, I can't figure out how to get out of it other than thrashing around and it usually goes away in a few hours of simply giving up on using the debugger and going back to printfs.

The disassembly gets totally out of whack, breakpoints don't get hit, and debugging becomes impossible.

Do you guys simply not see this, or is there a workaround? I'm dead in the water with no way to dev. At version 12 or 13 I want to say this is a "me" problem since I surely can't be the first wave of folks to use these tools. Everyone would be complaining by now.
TSC
Experienced Member
Posts: 111
Joined: Sun Mar 06, 2011 11:39 pm

Post by TSC »

I've noticed some strange debugger behaviour too. Most recently it just points to the wrong statements when an exception has occurred. Version 13 beta of xTIMEcomposer by the way.

I'm not very experienced with using the debugger, so I've always just figured that it was my own fault. But maybe there's an "Emperor has no clothes" thing happening, and people just aren't talking about their dodgy experiences with the debugger.

Right now I'm using printf statements and disabling various bits of code with comments in order to debug. It's pretty slow and tedious, so I really wish I could trust the debugger.
TjBordelon
Active Member
Posts: 39
Joined: Mon Jul 29, 2013 4:41 pm

Post by TjBordelon »

TSC wrote:I really wish I could trust the debugger.
Yes, indeed it's hard when you can't trust your "test equipment". When you have a bug and you don't know if it's your tools or your code all bets are off.

Well, I finally think I may have found what is going on! In all fairness, XMOS clued me in by telling me in a trouble ticket that they know breakpoints don't work in a multicore main.

Imagine this frustrating scenario: You are calling an assembly function in main(). All is working. Then without warning you get a trap "Illegal Resource". The IP is on the wrong block of code, with illegal opcodes all around. Are you losing your mind? I sure thought I was.

Well, the fix I got from XMOS for my "breakpoints not hit in multicore main" bug was to simply call out from main() into a function in another file. Then BPs are hit.
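For anyone following along, here is a minimal sketch of that workaround applied to the test case posted earlier in this thread. The file names main.xc and radio.xc and the function name radio_toggle are made up for illustration, and this has not been checked against any particular tools release:

```xc
// ---- main.xc ----
#include <platform.h>
#include <xs1.h>

on tile[2] : out port RADIO_DATA = XS1_PORT_8C;

// Hypothetical helper, defined in a separate source file (radio.xc).
void radio_toggle(out port p);

int main()
{
	par {
		// main() only places the task; the real work lives elsewhere.
		on tile[2]: radio_toggle(RADIO_DATA);
	}

	return 0;
}

// ---- radio.xc (separate source file) ----
#include <xs1.h>

void radio_toggle(out port p)
{
	while(1)
	{
		p <: 1;	// set breakpoints here instead of inside main()
		p <: 0;
	}
}
```

The idea is that main() contains nothing but the par/on placement, and breakpoints go on lines inside radio_toggle() rather than on lines inside the multicore main().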

I think the problem is much more extensive than they've let on though. I think not only are breakpoints not being hit, but the tools show completely incorrect source and disassembly in the disassembly window!! I am totally blown away by this. Every other tool I've used will look at memory and disassemble. What about code that gets corrupted by your program? What about self modifying code? (The purists balk! But I want performance). Well, it appears the disassembly window doesn't show you what is really there but what it thinks is there, which is wrong quite frequently in my experience.

The problem even happens in v13 so it isn't fixed!

In my above example, the root cause was ME. I wasn't restoring a register from the stack. Upon returning, an invalid resource was accessed in main() which is the bug I think they know about. But I was unable to figure this out until suffering for 2 weeks because the tools don't show you what is really in program memory. I could have had this fixed in 5 minutes if the disassembly window was working.

But I trusted it. I assumed memory was getting corrupt. I paid some cash and had my BGA reworked. I spent a few days scoping my power and clock and powerup timings. I pulled my hair out because it MUST BE ME! Surely such basic functionality couldn't be broken in version 13 of anything. Or someone would be working a weekend and a patch would be out. I'd know about it.

I love XCORE. It's freaking fast. Amazing processor. Great idea. It just seems like they do things differently where they shouldn't. I think the disassembly window is a big one. Here is my wish list for the disassembly window:

- Show what's REALLY THERE in memory.
- Disassemble in real time
- Let me single step in that window!!!!


I don't understand why nobody else has really chimed in save you two guys. This seems like such a big deal that everyone would be hootin' and hollerin'. Are we the only users? :)
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

TjBordelon wrote:breakpoints don't work in a multicore main.
That actually makes sense, because a par{} main isn't like normal code at all:
it is more like a declaration than code really. The generated machine code
for it is nothing like machine code generated for a normal function; it is more
like the runtime startup code in your usual toolchain, but generated instead of
static.

Of course breakpoints on it should work, if you can place them there.
TjBordelon wrote: I think the problem is much more extensive than they've let on though. I think not only are breakpoints not being hit, but the tools show completely incorrect source and disassembly in the disassembly window!! I am totally blown away by this. Every other tool I've used will look at memory and disassemble. What about code that gets corrupted by your program? What about self modifying code? (The purists balk! But I want performance).
I have *never* seen a debugger that works "correctly" with self-modifying
code. FWIW, self-modifying code is almost never faster than well-written
non-self-modifying code, on XS1 (and on other RISC architectures).
TjBordelon wrote: In my above example, the root cause was ME. I wasn't restoring a register from the stack.
You're breaking the ABI, and expect the debugger to notice? Heh.
TjBordelon wrote: Surely such basic functionality couldn't be broken in version 13 of anything.
The version number stands for the year and month it was released.
TjBordelon wrote: I don't understand why nobody else has really chimed in save you two guys. This seems like such a big deal that everyone would be hootin' and hollerin'. Are we the only users? :)
Most people don't have these problems. Most people know how to work
around such problems just fine. Most people do not reply to rants.