ET_ILLEGAL_PC with USBTMC example on xCORE-200 USB slicekit

aneves
Experienced Member
Posts: 93
Joined: Wed Sep 16, 2015 2:38 pm

Post by aneves »

I have a newly acquired xCORE-200 USB sliceKIT that I am using to try out some of the examples included with xTIMEcomposer 14. The one of most importance to me is the "USB Test and Measurement Device [AN00135]" example. I can run the example as is with the LabVIEW example provided and with LabVIEW code of my own. I've also been able to customize small parts of the example to implement different commands and responses which is really cool.

One thing I noticed almost instantly is that if I modify line 32 in scpi_cmds.c from the following:

SCPI_ResultInt(context, DUMMY_MEAS_RESULT_VAL);
to this:

SCPI_ResultDouble(context, 33.3);
I run into some problems. This code change allows me to return a double (the constant 33.3) instead of an integer (DUMMY_MEAS_RESULT_VAL is a #define equal to 10). When I run the LabVIEW code to send *MEASure:VOLTage:DC? I get 33.3 as a result which is what I expect. However, if I run it again the debugger notifies me that the program crashed with:

tile[0] core[1]  (Suspended: Signal 'ET_ILLEGAL_PC' received. Description: Illegal program counter.)	
	2 <symbol is not available> 0x0000001c	
	1 _TrapHandler()  0x00040108	
I know that this exception means that the program counter register was written with a bogus, inaccessible memory address. I was able to narrow down the culprit to an eventual call to snprintf, which takes the double (33.3) parameter and converts it to a string for output. If I comment out snprintf or hard-code a string in its place, I no longer crash. Debugging this has proved difficult since I can't find which call led to this.

Can someone verify this problem with this example?

How can I go about debugging this further?

Thanks!!
infiniteimprobability
Verified
XCore Legend
Posts: 1164
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Hi, I can't say I have tried that example, but here's a tip on how to debug it.

You are on the right track with the ET_ILLEGAL_PC exception. It is most likely caused by memory corruption from something unsafe. C programs have no array or pointer bounds checking, so an out-of-bounds write is the likely cause of the corruption, and you already strongly suspect snprintf.

I would use xgdb to try to find out what has happened:

xgdb <my_file.xe>
conn
run
info registers

Look at the SPC - this should have had the old address before the exception. Use disassemble to find out what preceded it. My guess is that something went out of bounds and wrote to the code section. This will pretty quickly cause an exception so you shouldn't have to go back far.

When you find the offending instruction that has been corrupted, you can set a hardware watchpoint on it:

watch *<address>

Then run again and you should get a break when that address is written to, which will be the cause of the corruption.

One other idea is to rename the .c files to .xc, which turns bounds checking on. This may be quite a big exercise though, depending on what is getting referenced.
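If porting to .xc is too much work, a lighter-weight check in plain C is to surround the suspect output buffer with guard bytes and test them after the snprintf call. The following is only a minimal sketch: `guarded_buf_t`, `format_double_checked`, the 32-byte buffer size, and the `"%g"` format are all hypothetical and not taken from the SCPI library in AN00135. It only catches writes that land directly next to the buffer, so a clean result does not rule out corruption elsewhere (for example on the stack).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Hypothetical wrapper: an output buffer sandwiched between guard regions. */
typedef struct {
    unsigned char before[GUARD_LEN];
    char          text[32];          /* assumed size, not from the SCPI library */
    unsigned char after[GUARD_LEN];
} guarded_buf_t;

/* Fill both guard regions with a known pattern and clear the text area. */
void guarded_init(guarded_buf_t *g)
{
    assert(g != NULL);
    memset(g->before, GUARD_BYTE, sizeof g->before);
    memset(g->text, 0, sizeof g->text);
    memset(g->after, GUARD_BYTE, sizeof g->after);
}

/* Returns 1 if both guard regions are intact, 0 if any byte was overwritten. */
int guarded_ok(const guarded_buf_t *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (g->before[i] != GUARD_BYTE || g->after[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Format a double into the guarded buffer, then report whether the output
   fit and the guards survived. */
int format_double_checked(guarded_buf_t *g, double value)
{
    guarded_init(g);
    int n = snprintf(g->text, sizeof g->text, "%g", value);
    if (n < 0 || (size_t)n >= sizeof g->text)
        return 0;                    /* encoding error or truncated output */
    return guarded_ok(g);
}
```

Calling `format_double_checked(&buf, 33.3)` in place of the direct snprintf call and breaking in the debugger when it returns 0 would show whether the formatted text itself is overrunning its buffer.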
aneves
Experienced Member
Posts: 93
Joined: Wed Sep 16, 2015 2:38 pm

Post by aneves »

I tried your suggestion using xgdb in the command line. When I get the program to crash I get the following output:

Program received signal ET_ILLEGAL_PC, Illegal program counter.
[Switching to tile[0] core[1]]
0x0000001c in ?? ()
(gdb) info registers
r0             0x0      0
r1             0x7fa9c  522908
r2             0x0      0
r3             0x0      0
r4             0x48c68  298088
r5             0x7fa94  522900
r6             0x4b8f0  309488
r7             0xffffffff       -1
r8             0x2      2
r9             0x0      0
r10            0x7fa95  522901
r11            0x6      6
cp             0x4a3f8  304120
dp             0x4acc8  306376
sp             0x7fa58  522840
lr             0x40108  262408   _InitChildThread + 0
pc             0x1c     28
sr             0x10     16
spc            0x1c     28
ssr            0x0      0
et             0x2      2
ed             0x0      0
sed            0x0      0
kep            0x40100  262400
ksp            0x43f78  278392
As you can see, spc has the same bogus value of 0x1c so nothing I can do there. I noticed that the link register (lr) points to "_InitChildThread + 0" but when I do a backtrace I get:

(gdb) bt
#0  0x0000001c in ?? ()
#1  0x00040108 in _TrapHandler ()
which tells me the return function is "_TrapHandler" instead. I suspect that _TrapHandler was called in place of _InitChildThread once it was realized that the pc was garbage.

Reading through the XS1 Instruction Set PDF, I found that the saved program counter (spc) is used to save the program counter during an interrupt. The value of the status register (sr) is 0x10, but I don't know how to interpret that to help me here, or whether that is even the right path. If the spc is also wrong, and it is only set when handling an interrupt, then I suppose that means a bad value got written to spc somewhere while handling an interrupt. Then, when the interrupt returns control, spc is written to pc and that is where the exception is raised.

Does this sound like I'm on the right track? I'm not sure how to proceed with debugging here. How can I figure out if, and which, interrupt might be causing trouble?

Thanks!!
larry
Respected Member
Posts: 275
Joined: Fri Mar 12, 2010 6:03 pm

Post by larry »

Are you able to isolate the snprintf call in a small test program? Anything that can run in a simulation so we can see a complete instruction trace leading up to the trap.

If the instruction trace shows that the compiled code is clearly doing something wrong, it would be good to send a bug report via the XMOS website (Support / Help / Report a Bug).
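A minimal sketch of what such an isolated test might look like is below. It assumes the formatting step boils down to an snprintf of a double into a fixed buffer; `format_double`, `run_format_test`, the 32-byte buffer, and the `"%g"` format are hypothetical stand-ins, not the actual code inside SCPI_ResultDouble. The loop is there because the crash in this thread only showed up on the second query.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the formatting step behind SCPI_ResultDouble;
   the real library's buffer size and format string may differ. */
int format_double(char *out, size_t out_len, double value)
{
    assert(out != NULL);
    return snprintf(out, out_len, "%g", value);
}

/* Call the formatter repeatedly and count how many iterations produced
   the expected text without truncation. */
int run_format_test(int iterations)
{
    char buf[32];
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        memset(buf, 0, sizeof buf);
        int n = format_double(buf, sizeof buf, 33.3);
        if (n > 0 && (size_t)n < sizeof buf && strcmp(buf, "33.3") == 0)
            ok++;
    }
    return ok;
}
```

Calling `run_format_test(2)` from `main` on a single tile, first on its own and then from a task started the same way as the USBTMC endpoint handler, would help show whether the trap follows snprintf itself or the context it is called from.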
aneves
Experienced Member
Posts: 93
Joined: Wed Sep 16, 2015 2:38 pm

Post by aneves »

Hi Larry,

I will submit a bug with XMOS in this case. I'll update this thread with any info they pass along.