Hello everyone,
I'm currently stuck with a bug that occurs intermittently in our XMOS firmware. I'm using xgdb and encountering the following exception, but I'm having trouble identifying the specific task causing it.
```
Thread 1.1 hit Catchpoint -1 (XCore Exception ET_LOAD_STORE), 0x0008470a in prvSelectHighestPriorityTask (xCoreID=0)
at .../modules/rtos/modules/FreeRTOS/FreeRTOS-SMP-Kernel/tasks.c:894
894 if( pxTCB->xTaskRunState == taskTASK_NOT_RUNNING )
(gdb) bt
#0 0x000800c4 in _DoException ()
#1 0x0008470a in prvSelectHighestPriorityTask (xCoreID=0) at .../modules/rtos/modules/FreeRTOS/FreeRTOS-SMP-Kernel/tasks.c:894
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```
Does anyone have suggestions on how to retrieve more information about the cause of this exception, or how to identify which task is triggering it?
Unfortunately, since debugging is only possible via JTAG on this device, I assume rtos_printf won't be usable in this context, right?
Thanks a lot!
FreeRTOS task scheduling exception debugging
-
- Junior Member
- Posts: 5
- Joined: Sat Mar 01, 2025 5:49 pm
A quick update: it seems that the whole internal RAM got reset to zero. Can this be caused by the debugging itself, or is it a consequence of the exception? This makes debugging even harder...
Any suggestions on how to proceed?
Thanks!
-
Verified
- Member
- Posts: 9
- Joined: Tue Jun 28, 2022 10:58 am
Hi,
It seems that the exception is occurring while the RTOS is attempting to choose the next task to run. An 'info registers' command will give a bit more information about what exception occurred. Specifically the 'ed' register will show the address that was involved in the load_store exception.
Although, from what I can see in your logs above, it could be that pxTCB is NULL and that this is causing the exception (if 'ed' is zero, or a small offset from zero, this would confirm the NULL dereference). It could also be that part of the program is erroneously setting a range of memory to zero and causing this exception. Is the exception always hit in prvSelectHighestPriorityTask? Could it be that a task is being removed incorrectly, leaving a dangling reference to it within the scheduler?
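To interpret the 'ed' value from 'info registers', a small host-side helper can sort it into the usual suspects. This is a sketch under assumptions: the RAM window below (0x00080000 to 0x000FFFFF, 512 KiB) is what I believe xcore.ai uses, but check it against your datasheet or linker map, and all names (classify_ed, ed_kind_t, the 0x1000 near-NULL threshold) are hypothetical. A NULL struct pointer does not always fault at address 0: reading pxTCB->xTaskRunState through a NULL pxTCB faults at the offset of xTaskRunState inside the TCB, so a small non-zero 'ed' points the same way. As far as I know, misaligned accesses also raise ET_LOAD_STORE on xcore, hence the alignment check.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: on-tile SRAM spans 0x00080000-0x000FFFFF (512 KiB).
   Take the real bounds from your datasheet or linker map. */
#define SKETCH_RAM_BASE 0x00080000u
#define SKETCH_RAM_SIZE 0x00080000u
/* Heuristic: below this, treat the address as "NULL pointer + member offset". */
#define SKETCH_NEAR_NULL_LIMIT 0x1000u

typedef enum {
    ED_NULL,        /* exactly 0 */
    ED_NEAR_NULL,   /* small non-zero: NULL struct pointer plus a field offset */
    ED_OUTSIDE_RAM, /* wild pointer outside the assumed RAM window */
    ED_MISALIGNED,  /* inside RAM but not aligned for the access width */
    ED_PLAUSIBLE    /* in range and aligned: re-check the assumed bounds */
} ed_kind_t;

/* access_bytes is the width of the faulting load/store (1, 2 or 4). */
ed_kind_t classify_ed(uint32_t ed, uint32_t access_bytes) {
    assert(access_bytes == 1u || access_bytes == 2u || access_bytes == 4u);
    if (ed == 0u) return ED_NULL;
    if (ed < SKETCH_NEAR_NULL_LIMIT) return ED_NEAR_NULL;
    if (ed < SKETCH_RAM_BASE || ed - SKETCH_RAM_BASE >= SKETCH_RAM_SIZE) {
        return ED_OUTSIDE_RAM;
    }
    if ((ed & (access_bytes - 1u)) != 0u) return ED_MISALIGNED;
    return ED_PLAUSIBLE;
}
```

For a word load, call it as classify_ed(ed, 4); an ED_NULL or ED_NEAR_NULL result would fit the NULL pxTCB theory, while ED_OUTSIDE_RAM points more towards a corrupted (non-NULL) pointer.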
The debugger should not erase all internal RAM, nor should the default exception handler, so this suggests some other problem. It could be a faulty part of the software (e.g. a memset with a bad address), the watchdog resetting the system, an external reset, or a hardware issue, for example. If it's the software, then it should be possible to catch it using a watchpoint on some memory you don't expect to change.
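One way to act on the watchpoint suggestion is to place a known pattern right after memory you suspect is being overrun, and watch that pattern. A sketch, with hypothetical names (guarded_buf_t, g_suspect_buf, guard_init, guard_first_corrupt) and an arbitrary pattern value: struct members are laid out in declaration order, so the guard words sit directly after the 32-word buffer. It only catches writes that actually land on the guard, so it complements a watchpoint on the scheduler's own data rather than replacing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   8u
#define GUARD_PATTERN 0xA5A5A5A5u /* arbitrary value unlikely to be written by accident */

/* Hypothetical buffer suspected of overflowing, followed by guard words. */
typedef struct {
    uint32_t data[32];
    uint32_t guard[GUARD_WORDS];
} guarded_buf_t;

guarded_buf_t g_suspect_buf; /* global, so it is easy to reference from xgdb */

/* Fill the guard with the known pattern; call once at startup. */
void guard_init(guarded_buf_t *g) {
    assert(g != NULL);
    for (size_t i = 0; i < GUARD_WORDS; ++i) {
        g->guard[i] = GUARD_PATTERN;
    }
}

/* Returns the index of the first overwritten guard word, or -1 if all are intact. */
int guard_first_corrupt(const guarded_buf_t *g) {
    assert(g != NULL);
    for (size_t i = 0; i < GUARD_WORDS; ++i) {
        if (g->guard[i] != GUARD_PATTERN) {
            return (int)i;
        }
    }
    return -1;
}
```

After guard_init(), a watchpoint can be set with the standard GDB syntax, e.g. `watch g_suspect_buf.guard[0]`, so the target halts on the offending store. guard_first_corrupt() can also be called periodically (for example from an idle hook) as a purely software check. Hardware watchpoints are limited in number, so watching one or two words is more practical than watching the whole array.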
As for logging: if you don't have `xscope` available, you're right that `rtos_printf` (or standard `printf`) will halt the core while printing, which breaks real-time operation. One workaround is to log to an in-memory circular buffer. Then, when the program crashes or hits a breakpoint, you can inspect the buffer from the debugger (assuming memory hasn't been wiped at that point).
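A minimal sketch of the in-memory circular log, assuming only one writer at a time. All names (dbg_log_t, dbg_log_write, g_dbg_log) and sizes are hypothetical. On the SMP kernel several cores could log concurrently, so in firmware the body of dbg_log_write would need to run inside a critical section (e.g. taskENTER_CRITICAL()/taskEXIT_CRITICAL()); that is left out to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DBG_LOG_ENTRIES 64u /* must be a power of two so the index can be masked */
#define DBG_LOG_MSG_LEN 24u

_Static_assert((DBG_LOG_ENTRIES & (DBG_LOG_ENTRIES - 1u)) == 0u,
               "DBG_LOG_ENTRIES must be a power of two");

typedef struct {
    uint32_t timestamp;          /* caller-supplied, e.g. a timer tick */
    uint32_t value;              /* arbitrary payload, e.g. a TCB address */
    char msg[DBG_LOG_MSG_LEN];   /* short NUL-terminated tag */
} dbg_log_entry_t;

typedef struct {
    volatile uint32_t next;      /* total number of writes so far */
    dbg_log_entry_t entries[DBG_LOG_ENTRIES];
} dbg_log_t;

dbg_log_t g_dbg_log; /* global, so it can be printed by name from xgdb */

/* Overwrites the oldest entry once the buffer is full. Not thread-safe. */
void dbg_log_write(dbg_log_t *log, uint32_t timestamp, uint32_t value, const char *msg) {
    assert(log != NULL && msg != NULL);
    uint32_t slot = log->next & (DBG_LOG_ENTRIES - 1u);
    dbg_log_entry_t *e = &log->entries[slot];
    e->timestamp = timestamp;
    e->value = value;
    strncpy(e->msg, msg, DBG_LOG_MSG_LEN - 1u);
    e->msg[DBG_LOG_MSG_LEN - 1u] = '\0';
    log->next = log->next + 1u; /* publish the entry after it is fully written */
}
```

After a crash or breakpoint, `print g_dbg_log` in xgdb shows the whole buffer; the newest entry is at index (next - 1) modulo DBG_LOG_ENTRIES. Writing a few words per entry is far cheaper than formatting a string, which is what keeps this usable on a real-time path.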
Hope this helps,
Ciaran
-
- Junior Member
- Posts: 5
- Joined: Sat Mar 01, 2025 5:49 pm
Hey Ciaran,
thanks a lot for your reply and the helpful debugging tips!
It looks like I'm dealing with two separate issues:
1. The first is related to corruption in xSuspendedTaskList. For example, it contains an item whose pxContainer points to a pxReadyTasksList instead of xSuspendedTaskList. I assume something goes wrong while moving the item between lists, possibly during a task state transition.
2. The second issue is making debugging extremely difficult. After an unpredictable amount of time, the debugger starts returning the same fixed value for every memory address. Previously this value was 0, which led me to believe the memory was being wiped, but now it's a different constant. Interestingly, reading registers still works fine. Are the registers read once and cached by xgdb, or read via JTAG each time I run 'info registers'? I am wondering whether the JTAG connection could be the issue here.
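A sketch of a consistency check that would flag the symptom in item 1 (an item in xSuspendedTaskList whose pxContainer points to another list). The types are simplified stand-ins for FreeRTOS's ListItem_t and List_t (the real ones in list.h also carry xItemValue and use a MiniListItem_t as the end marker), and every sketch_* name is hypothetical. Since the kernel's task lists are file-static in tasks.c, a real version would have to live in tasks.c and run with the scheduler suspended or inside a critical section. Before going that far, it may be worth enabling configASSERT() and configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_list;

/* Simplified stand-in for FreeRTOS ListItem_t. */
typedef struct sketch_item {
    struct sketch_item *pxNext;
    struct sketch_item *pxPrevious;
    void *pvOwner;                   /* would be the TCB in the kernel */
    struct sketch_list *pxContainer; /* list this item claims to belong to */
} sketch_item_t;

/* Simplified stand-in for FreeRTOS List_t: circular, with a sentinel end item. */
typedef struct sketch_list {
    uint32_t uxNumberOfItems;
    sketch_item_t xListEnd;
} sketch_list_t;

void sketch_list_init(sketch_list_t *l) {
    l->uxNumberOfItems = 0u;
    l->xListEnd.pxNext = &l->xListEnd;
    l->xListEnd.pxPrevious = &l->xListEnd;
    l->xListEnd.pvOwner = NULL;
    l->xListEnd.pxContainer = NULL;
}

/* Append an item just before the sentinel (i.e. at the tail). */
void sketch_list_insert_end(sketch_list_t *l, sketch_item_t *item) {
    sketch_item_t *last = l->xListEnd.pxPrevious;
    item->pxNext = &l->xListEnd;
    item->pxPrevious = last;
    last->pxNext = item;
    l->xListEnd.pxPrevious = item;
    item->pxContainer = l;
    l->uxNumberOfItems++;
}

/* Walks the list once. Returns 0 if consistent, else the first broken invariant:
   1 = NULL link, 2 = back link mismatch, 3 = pxContainer names another list,
   4 = more items than uxNumberOfItems (or a cycle skipping the end marker),
   5 = fewer items than uxNumberOfItems. */
int sketch_list_check(const sketch_list_t *l) {
    assert(l != NULL);
    const sketch_item_t *end = &l->xListEnd;
    const sketch_item_t *cur = end;
    uint32_t count = 0u;
    for (;;) {
        const sketch_item_t *next = cur->pxNext;
        if (next == NULL) return 1;
        if (next->pxPrevious != cur) return 2;
        if (next == end) break;
        if (next->pxContainer != l) return 3;
        if (++count > l->uxNumberOfItems) return 4;
        cur = next;
    }
    return (count == l->uxNumberOfItems) ? 0 : 5;
}
```

Running such a check right after each suspend/resume would help narrow down which state transition leaves a stale pxContainer behind; the return code says which invariant broke first.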
Have you ever seen something like this?
Thanks again,
Mischa
-
Verified
- Member
- Posts: 9
- Joined: Tue Jun 28, 2022 10:58 am
Hi Mischa,
1. I'm not very familiar with FreeRTOS, but hopefully, once the debugger issue is resolved, this can be tracked down.
2. Regarding registers: yes, they get cached inside GDB. You can clear the cache by issuing the 'maintenance flush register-cache' command in gdb, which will force them to be read from the target again.
Regarding the memory issue: this is symptomatic of the system getting reset behind the debugger's back, so xgdb is trying to communicate with a target that is in the wrong state. The usual cause is the watchdog timer resetting the system, because the watchdog isn't stopped when the debugger halts the program. This results in the same data being returned over and over. So that would be my expectation; a good step would be to disable the watchdog and make sure the system is stable.
It's possible that the JTAG connection is defective, but it is generally quite stable: either it works or it doesn't. So it wouldn't be my first guess.
If this doesn't lead anywhere, you can send me the gdb log and I can take a look. The log can contain detailed information about your program, so you may want to send it to me by private message (uploading it to pastebin or some other external service). To create the log, modify your connect/attach command to add the log-level and log-file options:
```
connect --log-level=trace,xdbg::usb=warn --log-file=my_log.txt
```
And the log will be created in 'my_log.txt'.
Thanks,
Ciaran