Logical cores as IEC61508 "differing systematic capability"?

aclassifier · Post by **aclassifier** » Fri Dec 23, 2016 8:36 am

With reference to a previous post [1], I learned that I can stop one logical core while the others keep running, but that they all froze after some time because they depended on communicating with each other. IEC 61508 for safety-critical systems will in some cases care about parts of a system that have the same "systematic capability" and other parts that do not. I have discussed this in some blog notes, like [2].

I imagine this must have been discussed here before, but I wonder if it is possible to build SW for one logical core that monitors the others and does not hang (through some exception handling) when the others have stopped. This monitor could time out when it doesn't see heartbeats from the others, but would that loss of communication be "safe" (meaning that a positive confirmation of "connection down" can be trusted)? Or, even better, if possible:

I would prefer that a watchdog logical core could participate more actively with the other, non-safety logical cores. If a watchdog core attempts an interface call or a channel communication with another logical core that has stopped, will the watchdog detect that the attempt failed? How tightly coupled is the communication?

Both starting an interface call and sending on a channel are basically blocking(?). I haven't tried doing this in a select with a timeout, so I don't know if it's possible.

If it's not possible to couple logical cores this loosely, would it make sense to connect some pins and communicate that way, or is there some scheduling fine print that would make such a scheme unviable?

A follow-up question is whether one logical core is allowed to stop another logical core. This would be great as seen from the monitor, but more problematic if an application core could stop the monitor core.

[1] https://www.xcore.com/forum/viewtopic.p ... it=#p26034

[2] http://www.teigfam.net/oyvind/home/tech ... ements8221
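The heartbeat time-out described above can be sketched as follows. This is plain C rather than XC so the logic can be exercised off-target; the names, the two watched cores and the 100 ms limit are hypothetical, and on a device the time values would come from a hardware timer. It only detects silence: it does not answer whether that silence is a trustworthy "connection down".

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WATCHED 2      /* hypothetical: number of monitored cores    */
#define TIMEOUT_MS  100u   /* hypothetical: allowed silence per core, ms */

/* Time (ms) at which the monitor last saw a heartbeat from each core. */
typedef struct {
    uint32_t last_seen_ms[NUM_WATCHED];
} heartbeat_monitor;

/* Start every core's deadline from "now". */
static void monitor_init(heartbeat_monitor *m, uint32_t now_ms)
{
    for (int i = 0; i < NUM_WATCHED; i++)
        m->last_seen_ms[i] = now_ms;
}

/* Record a heartbeat from `core`. */
static void monitor_beat(heartbeat_monitor *m, int core, uint32_t now_ms)
{
    assert(core >= 0 && core < NUM_WATCHED);
    m->last_seen_ms[core] = now_ms;
}

/* Return the index of the first core whose heartbeat is overdue, or -1
   if all are alive. The unsigned subtraction gives the correct age even
   when the 32-bit time value wraps around. */
static int monitor_overdue(const heartbeat_monitor *m, uint32_t now_ms)
{
    for (int i = 0; i < NUM_WATCHED; i++)
        if ((uint32_t)(now_ms - m->last_seen_ms[i]) > TIMEOUT_MS)
            return i;
    return -1;
}
```

Comparing ages rather than absolute times keeps the check correct when the timer wraps; the monitor would call monitor_overdue() on every tick and force a reset when it returns a core index.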

plex · Post by **plex** » Fri Dec 23, 2016 9:13 am

Just an idea for a possible implementation of a watchdog core, assuming that checking an interface in a select will not cause this core to hang if the transmitting core has stopped. The default case makes the select non-blocking when no data has been received over the interface.
All watched cores periodically send a "heartbeat" over the interface to the "watchdog" core.
I am sure there is a more elegant way to do it, but if the assumption holds, it should hopefully work.

#include <timer.h>   // delay_milliseconds()

#define NUM_WATCHED 2
#define MAX_MISSED  100   // 100 polls x 10 ms: about 1 s without a heartbeat

// Each watched core calls beat() periodically.
interface heartbeat_if {
	void beat(unsigned status);
};

void watch_dog(server interface heartbeat_if beats[NUM_WATCHED])
	{
	unsigned missed[NUM_WATCHED] = {0, 0};

	while(1)
		{
		int got[NUM_WATCHED] = {0, 0};
		int pending = 1;

		// Take every heartbeat that is waiting right now. The default
		// case makes the select non-blocking when no call is pending.
		while(pending)
			{
			select
				{
				case beats[int i].beat(unsigned status):
					// status could also flag other conditions the watchdog must act on
					got[i] = 1;
					break;
				default:
					pending = 0;
					break;
				}
			}

		for(int c = 0; c < NUM_WATCHED; c++)
			{
			if(got[c])
				missed[c] = 0;
			else if(missed[c] < MAX_MISSED)
				missed[c]++;
			else
				{
				// reset device or take another safe action
				}
			}

		delay_milliseconds(10);
		}
	}

aclassifier · Post by **aclassifier** » Fri Dec 23, 2016 9:13 am

plex wrote:Just an idea of a possible implementation of a watchdog core assuming that checking an interface in a select will not cause this core to hang if the transmitting core has stopped.

Exactly: will it or will it not? And what about a timeout on a channel select? Etc.
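The pattern being asked about, a receive that gives up after a deadline instead of blocking forever, can be sketched with POSIX poll(), which returns 0 if nothing arrives before the time-out. This is a plain-C analog on a POSIX host, not xCORE code: the pipe stands in for the channel, and wait_for_beat() and the one-byte heartbeat are hypothetical. In XC the counterpart would presumably be a select with a timer case (case tmr when timerafter(deadline) :> void:) next to the channel or interface case. Either way, this only bounds the receiver's wait; it says nothing about a sender blocked on a stopped receiver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Outcome of waiting for one heartbeat message. */
enum wait_result { GOT_BEAT, TIMED_OUT, WAIT_ERROR };

/* Wait up to timeout_ms for a one-byte heartbeat on fd. poll() returns
   0 when the time-out expires with nothing to read, so a silent (or
   stopped) sender cannot block the caller indefinitely. */
static enum wait_result wait_for_beat(int fd, int timeout_ms, char *beat)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);
    if (n == 0)
        return TIMED_OUT;
    if (n < 0 || !(p.revents & POLLIN))
        return WAIT_ERROR;   /* includes a hung-up sender with no data */
    return read(fd, beat, 1) == 1 ? GOT_BEAT : WAIT_ERROR;
}
```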

henk · Post by **henk** » Fri Dec 23, 2016 11:35 am

Hi,

You can do that under a few provisos.

The way to implement it would be to set up a timer with an interrupt in that 'monitor thread' that takes care of the monitoring. This interrupt would get through to the thread even if it was hanging in some communication, and in the interrupt you can cause a reset. A reset is the only thing you can reasonably do here - recovering the state with all resources in half completed states will be difficult.

The provisos: these threads share memory and resources. So - this will not guard you against a rogue thread walking over your memory, or a rogue thread overruling your timer (I am using the term thread here rather than logical core to make it obvious that these are not physically separated cores, and can hence be affected by other threads in the same system).

To make it foolproof you could dedicate a physical core in the system, i.e. a tile, to keep an eye on things. It could have a GPIO connected to RST_N, and pull that when things go bad. That tile cannot be affected by other tiles, as it has a physically separate memory.

Cheers,
Henk
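Henk's timer-interrupt idea has a rough POSIX analog that shows the key property: a timer event still reaches a thread that is blocked in a communication. In this sketch (not xCORE code) SIGALRM plays the role of the monitor's timer interrupt and a pipe with no data plays the role of the hung channel; guarded_read() and watchdog_fired are hypothetical names. Where the handler here only sets a flag, the device would force a reset, since recovering half-completed state is not realistic. The analog shares Henk's proviso too: the handler runs in the same address space as the code it guards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the "watchdog interrupt". On a device this is where a reset
   would be forced; here we only record that the timer fired. */
static volatile sig_atomic_t watchdog_fired = 0;

static void on_watchdog_timer(int sig)
{
    (void)sig;
    watchdog_fired = 1;
}

/* Block in read() on fd, the way a thread can hang in a communication,
   with a timer armed to fire after `seconds`. Returns read()'s result.
   SA_RESTART is left out so the interrupted read() returns -1 with
   errno == EINTR instead of being silently resumed. */
static ssize_t guarded_read(int fd, char *buf, size_t len, unsigned seconds)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_watchdog_timer;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    sigaction(SIGALRM, &sa, NULL);

    watchdog_fired = 0;
    alarm(seconds);                  /* arm the watchdog timer */
    ssize_t n = read(fd, buf, len);  /* may block forever without the timer */
    alarm(0);                        /* disarm if data arrived first */
    return n;
}
```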

aclassifier · Post by **aclassifier** » Fri Dec 23, 2016 11:35 am

henk wrote:Hi,

You can do that under a few provisos.

The way to implement it would be to set up a timer with an interrupt in that 'monitor thread' that takes care of the monitoring. This interrupt would get through to the thread even if it was hanging in some communication, and in the interrupt you can cause a reset. A reset is the only thing you can reasonably do here - recovering the state with all resources in half completed states will be difficult.

OK

The provisos: these threads share memory and resources. So - this will not guard you against a rogue thread walking over your memory, or a rogue thread overruling your timer (I am using the term thread here rather than logical core to make it obvious that these are not physically separated cores, and can hence be affected by other threads in the same system).

What's a 'rogue thread'? Something like this: https://en.wikipedia.org/wiki/Rogue_security_software? Are you implying that well-compiled XC cannot be 'evil'? Are the usage checks that comprehensive? I believe that, without a pragma, occam's parallel usage checks were about that good(?). I am about to study XC port usage checking (for ports like p32); the examples I see set or clear individual bits with only a weak "keep off" enforced by how the other bits are programmed. (Update: I see that xmap is rather strict about port usage: it does not allow duplicate use of, e.g., XS1_PORT_32A, which has 32 bits. So I am forced to handle all the bits rather locally. Nice!)

To make it foolproof you could dedicate a physical core in the system, i.e. a tile, to keep an eye on things. It could have a GPIO connected to RST_N, and pull that when things go bad. That tile cannot be affected by other tiles, as it has a physically separate memory.

I believe that IEC 61508, depending on the SIL level and the assessor, would not require a separate tile. Which of these processors (or even boards) would give me an extra tile to work with: http://www.teigfam.net/oyvind/home/tech ... processors? I have two xCORE-XA boards, but the ARM is called a "core"(?) I will update this post with the info should I get any replies.

larry · Post by **larry** » Fri Dec 23, 2016 5:49 pm

A select on a channel or interface would work as a watchdog, with a default case to time out on.

Alternatively, if you are short of channel ends, just poll a shared memory area using an unsafe pointer.

aclassifier · Post by **aclassifier** » Fri Dec 23, 2016 5:49 pm

larry wrote:A select on a channel or interface would work as a watchdog, with a default case to time out on

Alternatively, if you are short of channel ends, just poll a shared memory area using an unsafe pointer

But then my application would be cluttered with unsafe pointers (every user would need to treat them as unsafe, right?)
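A plain-C sketch of Larry's shared-memory alternative, with C11 atomics standing in for XC's unsafe pointer. It also suggests one answer to the clutter concern: keep the shared counters private to one small module, so application code only calls heartbeat_beat() and never sees the pointer. Whether XC lets the unsafe access be confined this tightly is an assumption I have not checked; all names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_WATCHED 2   /* hypothetical number of monitored cores */

/* The only shared state: one counter per watched core. Each worker
   writes only its own counter; the monitor only reads. Keeping it
   file-static confines the shared access to this module. */
static _Atomic unsigned beat_count[NUM_WATCHED];

/* Called by worker `core` in its main loop. */
void heartbeat_beat(int core)
{
    assert(core >= 0 && core < NUM_WATCHED);
    atomic_fetch_add_explicit(&beat_count[core], 1u, memory_order_relaxed);
}

/* Called periodically by the monitor. Returns true if `core` has beaten
   since the previous call; `last` holds the monitor's private snapshot. */
bool heartbeat_progressed(int core, unsigned last[NUM_WATCHED])
{
    assert(core >= 0 && core < NUM_WATCHED);
    unsigned now = atomic_load_explicit(&beat_count[core], memory_order_relaxed);
    bool moved = (now != last[core]);
    last[core] = now;
    return moved;
}
```

Because each worker writes only its own counter and the monitor only reads, no locking is needed; the monitor simply flags a core whose counter has not moved between two polls.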
