Can a program fix itself (Variables)? (safety critical environment)

I just got started writing fail-safe, high-integrity C code, and I'd like to know whether programs can "fix themselves" if a variable gets corrupted for whatever reason (for example, cosmic rays). I know there is specific hardware, such as ECC RAM, that can counter this, but assuming the hardware I will be using has no error correction, are there ways a program can check itself for errors and fix itself? I know I could log every variable change somewhere and check every variable before use to see whether it has changed, but that would slow the program down considerably because of I/O speeds. Are there any other ways for a program to check and possibly fix itself?

CodePudding user response：

If you're running on an Arduino or something similar, I'd suggest having two identical pieces of hardware running the same program. Then you can check that they produce the same result, and maybe even periodically compare the whole memory to see that it is identical.

Of course, this could also be done with virtual machines if your hardware is powerful enough for that.

If it's critical that the program keeps running, use three machines and take the result that at least two of them agree on. That's what they did on the Saturn V.
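A minimal sketch of the two-out-of-three idea in C, assuming each machine (or each redundant copy) delivers its result as a 32-bit word. The function name vote_2oo3 is made up for illustration; how the three results actually reach the voter depends entirely on your hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 2-out-of-3 voter.
 * Returns true and writes the agreed value to *out when at least two of
 * the three inputs are equal. Returns false (leaving *out untouched) when
 * all three disagree, in which case no result can be trusted. */
bool vote_2oo3(uint32_t a, uint32_t b, uint32_t c, uint32_t *out)
{
    if (a == b || a == c) {
        *out = a;          /* a is backed by at least one other channel */
        return true;
    }
    if (b == c) {
        *out = b;          /* a is the odd one out */
        return true;
    }
    return false;          /* no majority */
}
```

A caller would typically treat a false return as a fatal fault and go to a safe state, since the voter itself cannot tell which channel is right.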

CodePudding user response：

You will need to perform some mathematical operation, e.g. a CRC or a hash, on the specific parts of memory where your critical variables live (@klutt already mentioned that). You could also create a wrapper around your variables, store them redundantly (twice or more), and check for changes when reading them. These techniques will not protect against systematic errors (for example, bit position 7 being defective on the bus), but they are probably very easy to implement. Communication protocols use many different approaches for detecting errors or changes, such as checksums, which can be implemented fairly easily.
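Here is a rough sketch of both ideas in C. The names guarded_u32, guarded_write, guarded_read and crc8 are invented for this example, and the choice of a bitwise-inverted second copy and of the CRC-8 polynomial 0x07 are assumptions on my part, not the only options.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: the value is stored together with its bitwise
 * inverse. volatile forces the compiler to really perform both accesses
 * instead of optimizing the second copy or the comparison away. */
typedef struct {
    volatile uint32_t value;
    volatile uint32_t inverse;   /* equals ~value while the pair is intact */
} guarded_u32;

void guarded_write(guarded_u32 *g, uint32_t v)
{
    g->value = v;
    g->inverse = ~v;
}

/* Returns true and stores the value in *out if both copies still agree.
 * Returns false (leaving *out untouched) if a mismatch is detected. */
bool guarded_read(const guarded_u32 *g, uint32_t *out)
{
    uint32_t v = g->value;
    uint32_t inv = g->inverse;
    if (v != (uint32_t)~inv) {
        return false;
    }
    *out = v;
    return true;
}

/* Bitwise CRC-8: polynomial 0x07, initial value 0x00, MSB first.
 * Intended for a block of critical data: compute it after every
 * legitimate update and compare it again before the data is used. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00u;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80u) {
                crc = (uint8_t)((crc << 1) ^ 0x07u);
            } else {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

Note that two copies (or a CRC) only detect corruption. To repair a value you need at least three copies so that a majority can outvote the bad one.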

A different possibility is redundant hardware (as @klutt also mentioned). This is standard in today's safety applications, depending on the required SIL (safety integrity level; roughly, whether people could die or not).

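Software checking for its own errors is not intuitive, and every C compiler will work against you: the optimizer is free to remove redundant copies and comparisons that it considers unnecessary. Making all redundant variables volatile prevents that, but it also has the unwanted effect of making your program much slower.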

There are lots of hard-to-implement possibilities in this answer; maybe someone knows a go-to software solution? I don't think there is one, though...

CodePudding user response：

"I know I could log every variable change somewhere and check every variable before usage..."

You "know"? How would that work? How would you know that the software doing the logging and checking has not itself been affected? It is not at all practical.

Critical data (especially persistent/non-volatile data) might employ redundancy or error detection/error correction, but for "whole-system" integrity you would do better to monitor for correct operation. Often spontaneous data corruption, whether through external interference or software error, will result in incorrect operation. By using software and hardware watchdogs, you can detect many of these faults and take corrective action - often you will be able to do little more than issue a reset.

Software watchdogs are applicable only in multi-threaded/multi-tasking systems. In some cases you might be able to restart a thread, but pulling off that trick while still trusting the integrity of the system requires a rather sophisticated software architecture.
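As a rough illustration of a software watchdog, here is a sketch in which every task periodically "checks in" and a supervisor verifies that all of them did so. Everything here is hypothetical: the names task_checkin and supervisor_poll, the fixed count of three tasks, and the use of C11 atomics for the shared flag word are assumptions for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TASK_COUNT     3u
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

/* One bit per task; a set bit means "this task has run since the last poll". */
static atomic_uint alive_mask;

/* Called by each task (IDs 0 .. TASK_COUNT-1) once per loop iteration. */
void task_checkin(unsigned task_id)
{
    atomic_fetch_or(&alive_mask, 1u << task_id);
}

/* Called periodically by the supervisor. Atomically fetches and clears
 * the flags, then reports whether every task checked in since the
 * previous call. */
bool supervisor_poll(void)
{
    unsigned seen = atomic_exchange(&alive_mask, 0u);
    return (seen & ALL_TASKS_MASK) == ALL_TASKS_MASK;
}
```

The usual pattern is for the supervisor to service the hardware watchdog only while supervisor_poll() returns true. The call that services the hardware watchdog is device-specific and is not shown here. If any task hangs, the supervisor stops servicing it and the hardware watchdog resets the system.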