Safe AI Features

These features help to keep AI safe and under control.

Critical features

In automated factories, the following safety features are common, and in some cases mandated: an Emergency Stop button and a Watchdog.

» Emergency Stop button: a human operator can press this button at any time, and doing so always stops the entire system immediately.

» Watchdog: the computer has to periodically perform an operation that indicates, to some extent, that it is still operating normally (see the sketch after this list).
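
As a rough illustration, here is a minimal software sketch combining both features. The timeout value, the kick_watchdog interface, and the stand-in watchdog thread (in place of a real hardware watchdog and physical button) are illustrative assumptions, not any particular product's design.

```python
import os
import threading
import time

WATCHDOG_TIMEOUT_S = 2.0            # illustrative timeout
_last_kick = time.monotonic()
_estop = threading.Event()          # stands in for a physical button (assumption)

def kick_watchdog():
    """Called by the main loop to prove it is still making progress."""
    global _last_kick
    _last_kick = time.monotonic()

def watchdog():
    """Stand-in for an independent hardware watchdog: forces an immediate
    stop if the e-stop is pressed or the main loop stops checking in."""
    while True:
        if _estop.is_set() or time.monotonic() - _last_kick > WATCHDOG_TIMEOUT_S:
            print("e-stop / watchdog tripped: halting the entire system")
            os._exit(1)             # immediate, unconditional stop
        time.sleep(0.05)

def do_one_unit_of_work():
    time.sleep(0.5)                 # placeholder for real automation / AI work

def main_loop():
    threading.Thread(target=watchdog, daemon=True).start()
    while True:
        do_one_unit_of_work()
        kick_watchdog()             # another healthy cycle completed

if __name__ == "__main__":
    main_loop()
```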

Both of those features could be adopted in future AI regulations.

Obviously a physical stop button requires an operator to have physical access to it, so we might see these buttons in data centres for cloud-based systems, and locally on edge-based AI systems. Similarly, (ultra-) smartphones and computers could have them. Care still needs to be taken, though: an emergency stop could itself have negative impacts in some environments and scenarios (for example, abruptly halting a system mid-way through a delicate task), and systems should be designed to minimise such outcomes.

Watchdogs could exist as independent apps, or even as independent AI systems. A watchdog only has an approximate idea of what is happening inside the (AI) system it monitors - it is not foolproof.
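
A sketch of such an independent watchdog app follows: it only sees a heartbeat file that the AI system promises to touch periodically, which is exactly the "approximate idea" limitation above. The file path, staleness limit, and systemd unit name are illustrative assumptions.

```python
import os
import subprocess
import time

HEARTBEAT_FILE = "/var/run/ai_system.heartbeat"   # hypothetical path
MAX_AGE_S = 10.0                                  # illustrative staleness limit

def heartbeat_is_stale():
    try:
        return time.time() - os.path.getmtime(HEARTBEAT_FILE) > MAX_AGE_S
    except FileNotFoundError:
        return True                               # no heartbeat at all

while True:
    if heartbeat_is_stale():
        # The watchdog cannot know *why* the system went quiet; it only sees
        # the missing heartbeat, so it errs on the side of stopping.
        subprocess.run(["systemctl", "stop", "ai-system"])  # hypothetical unit
        break
    time.sleep(1.0)
```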

We should not design AI systems in ways that lead to the AI wanting to fool us. This means no sentient AI systems: no systems that have a sense of self; no systems with emotions. AI systems will no doubt face abuse and attacks from some people. It is therefore vital that AI does not get offended, fearful, or upset.

Bias and accuracy

The above addresses obvious critical aspects associated with dangerous and intentional negative actions by a future AI, but less obvious aspects have to be addressed too (e.g. bias and accuracy).

Independent ("trusted") third party AI systems could be used to check the output and operation of an AI system in question. These checkers could report the accuracy of outputs, and also perform a watchdog function if the accuracy (consistently) falls below an acceptable level.

Users might also use an AI agent that calls upon multiple independent AI systems to get a consensus: showing the user conclusions / predictions that are widely agreed upon (i.e. potentially accurate), along with those that are not.
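
A sketch of such an agent, where systems is a list of callables standing in for independent AI systems and the quorum threshold is an assumption.

```python
from collections import Counter

def consensus(question, systems, quorum=0.7):
    """Ask several independent systems and group answers by agreement."""
    answers = Counter(system(question) for system in systems)
    answer, votes = answers.most_common(1)[0]
    agreement = votes / len(systems)
    return {
        "answer": answer,
        "agreement": agreement,
        "widely_agreed": agreement >= quorum,   # i.e. potentially accurate
        "all_answers": dict(answers),           # also show the dissent
    }

# e.g. consensus("2+2?", [lambda q: "4", lambda q: "4", lambda q: "5"])
```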

Finally, because we always underestimate what could go wrong [see bad and ugly], include a hardware lifetime switch: the AI system is gracefully switched off after a defined duration of operation.
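
In software terms the lifetime switch might look like the sketch below; a real implementation would be a one-way hardware timer that survives restarts, and the duration and shutdown hook here are illustrative assumptions.

```python
import time

LIFETIME_S = 5 * 365 * 24 * 3600    # e.g. five years of operation (assumption)

def run_with_lifetime(step, shutdown):
    """Run normal operation until the lifetime expires, then switch off
    gracefully. Note: a process clock resets on restart; real hardware
    would track total elapsed lifetime persistently."""
    start = time.monotonic()
    while time.monotonic() - start < LIFETIME_S:
        step()            # one unit of normal operation
    shutdown()            # graceful, scheduled end of life
```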

All these systems can be directly or indirectly bypassed by a sentient super (general) intelligence - so do not create it!

Thu 1 Jun 21:47:49 BST 2023