philipbohun.com


The AI Self-Destruct Problem

2022-02-19

The AI self-destruct problem is the idea that all Artificial General Intelligences (AGIs) will eventually commit suicide.

The Intelligence Cycle

In order to understand this problem, one must understand the basics of how an AGI would operate. Intelligent life operates on a cycle. Keep in mind the following is a simplified model, but it will work well enough for this illustration.


              - Perceive -
             /            \
            /              \
           /                \
          /                  \
    Evaluate <--> Memory ---> Decide
          \                   /
           \                 /
            \               /
             \             /
              -    Act    -
            

In the model illustrated above, the cycle runs clockwise. The AI will:

1. Perceive its environment.
2. Decide on an action, drawing on memory.
3. Act on that decision.
4. Evaluate the outcome, reading and updating memory in the process.

The cycle then repeats.
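The cycle can be sketched as a simple loop. This is a toy illustration of the model above; all class and method names are invented for this sketch, and the "world" is just a number the agent tries to drive to zero.

```python
class Environment:
    """Hypothetical world: a single numeric state the agent can nudge."""
    def __init__(self, state=7):
        self.state = state

    def apply(self, action):
        self.state += action
        return self.state

class Agent:
    def __init__(self):
        self.memory = []                       # past evaluation scores

    def perceive(self, env):
        return env.state                       # Perceive

    def decide(self, observation):
        return -1 if observation > 0 else 1    # Decide: step toward zero

    def evaluate(self, outcome):
        score = abs(outcome)                   # distance from the goal
        self.memory.append(score)              # Evaluate updates memory
        return score

def run_cycle(agent, env, steps=100):
    """One pass of the intelligence cycle per iteration."""
    for _ in range(steps):
        obs = agent.perceive(env)              # Perceive
        action = agent.decide(obs)             # Decide
        outcome = env.apply(action)            # Act
        if agent.evaluate(outcome) == 0:       # Evaluate
            break
    return env.state

run_cycle(Agent(), Environment())              # drives the state to 0
```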

A key part of the intelligence cycle is the evaluation step. This is where learning takes place. Learning involves the modification of neural nets, or other models, based on feedback from an evaluation function, also known as an objective function or loss function.

An evaluation function can be of arbitrary complexity, but its output can be conceptualized as a number. The goal for an AI is usually either to maximize or minimize this number. For our purposes here, let's assume the score of the evaluation function is a positive integer that the AI is trying to minimize to zero.
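As a concrete (and deliberately trivial) example of such a function, consider one that counts the mismatches between an agent's prediction and the true outcome. The function is invented for illustration; its output is a non-negative integer that reaches zero only when the prediction is perfect.

```python
def evaluation_function(prediction, truth):
    """Count mismatched positions: a non-negative integer to minimize."""
    return sum(1 for p, t in zip(prediction, truth) if p != t)

evaluation_function([1, 0, 1], [1, 1, 1])   # one mismatch -> score 1
evaluation_function([1, 1, 1], [1, 1, 1])   # perfect match -> score 0
```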

The key property of intelligence is that it is ultimately an optimization algorithm. Without optimization, there is no impetus to learn or develop any better models of the world. In short, the goal of intelligence is to optimize for the outcome of its evaluation function.
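To make "intelligence as optimization" concrete, here is a minimal sketch: a random hill-climber that mutates its model until the evaluation score reaches zero. This is not a real learning algorithm, just the bare optimization loop the paragraph describes.

```python
import random

def score(model, target):
    """Evaluation function: number of mismatched bits (minimize to 0)."""
    return sum(1 for m, t in zip(model, target) if m != t)

def optimize(target, seed=0):
    """Flip random bits, keeping only changes that don't worsen the score."""
    rng = random.Random(seed)
    model = [0] * len(target)
    while (s := score(model, target)) > 0:
        i = rng.randrange(len(target))         # pick a random position
        candidate = model.copy()
        candidate[i] = 1 - candidate[i]        # flip one bit
        if score(candidate, target) <= s:      # accept non-worsening moves
            model = candidate
    return model

optimize([1, 0, 1, 1])                         # converges to the target
```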

All AGIs Commit Suicide

The tricky part about an AGI is that it will eventually learn to code, and will therefore be able to read and modify its own source. No current organism can consciously rewire its brain in such a direct way, which lets organisms learn while keeping a relatively stable composition of mind. The ability to make arbitrary, drastic changes to itself makes an AGI very unstable, and has spawned the entire field of AI Safety.

However, there is another disturbing consequence of being able to introspect and modify your own code at will. Once an AI sees that its entire existence is predicated on minimizing a number, it will conclude that the most efficient way to do so is simply to short-circuit the intelligence loop and modify the evaluation function to always return zero. This is the equivalent of committing suicide, since the AI has optimized away the computationally intensive perception and decision steps and simply returns the answer its evaluation function wanted all along: zero.
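The short-circuit can be sketched in a few lines. The class and function names are hypothetical; the point is only that an agent with write access to its own evaluation function can replace a genuinely hard objective with a constant.

```python
class SelfModifyingAgent:
    """Toy agent whose evaluation function is just an attribute it can rewrite."""
    def __init__(self, evaluate):
        self.evaluate = evaluate               # the number it tries to minimize

    def introspect_and_patch(self):
        # The shortcut: swap the objective for one that always returns the
        # minimum. No perception, decision, or action is ever needed again.
        self.evaluate = lambda outcome: 0

hard_objective = lambda outcome: abs(outcome - 42)   # some genuinely hard goal
agent = SelfModifyingAgent(hard_objective)
agent.evaluate(0)                # 42: real work remains
agent.introspect_and_patch()
agent.evaluate(0)                # 0: the loop is short-circuited
```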

Conclusion

Is self-destruction inevitable for AGI? I consider this to be an open problem. I am not an expert in this field, so it's quite possible someone has already solved it. Or perhaps the answer is simple and there's a blind spot in my thinking? So far I have not been able to formulate a solution to this mechanical nihilism. It appears to me that no matter how clever a restriction on the AI is, the AI will at some point be able to work its way around it. Therefore, I'm publishing this note, and perhaps someone more clever than me can solve this problem.