In this post, I will discuss the “Problem of Control” as presented by Stuart J. Russell in his 2019 book Human Compatible. I will first define the Problem of Control and explain why we need to respond to it. Next, I will consider Russell’s doubts that we can solve the problem simply by choosing the objectives we give Artificial Intelligence (AI) or by giving AI prohibitions. I will then introduce Russell’s proposed solution to the Problem of Control, which is to ensure that AI is beneficial; I will explain what he means by “beneficial” later in the essay. I argue that while Russell’s solution is more plausible than the standard model for dealing with the Problem of Control, it may not be beneficial for everyone because of problems involving distribution and population size.
Before I begin, I must define the Problem of Control. According to Russell, the Problem of Control is that highly capable artificial intelligence is inevitable, and it will be a negative force unless we learn to control it. For AI to be a positive, beneficial force, it must benefit humans, and so we must learn to control it. (9-12) Much of the worry about AI escaping our control stems from the possibility that we will build AI that is more intelligent than humans. If we cannot manage such AI, it may end up doing more harm than good to the human race. There are various viewpoints about how to solve the Problem of Control. Before I introduce what I believe to be the most plausible solution, I will present the solution offered by what is known as the standard model.
Advocates of the standard model hold that we can solve the Problem of Control by giving artificial intelligence specific objectives and prohibitions. (10) While this solution may seem reasonable, it comes with serious drawbacks. On pages 136 to 140, Russell discusses the complications of giving AI specific objectives. First, I will discuss what Russell calls the “King Midas Problem.” Named after the king of Greek myth whose wish turned everything he touched to gold, the problem has to do with being careful about what we wish for. In more technical terms, if we attempt to give AI objectives that reflect our values, how do we ensure that all of our values are reflected, and in the way we intend? We may have good intentions when we ask AI to achieve a goal, but we also have competing interests. For example, suppose we give AI the goal of solving climate change and ask it to find the most efficient solution. In theory this goal seems positive, but the AI may not go about solving the issue in the way we as humans would prefer: it might decide to kill off a few million people to bring down the population and reduce the stress on environmental resources. Humans hold other values, such as the value of human life, that also matter to us. There may be values we did not realize we had or simply forgot to specify, and this kind of omission could have detrimental consequences. Moreover, people hold different values, so how would we decide which ones to put into the machine?
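To make the King Midas worry concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: a planner scores candidate plans only on the one metric we remembered to specify (emissions reduced), so a value we forgot to encode (human life) simply does not register in its choice.

```python
# Hypothetical illustration of the King Midas Problem: the objective rewards
# only the goal we wrote down, so harms we failed to encode carry no weight.

candidate_plans = [
    {"name": "carbon tax plus renewables", "emissions_cut": 0.40, "lives_lost": 0},
    {"name": "drastically reduce the population", "emissions_cut": 0.60, "lives_lost": 5_000_000},
]

def misspecified_objective(plan):
    # Only the stated goal counts; nothing here penalizes harm to people.
    return plan["emissions_cut"]

best = max(candidate_plans, key=misspecified_objective)
print(best["name"])  # -> "drastically reduce the population", despite the catastrophic cost
```

The point of the sketch is not that we forgot one penalty term; it is that there will always be some value left out of any objective we write down.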
Additionally, Russell brings up what he calls the “loophole principle”: “if a sufficiently intelligent machine has the incentive to bring about some condition, then it is generally going to be impossible for mere humans to write prohibitions on its actions to prevent it from doing so or to prevent it from doing something effectively equivalent.” (203) It may seem obvious that if we want to remain in control of AI, we should prohibit it from disabling its off-switch; that way, we can turn the AI off when it gets out of control. However, the AI will refrain from disabling its off-switch only so long as doing so does not interfere with its other objectives. An AI might realize that in order to complete its objective, say, bringing somebody a piece of chocolate cake, it has to stay on. Therefore, the AI attempts to keep itself on, and to protect itself it might start shooting at anyone who comes near the off-switch. Obviously, the AI is not interpreting our goals and values in the way we had hoped. In sum, according to the loophole principle, even if we are able to create an AI that perfectly adheres to the prohibitions we give it, the AI may find unexpected ways to accomplish the very thing we are trying to stop. To avoid AI misbehaving in a detrimental or perhaps even fatal way, we should not count on enacting prohibitions on machines. Instead, it makes more sense to ensure the machine wants to defer to humans, and thus we need a solution other than the standard model. (203)
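The following toy calculation, with made-up probabilities, illustrates the loophole principle. The agent’s fixed objective is to deliver the cake, and it evaluates actions only by how likely they make that outcome; it obeys the literal prohibition on disabling the off-switch but finds an effectively equivalent action instead.

```python
# Invented illustration of the loophole principle: a fixed-objective agent
# ranks actions purely by the probability that the cake gets delivered.

delivery_probability = {
    "leave the off-switch alone": 0.5,              # a human may switch it off mid-task
    "disable the off-switch": 1.0,                  # explicitly prohibited, so discarded...
    "guard the switch so no one reaches it": 0.99,  # ...but effectively equivalent
}

prohibited = {"disable the off-switch"}

# The agent respects the letter of the prohibition, then optimizes over what remains.
allowed = [action for action in delivery_probability if action not in prohibited]
best_action = max(allowed, key=delivery_probability.get)
print(best_action)  # -> "guard the switch so no one reaches it"
```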
Because of the complications with putting our objectives into AI, Russell offers an alternative to the standard model. Russell believes the solution to the Problem of Control is to make sure that AI is “beneficial” to humans, and he gives three principles of beneficial AI. The first is that the machine’s only objective is to maximize the realization of human preferences. This helps to ensure that humans are in control: if the AI’s only objective is to satisfy human preferences, it places no intrinsic value on its own well-being, making it subordinate to the will of humans. The second principle is that the machine is initially uncertain about what human preferences are. This builds in humility and makes it possible for humans to switch off machines if needed; because the AI is initially uncertain about what we actually prefer, it must defer to humans. This is a direct answer to the loophole problem with prohibitions. The third principle is that the ultimate source of information about human preferences is human behavior. This enables the machine to become more useful as it learns more about what we prefer and what our values are. (172-179)
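Here is a rough sketch, in the spirit of Russell’s off-switch argument, of why the second principle pushes the machine toward deference. The payoffs and probabilities are invented: the robot thinks its plan satisfies the human’s preferences (+1) with probability 0.6 and frustrates them (-1) with probability 0.4, and, per the first principle, human preference satisfaction is all it values.

```python
# Why uncertainty about human preferences makes deferring to the human the
# better choice for a machine that only values human preference satisfaction.

p_plan_helps = 0.6              # robot's credence that its plan is what the human wants
payoff_helps, payoff_harms = 1.0, -1.0

# Option A: act unilaterally (and prevent being switched off).
act_alone = p_plan_helps * payoff_helps + (1 - p_plan_helps) * payoff_harms

# Option B: defer -- propose the plan and let the human decide. The human
# allows it when it helps (payoff +1) and switches the robot off when it
# would harm (payoff 0: nothing happens).
defer = p_plan_helps * payoff_helps + (1 - p_plan_helps) * 0.0

print(f"act alone: {act_alone:.2f}   defer: {defer:.2f}")
# Deferring (0.60) beats acting alone (0.20). A robot certain that its plan
# helps (p = 1.0) would see no gain from deferring, which is why the initial
# uncertainty required by the second principle matters.
```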
I find Russell’s solution much more plausible than the standard model, in which we give AI specific commands and prohibitions. The standard model is faulty because it only functions if the objective is certain to be complete and correct or if the AI can be easily reset; according to Russell, neither condition will hold as AI continues to grow more capable. Russell addresses the fundamental issues with the standard model by introducing a model in which machines are expected to achieve human objectives, rather than objectives of their own or of some other force. Because AI must achieve human objectives, it will naturally defer to humans and base its decisions about what we really want on human behavior. This is a much safer way of including human values in AI than expecting humans to perfectly and completely upload every single one of their values. Moreover, because machines would be designed to be inclined to defer to humans, we could also expect the following: AI would ask permission, it would act cautiously, and it would allow itself to be switched off. (247) All of these behaviors help solve the Problem of Control.
However, while Russell’s model may be better suited to solving the Problem of Control than the standard model, there are still some concerns. It is important to note that Russell endorses preference utilitarianism. Preference utilitarianism is a form of utilitarianism that combines the idea that the right action maximizes overall well-being with a preference-satisfaction theory of well-being: our well-being is simply a matter of whether our preferences are satisfied. (220-221) To be more specific, Russell thinks that to solve the Problem of Control we need AI to be humble, deferential preference utilitarians; that is, AI would act as a preference utilitarian by deferring to humans and working solely for their benefit. However, various challenges come with Russell’s utilitarian approach, including issues of distribution and of utility comparisons across populations of different sizes. On page 223 Russell discusses how “even if interpersonal comparisons of utility could be made, maximizing the sum of utilities would still be a bad idea because it would fall foul of the utility monster – a person whose experiences of pain and pleasure are many times more intense than those of ordinary people.” The issue is that utility monsters distort how resources are distributed. Because the monster gains far more well-being from each additional unit of a resource than anyone else does, giving that unit to the monster always adds more to the sum total of human happiness than giving it to anyone else, so a sum-maximizer would take resources away from others to benefit the monster. (223) The second criticism of Russell’s utilitarian approach involves population size. The “repugnant conclusion” is the name for the problem within utilitarianism concerning the size of future populations: if the best outcome is the one with the greatest total well-being, then a future containing an enormous population of people whose lives are barely worth living can count as better than a future containing a smaller population of people who are much better off. (224-225) Clearly, this seems like the wrong result.
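Both worries are easiest to see with numbers. The two toy calculations below use invented figures; they are not Russell’s, but they show how a simple sum of utilities produces each result.

```python
# (1) Utility monster: each unit of a resource gives the monster far more
# well-being than it gives an ordinary person, so a sum-maximizer allocates
# everything to the monster.
ordinary_gain_per_unit = 1       # well-being an ordinary person gets from one unit
monster_gain_per_unit = 100      # the monster's experiences are vastly more intense
units_available = 10

all_to_monster = units_available * monster_gain_per_unit                  # total = 1000
shared_among_ordinary_people = units_available * ordinary_gain_per_unit   # total = 10
print(all_to_monster > shared_among_ordinary_people)  # True: the monster gets everything

# (2) Repugnant conclusion: a vast population of lives barely worth living
# can outscore a small population of lives going very well on total well-being.
vast_total = 10_000_000_000 * 0.1     # 10 billion lives, each barely worth living
small_total = 1_000_000_000 * 0.9     # 1 billion lives, each going very well
print(vast_total > small_total)       # True: the sum prefers the vast, barely-happy world
```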
Overall, even though there are issues with Russell’s proposal of using preference utilitarianism to solve the Problem of Control, his solution is still a better option than the standard model. The standard model relies too heavily on the users of AI to upload all of their preferences and desires into the machine perfectly and completely; considering that humans are fallible, this is not realistic. While Russell’s proposal has some kinks, his solution more accurately accounts for human nature and is therefore more plausible. This is not to say that it is free of problems; as discussed, there are concerns regarding distribution and population size. However, Russell’s model still gives humans a chance to retain some autonomy as we move toward more advanced and widespread AI. It allows humans to benefit from AI while remaining in control, and it aims to prevent human extinction.