
SQ5. What are the prospects for more general artificial intelligence?


Since the dawn of the field, AI research has had two different, though interconnected, goals: narrow AI, to develop systems that excel on specific tasks, and general AI, to create systems that achieve the flexibility and adaptability of human intelligence. While all of today’s state-of-the-art AI applications are examples of narrow AI, many researchers are pursuing more general AI systems, an effort that some in the field have labeled AGI, for artificial general intelligence.

Most successful AI systems developed in the last several years have relied, at least in part, on supervised learning, in which the system is trained on examples that have been labeled by humans. Supervised learning has proven to be very powerful, especially when the learning systems are deep neural networks and the set of training examples is very large. 
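
For readers who want a concrete picture of supervised learning, the following is a minimal sketch, assuming an invented toy dataset and plain logistic regression rather than a deep network: the system adjusts its parameters so that its predictions come to match the human-provided labels.

```python
import numpy as np

# Toy labeled dataset: each row is a feature vector, each label is 0 or 1.
# (Invented data purely for illustration.)
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]])
y = np.array([1, 1, 0, 0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # weights
b = 0.0                  # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(500):
    p = sigmoid(X @ w + b)           # predicted probability of label 1
    grad_w = X.T @ (p - y) / len(y)  # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("predictions:", sigmoid(X @ w + b).round(2))  # should approach [1, 1, 0, 0]
```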

Reinforcement learning is another framework that has produced impressive AI successes in the last decade. In contrast with supervised learning, reinforcement learning relies not on labeled examples but on “reward signals” received by an agent taking actions in an (often simulated) environment. Deep reinforcement learning, which combines deep neural networks with reinforcement learning, has generated considerable excitement in the AI community following its role in creating AlphaGo, the program that was able to beat the world’s best human Go players. (We will return to AlphaGo in a moment.)
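
The reward-driven learning loop can be illustrated with a minimal tabular Q-learning sketch on an invented corridor environment. This is not how AlphaGo or other deep systems are trained, but it shows an agent improving its behavior from reward signals alone, without labeled examples.

```python
import random

# Toy "corridor" environment: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode. (Invented setup.)
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, occasionally explore.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, action)
        # Q-learning update driven only by the reward signal.
        target = reward + gamma * max(Q[next_state]) * (not done)
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print("learned greedy actions:",
      [max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```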

A third subfield of AI that has generated substantial recent interest is probabilistic program induction,1 in which learning a concept or skill is accomplished by using a model based on probabilities to generate a computer program that captures the concept or performs the skill. Like supervised learning and reinforcement learning, most probabilistic program induction methods to date have fit squarely into the “narrow AI” category, in that they require significant human engineering of specialized programming languages and produce task-specific programs that can’t easily be generalized.
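
The basic idea can be conveyed with a toy sketch that is not any of the published methods: sample small programs from a hand-written probabilistic grammar and keep the candidate that best reproduces a few input-output examples.

```python
import random

# Input-output examples for the hidden concept f(x) = 2 * x + 1. (Invented.)
examples = [(0, 1), (1, 3), (2, 5), (3, 7)]

def sample_expr(depth=0):
    """Sample a small arithmetic expression over x from a toy probabilistic grammar."""
    if depth >= 2 or random.random() < 0.4:
        return random.choice(["x", "1", "2", "3"])
    op = random.choice(["+", "*"])
    return f"({sample_expr(depth + 1)} {op} {sample_expr(depth + 1)})"

def score(expr):
    """Number of examples the candidate program reproduces exactly."""
    try:
        return sum(eval(expr, {"x": x}) == y for x, y in examples)
    except Exception:
        return 0

random.seed(0)
best = max((sample_expr() for _ in range(5000)), key=score)
print("best program found:", best, "score:", score(best))
```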

While these and other machine-learning methods are still far from producing fully general AI systems, in the last few years important progress has been made toward making AI systems more general. In particular, progress is underway on three types of related capabilities. First is the ability for a system to learn in a self-supervised or self-motivated way. Second is the ability for a single AI system to learn in a continual way to solve problems from many different domains without requiring extensive retraining for each. Third is the ability for an AI system to generalize between tasks—that is, to adapt the knowledge and skills the system has acquired for one task to new situations, with little or no additional training. 

Self-Supervised Learning With the Transformer Architecture

Significant progress has been made in the last five years on self-supervised learning, a step towards reducing the problem of reliance on large human-labeled training sets. In self-supervised learning, a learning system’s input can be an incomplete example, and the system’s job is to complete the example correctly. For instance, given the partial sentence “I really enjoyed reading your...,” one might predict that the final word is “book” or “article,” rather than “coffee” or “bicycle.” Systems trained in this way, which output probabilities of possible missing words, are examples of neural network language models. No explicit human-created labels are needed for self-supervised learning because the input data itself plays the role of the training feedback.
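
A minimal, non-neural sketch of the idea: gather word-prediction statistics directly from raw text (the tiny corpus below is invented), then fill in a blanked-out final word. No human-created labels are involved because the text itself supplies the targets.

```python
from collections import Counter, defaultdict

# Unlabeled "training corpus" (invented): the data supplies its own targets.
corpus = [
    "i really enjoyed reading your book",
    "i really enjoyed reading your article",
    "she enjoyed riding her bicycle",
    "he spilled his coffee on the book",
]

# Self-supervised "training": for each two-word context, count which word follows.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(2, len(words)):
        counts[(words[i - 2], words[i - 1])][words[i]] += 1

# "Inference": predict the missing final word of a partial sentence.
context = ("reading", "your")
print("p(next word | 'reading your'):", counts[context].most_common())
# 'book' and 'article' tie; 'coffee' and 'bicycle' never follow this context.
```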

Such self-supervised training methods have been particularly successful when used in conjunction with a new architecture for deep neural networks called the transformer.2 At its most basic level, a transformer is a neural network optimized for processing sequences with long-range dependencies (for example, words far apart in a sentence that depend on one another), using the idea of “attention” weights to focus processing on the most relevant parts of the data.
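
To make the attention mechanism concrete, the sketch below implements scaled dot-product attention, the core operation of the transformer, in NumPy. The matrices are invented, and a full transformer adds multiple attention heads, feed-forward layers, and positional information on top of this operation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights               # weighted sum of value vectors

# Toy sequence of 4 token vectors of dimension 3 (invented numbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
# In self-attention, queries, keys, and values are linear projections of X;
# here we use untrained random projection matrices just to show the shapes.
Wq, Wk, Wv = (rng.normal(size=(3, 3)) for _ in range(3))
output, weights = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print("attention weights (each row sums to 1):\n", weights.round(2))
```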

Figure: Google AI example. Widely available tools like Google Docs’ grammar checker use transformer-based language models to propose alternative word choices in near-real time. While prior generations of tools could highlight non-words (“I gave thier dog a bone”), or even distinguish common word substitutions based on local context (“I gave there dog a bone”), the current generation can make recommendations based on much more distant or subtle cues. In the pictured example, the underlined word influences which word is flagged as a problem from nine words away. Image credit: Michael Littman via https://docs.google.com/.

Transformer-based language models have become the go-to approach for natural language processing, and have been used in diverse applications, including machine translation and Google web search. They can also generate convincingly human-like text. 

Transformers trained with self-supervised learning are a promising tool for creating more general AI systems, because they are applicable to or easily integrated with a wide variety of data—text, images, even protein-folding structures3—and, once trained, they can achieve state-of-the-art narrow AI performance on difficult tasks either immediately or with only a small amount of additional training known as “fine-tuning.”
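
A minimal sketch of the fine-tuning pattern, written in PyTorch with an invented stand-in for the pretrained model: the pretrained encoder’s weights are frozen, and only a small task-specific head is trained on a handful of labeled examples.

```python
import torch
from torch import nn

# Stand-in "pretrained" encoder. In practice this would be a large transformer
# trained with self-supervision; here it is a small untrained network used
# purely to illustrate the fine-tuning pattern.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
for param in encoder.parameters():
    param.requires_grad = False           # freeze the pretrained weights

classifier = nn.Linear(32, 2)             # small task-specific head
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Tiny invented labeled set for the downstream task.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))

for step in range(100):                   # fine-tune only the head
    logits = classifier(encoder(x))
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print("final fine-tuning loss:", loss.item())
```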

Continual and Multitask Learning

Significant advances have been made over the last several years in AI systems that can learn across multiple tasks while avoiding the pervasive problem of catastrophic interference between the tasks, in which training the system on new tasks causes it to forget how to perform tasks it has already learned.  Much of this progress has come about due to advances in meta-learning methods.  

Meta-learning refers to machine-learning methods aimed at improving the machine-learning process itself. One influential approach is to train a deep neural network on a variety of tasks, where the objective is for the network to learn general-purpose, transferable representations, as opposed to representations tailored specifically to any particular task.4 The learned representations are such that a neural network trained in this way could be fine-tuned for a variety of specific tasks with only a small number of training examples for a given task. Meta-learning has also led to progress in probabilistic program induction, by enabling abstraction strategies that learn general-purpose program modules that can be configured for many different tasks.5
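
The inner/outer structure of such methods can be sketched on an invented family of one-parameter regression tasks. The sketch below uses a first-order simplification and is only in the spirit of model-agnostic meta-learning, not the published algorithm in full.

```python
import numpy as np

rng = np.random.default_rng(0)

# Family of toy regression tasks y = a * x, where each task has its own slope a.
def sample_task():
    a = rng.uniform(1.0, 3.0)
    x = rng.normal(size=20)
    return x, a * x

def grad(w, x, y):
    """Gradient of mean-squared error for the one-parameter model y_hat = w * x."""
    return 2.0 * np.mean((w * x - y) * x)

inner_lr, meta_lr = 0.05, 0.1
w = 0.0                                    # meta-parameter: the shared initialization

for meta_step in range(200):
    meta_grad = 0.0
    for _ in range(5):                     # batch of tasks
        x, y = sample_task()
        w_task = w - inner_lr * grad(w, x, y)   # inner adaptation step
        meta_grad += grad(w_task, x, y)         # first-order outer gradient
    w -= meta_lr * meta_grad / 5

# The learned initialization adapts to a brand-new task in a single step.
x, y = sample_task()
w_adapted = w - inner_lr * grad(w, x, y)
print("meta-learned init:", round(w, 2),
      "loss after one adaptation step:", round(np.mean((w_adapted * x - y) ** 2), 3))
```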

In continual learning, a learning agent is trained on a sequence of tasks, each of which is seen only once. The challenges of continual learning are to constantly use what has been learned in the past to apply to new tasks, and, in learning new tasks, to avoid destroying what has already been learned. While continual learning, like meta-learning, has been researched for decades in the machine-learning community, the last several years have seen some significant advances in this area. Examples of new approaches include training systems that mimic processes in the brain, known as neuromodulatory processes, to learn gating functions that turn on and off network areas to enable continual learning without forgetting.6
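
A hand-built sketch of the gating idea, not the published neuromodulatory method: each task’s gate switches on a disjoint subset of a shared hidden layer, so gradients from a new task cannot overwrite the units the previous task relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared hidden layer of 8 linear units; each task's gate enables a disjoint
# subset, so training one task leaves the other task's units untouched.
W = rng.normal(scale=0.5, size=(8, 2))        # input -> hidden weights (shared)
v = rng.normal(scale=0.5, size=8)             # hidden -> output weights (shared)
gates = {0: np.array([1.0] * 4 + [0.0] * 4),  # task 0 uses units 0-3
         1: np.array([0.0] * 4 + [1.0] * 4)}  # task 1 uses units 4-7

def forward(x, task):
    h = (W @ x) * gates[task]                 # gated hidden activations
    return v @ h, h

def train(task, target_fn, steps=3000, lr=0.05):
    global W, v
    for _ in range(steps):
        x = rng.normal(size=2)
        y_hat, h = forward(x, task)
        err = y_hat - target_fn(x)
        v -= lr * err * h                                  # gated-off units get zero gradient
        W -= lr * err * np.outer(v * gates[task], x)       # only the task's rows change

train(0, lambda x: x[0] + x[1])               # learn task 0 first...
train(1, lambda x: x[0] - x[1])               # ...then task 1, without forgetting task 0
x = np.array([0.5, -1.0])
print("task 0 output:", round(forward(x, 0)[0], 2), "(target", x[0] + x[1], ")")
print("task 1 output:", round(forward(x, 1)[0], 2), "(target", x[0] - x[1], ")")
```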

Making Deep Reinforcement Learning More General

For decades, the game of Go has been one of AI’s grand challenge problems, much harder than chess due to its vastly larger space of board configurations and legal moves and its greater strategic depth. In 2016, DeepMind’s program AlphaGo definitively conquered that challenge, defeating Lee Sedol, one of the world’s best human Go players, in four out of five games. AlphaGo learned to play Go via a combination of several AI methods, including supervised deep learning, deep reinforcement learning, and an iterative procedure for exploring possible lines of play called Monte Carlo tree search.7 While AlphaGo was a landmark in AI history, it remains a triumph of narrow AI, since the trained program was only able to perform a single task: playing Go. Later developments in the AlphaGo line of research have drastically reduced the reliance on example games played by humans, Go-specific representations, and even advance access to the rules of the game. Nevertheless, the learned strategies are thoroughly game-specific. That is, the methodology for producing the Go player was general, but the Go player was not.
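
The selection rule at the heart of Monte Carlo tree search can be shown in a few lines. The move statistics below are invented, and AlphaGo additionally guides this choice with neural-network policy and value estimates.

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c=1.4):
    """UCB1 rule used in Monte Carlo tree search to pick which move to explore next:
    average outcome (exploitation) plus a bonus for rarely tried moves (exploration)."""
    if child_visits == 0:
        return float("inf")               # always try an unvisited move first
    return (child_wins / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))

# Invented (wins, visits) statistics for three candidate moves at the current position.
stats = {"move A": (6, 10), "move B": (3, 4), "move C": (0, 1)}
parent_visits = sum(v for _, v in stats.values())
best = max(stats, key=lambda m: uct_score(*stats[m], parent_visits))
print("move chosen for the next simulation:", best)  # the rarely tried move wins the bonus
```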

In the last several years, much research has gone into making deep-reinforcement-learning methods more general.  A key part of reinforcement learning is the definition of reward signals in the environment. In AlphaGo, the only reward signal was winning or losing the game. However, in real-world domains, a richer set of reward signals may be necessary for reinforcement-learning algorithms to succeed. These reward signals are usually defined by a human programmer and are specific to a particular task domain. The notion of intrinsic motivation for a learning agent refers to reward signals that are intended to be general—that is, useful in any domain. Intrinsic motivation in AI is usually defined in terms of seeking novelty: an agent is rewarded for exploring new areas of the problem space, or for being wrong in a prediction (and thus learning something new). The use of intrinsic motivation has a long history in reinforcement learning,8 but in the last few years it has been used as a strategy for more general reinforcement-learning systems designed to perform multitask learning or continual learning where the same learning system is trained to solve multiple problems.9
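
A minimal sketch of a curiosity-style intrinsic reward, with an invented toy world and a linear forward model: the agent is rewarded in proportion to its prediction error, and the bonus fades as the world becomes predictable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: the agent observes 2-D states; its forward model tries to predict
# the next state. The intrinsic reward is the model's prediction error, so the
# agent is rewarded for situations it cannot yet predict. (Hand-built illustration.)
W = np.zeros((2, 2))                        # linear forward model: s_next ≈ W s

def intrinsic_reward_and_update(s, s_next, lr=0.1):
    global W
    error = s_next - W @ s
    reward = float(np.sum(error ** 2))      # curiosity bonus = prediction error
    W += lr * np.outer(error, s)            # improve the model from experience
    return reward

# Hidden true dynamics (unknown to the agent): a simple rotation of the state.
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

s = rng.normal(size=2)
for t in range(200):
    s_next = A @ s
    r_int = intrinsic_reward_and_update(s, s_next)
    if t % 50 == 0:
        print(f"step {t:3d}  intrinsic reward {r_int:.4f}")  # decays as the world becomes predictable
    s = s_next
```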

Another set of advances in reinforcement learning is in synthesizing representations of generative world models—models of an agent’s environment that can be used to simulate “imagined” scenarios, in which an agent can test policies and learn without being subject to rewards or punishments in its actual environment. Such models can be used to generate increasingly complex or challenging scenarios to allow learning to be scaffolded via a useful “curriculum.” Using deep neural networks to learn and then generate such models has resulted in progress in reinforcement learning’s generality and speed of learning.10
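
The “learning in imagination” idea can be sketched with an invented one-dimensional environment: the agent first fits a simple transition model from random experience, then evaluates candidate policies entirely inside that learned model, without further rewards or punishments from the real environment.

```python
import random

# Real environment: a 1-D track with states 0..6; action 0 = left, 1 = right;
# reaching or staying at state 6 gives reward 1. (Invented setup.)
def real_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(6, s + 1)
    return s2, (1.0 if s2 == 6 else 0.0)

# Phase 1: learn a world model (here, just a transition/reward table) from
# random interaction -- a toy stand-in for learned generative world models.
random.seed(0)
model, s = {}, 0
for _ in range(2000):
    a = random.randrange(2)
    s2, r = real_step(s, a)
    model[(s, a)] = (s2, r)
    s = s2

# Phase 2: evaluate candidate policies purely in "imagination",
# using the learned model instead of the real environment.
def imagined_return(policy, start=0, horizon=10):
    s, total = start, 0.0
    for _ in range(horizon):
        s, r = model[(s, policy(s))]
        total += r
    return total

always_right = lambda s: 1
always_left = lambda s: 0
print("imagined return, always right:", imagined_return(always_right))
print("imagined return, always left :", imagined_return(always_left))
```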

Common Sense

These recent approaches attempt to make AI systems more general by enabling them to learn from a small number of examples, learn multiple tasks in a continual way without inter-task interference, and learn in a self-supervised or intrinsically motivated way. While these approaches have shown promise on several restricted domains, such as learning to play a variety of video games, they are still only early steps in the pursuit of general AI. Further research is needed to demonstrate that these methods can scale to the more diverse and complex problems the real world has to offer.

An important missing ingredient, long sought in the AI community, is common sense. The informal notion of common sense includes several key components of general intelligence that humans mostly take for granted, including a vast amount of mostly unconscious knowledge about the world, an understanding of causality (what factors cause events to happen or entities to have certain properties), and an ability to perceive abstract similarities between situations—that is, to make analogies.11 Recent years have seen substantial new research, especially in the machine-learning community, on how to imbue machines with common sense abilities.12 This effort includes work on enabling machines to learn causal models13 and intuitive physics,14 which describe our everyday experience of how objects move and interact, as well as work on giving them abilities for abstraction and analogy.15

AI systems still remain very far from human abilities in all these areas, and perhaps will never gain common sense or general intelligence without being more tightly coupled to the physical world. But grappling with these issues helps us not only make progress in AI, but better understand our own often invisible human mechanisms of general intelligence.


[1] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science, Vol. 350, Issue 6266, December 11, 2015, pp. 1332-1338

[2]  Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, “Attention Is All You Need,” 31st Conference on Neural Information Processing Systems (NIPS 2017), https://arxiv.org/pdf/1706.03762.pdf 

[3] https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

[4] Chelsea Finn, Pieter Abbeel, and Sergey Levine, “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks,” https://arxiv.org/abs/1703.03400

[5] Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, and Joshua B. Tenenbaum, “DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning,” https://arxiv.org/pdf/2006.08381.pdf

[6] Shawn Beaulieu, Lapo Frati, Thomas Miconi, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Nick Cheney, “Learning to Continually Learn,” https://arxiv.org/abs/2002.09571v2 

[7] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature volume 529, 484-489 (2016)

[8] Gianluca Baldassarre and Marco Mirolli, editors, “Intrinsically Motivated Learning in Natural and Artificial Systems,” https://link.springer.com/content/pdf/10.1007%2F978-3-642-32375-1.pdf, page 17; and Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos, “Unifying Count-Based Exploration and Intrinsic Motivation,” https://arxiv.org/abs/1606.01868

[9] Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell, “Curiosity-driven Exploration by Self-supervised Prediction,” Proceedings of the 34th International Conference on Machine Learning, http://proceedings.mlr.press/v70/pathak17a.html; and Cédric Colas, Pierre Fournier, Olivier Sigaud, Mohamed Chetouani, Pierre-Yves Oudeyer, “CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning,” https://arxiv.org/abs/1810.06284v4; and Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell, “Agent57: Outperforming the Atari Human Benchmark,” Proceedings of the 37th International Conference on Machine Learning, http://proceedings.mlr.press/v119/badia20a.html

[10] David Ha, Jürgen Schmidhuber, “Recurrent World Models Facilitate Policy Evolution,” https://arxiv.org/abs/1809.01999; and Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley, “Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions,” https://arxiv.org/abs/1901.01753v3; and Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba, “Mastering Atari with Discrete World Models,” https://arxiv.org/abs/2010.02193v3

[11] Ernest Davis and Gary Marcus, “Commonsense reasoning and commonsense knowledge in artificial intelligence,” Communications of the ACM, Volume 58, Issue 9, September 2015, pp. 92–103; and Dedre Gentner and Kenneth D. Forbus, “Computational models of analogy,” WIREs Cognitive Science, Volume 2, Issue 3, May/June 2011, pp. 266-276

[12] Yixin Zhu, Tao Gao, Lifeng Fan, Siyuan Huang, Mark Edmonds, Hangxin Liu, Feng Gao, Chi Zhang, Siyuan Qi, Ying Nian Wu, Joshua B. Tenenbaum, and Song-Chun Zhu, “Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense,” https://arxiv.org/abs/2004.09044

[13]  Judea Pearl, “Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution,” https://arxiv.org/abs/1801.04016

[14] Kevin A. Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Joshua B. Tenenbaum, and Tomer D. Ullman, “Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations,” 33rd Conference on Neural Information Processing Systems, http://www.mit.edu/~k2smith/pdf/Smith_et_al-2019-Modeling_Expectation_Violation.pdf

[15] Melanie Mitchell, “Abstraction and Analogy-Making in Artificial Intelligence,” https://arxiv.org/abs/2102.10717v2

 
