Can deep learning systems learn to manipulate symbols? The answers might change our understanding of how intelligence works and what makes humans unique.
If there is one constant in the field of artificial intelligence, it is exaggeration: there is always breathless hype and scornful naysaying. It is helpful to occasionally take stock of where we stand.
The dominant technique in contemporary AI is deep learning (DL) neural networks, massive self-learning algorithms which excel at discerning and utilizing patterns in data. Since their inception, critics have prematurely argued that neural networks had run into an insurmountable wall — and every time, it proved a temporary hurdle. In the 1960s, they could not solve non-linear functions. That changed in the 1980s with backpropagation, but the new wall was how difficult it was to train the systems. The 1990s saw a rise of simplifying programs and standardized architectures which made training more reliable, but the new problem was the lack of training data and computing power.
In 2012, when contemporary graphics cards made it possible to train neural networks on the massive ImageNet dataset, DL went mainstream, handily besting all competitors. But then critics spied a new problem: DL required too much hand-labeled data for training. The last few years have rendered this criticism moot, as self-supervised learning has resulted in incredibly impressive systems, such as GPT-3, which do not require labeled data.
Today’s seemingly insurmountable wall is symbolic reasoning, the capacity to manipulate symbols in the ways familiar from algebra or logic. As we learned as children, solving math problems involves a step-by-step manipulation of symbols according to strict rules (e.g., multiply the rightmost column, carry the extra value to the column to the left, etc.). Gary Marcus, author of “The Algebraic Mind” and co-author (with Ernie Davis) of “Rebooting AI,” recently argued that DL is incapable of further progress because neural networks struggle with this kind of symbol manipulation. By contrast, many DL researchers are convinced that DL is already engaging in symbolic reasoning and will continue to improve at it.
At the heart of this debate are two different visions of the role of symbols in intelligence, both biological and mechanical: one holds that symbolic reasoning must be hard-coded from the outset and the other holds it can be learned through experience, by machines and humans alike. As such, the stakes are not just about the most practical way forward, but also how we should understand human intelligence — and, thus, how we should pursue human-level artificial intelligence.
Kinds Of AI
Symbolic reasoning demands precision: symbols can come in a host of different orders, and the difference between (3-2)-1 and 3-(2-1) is important, so performing the right rules in the right order is essential. Marcus contends this kind of reasoning is at the heart of cognition, essential for providing the underlying grammatical logic to language and the basic operations underlying mathematics. More broadly, he holds this extends into our more basic abilities, where there is an underlying symbolic logic behind causal reasoning and reidentifying the same object over time.
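To make the point about order concrete, here is a minimal sketch of rule-based symbol manipulation. The `Sub` class and `evaluate` function are hypothetical names introduced only for illustration; they are not drawn from Marcus's work or from any particular Symbolic AI system. The sketch shows that the same three numerals yield different results depending on how the expression is grouped.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sub:
    """A subtraction node: left minus right (illustrative only)."""
    left: "Expr"
    right: "Expr"


# An expression is either a plain integer or a subtraction of two expressions.
Expr = Union[int, Sub]


def evaluate(expr: Expr) -> int:
    """Apply the subtraction rule recursively, innermost grouping first."""
    if isinstance(expr, int):
        return expr
    return evaluate(expr.left) - evaluate(expr.right)


left_grouped = Sub(Sub(3, 2), 1)   # (3-2)-1
right_grouped = Sub(3, Sub(2, 1))  # 3-(2-1)

print(evaluate(left_grouped), evaluate(right_grouped))
```

The structure of the expression, not just the symbols in it, determines the answer, and a discrete evaluator of this kind returns that answer exactly every time.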
The field of AI got its start by studying this kind of reasoning, typically called Symbolic AI, or “Good Old-Fashioned” AI. But distilling human expertise into a set of rules and facts turns out to be very difficult, time-consuming, and expensive. This was called the “knowledge acquisition bottleneck.” While it is simple to program rules for math or logic, the world itself is remarkably ambiguous, and it proved impossible to write rules governing every pattern or define symbols for vague concepts.
This is precisely where neural networks excel: discovering patterns and embracing ambiguity. Neural networks are a collection of relatively simple equations that learn a function designed to provide the appropriate output for whatever is inputted to the system. For example, training a visual recognition system will ensure all the chair images cluster together, allowing the system to tease out the vague set of indescribable properties of such an amorphous category. This allows the network to successfully infer whether a new object is a chair, simply by how close it is to the cluster of other chair images. Doing this with enough objects and with enough categories results in a robust conceptual space, with numerous categories clustered in overlapping but still distinguishable ways.
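The idea of classifying by closeness to a cluster can be sketched in a few lines. This is a deliberately simplified stand-in, not a neural network: the two-dimensional feature vectors are made up for illustration (a real system learns high-dimensional representations from images), and the names `centroid` and `nearest_category` are hypothetical helpers. The sketch only shows the final step described above, assigning a new object to whichever cluster it sits nearest.

```python
import math


def centroid(points):
    """Average a list of equal-length feature vectors into one center point."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_category(x, centroids):
    """Return the category whose center is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(x, centroids[name]))


# Made-up 2-D features standing in for learned image representations.
training = {
    "chair": [(1.0, 1.2), (0.8, 1.0), (1.2, 0.8)],
    "table": [(3.0, 3.1), (3.2, 2.9), (2.8, 3.0)],
}
centroids = {name: centroid(pts) for name, pts in training.items()}

print(nearest_category((1.1, 0.9), centroids))
```

No rule anywhere states what a chair is; the category is defined only by where the examples happen to fall.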
These networks can be trained precisely because the functions implemented are differentiable. Put differently, if Symbolic AI is akin to the discrete tokens used in symbolic logic, neural networks are the continuous functions of calculus. This allows for slow, gradual progress by tweaking the variables slightly in the direction of learning a better representation — meaning a better fit between all the data points and the numerous boundaries the function draws between one category and another. This fluidity poses problems, however, when it comes to strict rules and discrete symbols: when we are solving an equation, we usually want the exact answer, not an approximation.
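A toy illustration of this gradual tweaking, under stated assumptions: a single adjustable weight `w`, a made-up dataset that follows the rule y = 2x, a mean-squared-error loss, and a hand-picked learning rate. None of these choices comes from the systems discussed here; they are only meant to show how following the gradient moves a parameter toward a better fit, and why the result is a close approximation and not an exact value.

```python
def loss(w, data):
    """Mean squared error between the predictions w*x and the targets y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


# Made-up data generated by the exact rule y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0             # starting guess
learning_rate = 0.05  # illustrative step size

for step in range(20):
    # Nudge w slightly in the direction that reduces the loss.
    w -= learning_rate * grad(w, data)

print(w, loss(w, data))
```

After 20 steps the weight lands very near 2 without being exactly 2, which is fine for recognizing chairs but not what we want from arithmetic.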
Since this is where Symbolic AI shines, Marcus recommends simply combining the two: inserting a hard-coded symbolic manipulation module on top of a pattern-completion DL module. This is attractive since the two methods complement each other well, so it seems plausible a “hybrid” system with modules working in different ways would provide the best of both worlds. And it seems like common sense, since everyone working in DL agrees that symbolic manipulation is a necessary feature for creating human-like AI.
But the debate turns on whether symbolic manipulation needs to be built into the system, where the symbols and capacity for manipulating are designed by humans and installed as a module that manipulates discrete symbols and is consequently non-differentiable — and thus incompatible with DL. Underlying this is the assumption that neural networks can’t do symbolic manipulation — and, with it, a deeper assumption about how symbolic reasoning works in the brain.
Symbolic Reasoning In Neural Nets
This assumption is very controversial and part of an older debate. The neural network approach has traditionally held that we don’t need to hand-craft symbolic reasoning but can instead learn it: training a machine on examples of symbols engaging in the right kinds of reasoning will allow it to be learned as a matter of abstract pattern completion. In short, the machine can learn to manipulate symbols in the world, despite not having hand-crafted symbols and symbolic manipulation rules built in.
Contemporary large language models, such as GPT-3 and LaMDA, show the potential of this approach. They display impressive abilities to manipulate symbols, including some level of common-sense reasoning, compositionality, multilingual competency, some logical and mathematical abilities, and even creepy capacities to mimic the dead. If you’re inclined to take symbolic reasoning as coming in degrees, this is incredibly exciting.
But they do not do so reliably. If you ask DALL-E to create a Roman sculpture of a bearded, bespectacled philosopher wearing a tropical shirt, it excels. If you ask it to draw a beagle in a pink harness chasing a squirrel, sometimes you get a pink beagle or a squirrel wearing a harness. It does well when it can assign all the properties to a single object, but it struggles when there are multiple objects and multiple properties. The attitude of many researchers is that this is a hurdle for DL — larger for some, smaller for others — on the path to more human-like intelligence.
However, this is not how Marcus takes it. He broadly assumes symbolic reasoning is all-or-nothing: since DALL-E doesn’t have symbols and logical rules underlying its operations, it isn’t actually reasoning with symbols. Thus, the numerous failures in large language models show they aren’t genuinely reasoning but are simply producing a pale imitation of it. For Marcus, there is no path from the stuff of DL to the genuine article; as the old AI adage goes, you can’t reach the Moon by climbing a big enough tree. Thus, he takes the current DL language models as no closer to genuine language than Nim Chimpsky with his few signs of sign language. The DALL-E problems aren’t quirks of a lack of training; they are evidence the system doesn’t grasp the underlying logical structure of the sentences and thus cannot properly grasp how the different parts connect into a whole.
This is why, from one perspective, the problems of DL are hurdles and, from another perspective, walls. The same phenomena simply look different based on background assumptions about the nature of symbolic reasoning. For Marcus, if you don’t have symbolic manipulation at the start, you’ll never have it.
By contrast, people like Geoffrey Hinton contend neural networks don’t need to have symbols and algebraic reasoning hard-coded into them in order to successfully manipulate symbols. The goal, for DL, isn’t symbol manipulation inside the machine, but the right kind of symbol-using behaviors emerging from the system in the world. The rejection of the hybrid model isn’t churlishness; it’s a philosophical difference based on whether one thinks symbolic reasoning can be learned.
The Nature Of Human Thought
Marcus’s critique of DL stems from a related fight in cognitive science (and a much older one in philosophy) concerning how intelligence works and, with it, what makes humans unique. His ideas are in line with a prominent “nativist” school in psychology, which holds that many key features of cognition are innate — effectively, that we are largely born with an intuitive model of how the world works.
A central feature of this innate architecture is a capacity for symbol manipulation (though whether this is found throughout nature or whether it is human-specific is debated). For Marcus, this symbol manipulation capacity grounds many of the essential features of common sense: rule-following, abstraction, causal reasoning, reidentifying particulars, generalization and a host of other abilities. In short, much of our understanding of the world is given by nature, with learning as a matter of fleshing out the details.
There is an alternate, empiricist view which inverts this: symbolic manipulation is a rarity in nature, primarily arising as a learned capacity for communicating acquired gradually by our hominin ancestors over the last two million years. On this view, the primary cognitive capacities are non-symbolic learning abilities bound up with improving survival, such as rapidly recognizing prey, predicting their likely actions, and developing skillful responses. This assumes that the vast majority of complex cognitive abilities are acquired through a general, self-supervised learning capacity, one that acquires an intuitive world-model capable of the central features of common sense through experience. It also assumes that most of our complex cognitive capacities do not turn on symbolic manipulation; they make do, instead, with simulating various scenarios and predicting the best outcomes.
This empiricist view treats symbols and symbolic manipulation as simply another learned capacity, one acquired by the species as humans increasingly relied on cooperative behavior for success. This regards symbols as inventions we used to coordinate joint activities — things like words, but also maps, iconic depictions, rituals and even social roles. These abilities are thought to arise from the combination of an increasingly long adolescence for learning and the need for more precise, specialized skills, like tool-building and fire maintenance. This treats symbols and symbolic manipulations as primarily cultural inventions, dependent less on hard wiring in the brain and more on the increasing sophistication of our social lives.
The difference between these two views is stark. For the nativist tradition, symbols and symbolic manipulation are originally in the head, and the use of words and numerals is derived from this original capacity. This view attractively explains a whole host of abilities as stemming from an evolutionary adaptation (though proffered explanations for how or why symbolic manipulation might have evolved have been controversial). For the empiricist tradition, symbols and symbolic reasoning are useful inventions for communication purposes, which arose from general learning abilities and our complex social world. This treats the internal calculations and inner monologue (the symbolic stuff happening in our heads) as derived from the external practices of mathematics and language use.
The fields of AI and cognitive science are intimately intertwined, so it is no surprise these fights recur there. Since the success of either view in AI would partially (but only partially) vindicate one or the other approach in cognitive science, it is also no surprise these debates are intense. At stake are questions not just about the proper approach to contemporary problems in AI, but also questions about what intelligence is and how the brain works.
What The Stakes Are — And Aren’t
The high stakes explain why claims that DL has hit a wall are so provocative. If Marcus and the nativists are right, DL will never get to human-like AI, no matter how many new architectures it comes up with or how much computing power it throws at it. It is just confusion to keep adding more layers, because genuine symbolic manipulation demands an innate symbolic manipulator, full stop. And since this symbolic manipulation is at the base of several abilities of common sense, a DL-only system will never possess anything more than a rough-and-ready understanding of anything.
By contrast, if DL advocates and the empiricists are right, it’s the idea of inserting a module for symbolic manipulation that is confused. In that case, DL systems are already engaged in symbolic reasoning and will continue to improve at it as they become better at satisfying constraints through more multimodal self-supervised learning, an increasingly useful predictive world-model, and an expansion of working memory for simulating and evaluating outcomes. Introducing a symbolic manipulation module would not lead to more human-like AI, but instead force all “reasoning” operations through an unnecessary and unmotivated bottleneck that would take us further from human-like intelligence. This threatens to cut off one of the most impressive aspects of deep learning: its ability to come up with far more useful and clever solutions than the ones human programmers conceive of.
As big as the stakes are, though, it is also important to note that many issues raised in these debates are, at least to some degree, peripheral. There are sometimes disputes over whether the high-dimensional vectors in DL systems should be treated like discrete symbols (probably not), whether the lines of code needed to implement a DL system make it a “hybrid” system (semantics), and whether winning at complex games requires hand-crafted, domain-specific knowledge or whether that knowledge can be learned (too soon to tell). There’s also a question of whether hybrid systems will help with the ethical problems surrounding AI (no).
And none of this is to justify the silliest bits of hype: current systems aren’t conscious, they don’t understand us, reinforcement learning isn’t enough, and you can’t build human-like intelligence just by scaling up. But all these issues are peripheral to the main debate: does symbolic manipulation need to be hard-coded, or can it be learned?
Is this a call to stop investigating hybrid models (i.e., models with a non-differentiable symbolic manipulator)? Of course not. People should go with what works. But researchers have worked on hybrid models since the 1980s, and they have not proven to be a silver bullet or, in many cases, even remotely as good as neural networks. More broadly, people should be skeptical that DL is at the limit; given the constant, incremental improvement on tasks seen just recently in DALL-E 2, Gato, and PaLM, it seems wise not to mistake hurdles for walls. The inevitable failure of DL has been predicted before, but it didn’t pay to bet against it.