Date: 2025-11-12

Guest post series from Carlo Graziani.

On Artificial Intelligence

What follows is the third part of a seven-installment series on Artificial Intelligence (AI).

The plan is to release one of these per week, on Wednesdays (skipping Thanksgiving week), with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

Part 3: There Is No Artificial General Intelligence Down This Road

This week and next we will be taking a close look at the claims made by the Tech industry that there are already indications that Artificial General Intelligence (AGI) is “emerging” in large language models (LLMs), and that true AGI will be a reality within the next few years. Keep in mind that AGI is the objective that these companies are targeting, and its realization is the essential justification for the roughly $2T investments in “AI” model development that the industry now projects over the next 5 years or so.

You might think that to justify that level of investment would require a pretty airtight scientific case that (1) AGI is possible in principle, and (2) that AGI is achievable through current LLM technology, which is to say, using transformer-based deep learning (DL). But if you did think that, you would be wrong. Whether AGI can be accomplished at all has been an open question since the 1930s. And, as I will argue in this essay, we are certainly not any closer to AGI with current “AI” tech than we were before the DL revolution began.

The Circular Argument For AGI

The first thing to observe is that there does not really exist a scientifically-defensible definition of what AGI is. There is a fairly balanced review of the topic here. The principal problem is that we don’t even know how to accurately describe or define either the mechanisms or the characteristics of human intelligence, so when definitions of AGI appeal to notions such as “the ability of computers to perform human-like cognitive tasks” they are comparing one imprecise notion to a different imprecise notion.

Moreover, it is important to note that all such definitions are circular: they define AGI in an LLM in terms of certain types of output produced by LLMs, and then promptly discover evidence for that very output, proving that AGI is near. This paper, Sparks of Artificial General Intelligence: Early experiments with GPT-4, is an unintentionally hilarious example of the genre.

I find this sort of thing extremely frustrating. Language matters in science. I don’t want to have to parse statements that amount to defining what intelligence looks like in text output, from people who don’t have the faintest idea what intelligence is.

Cognitive scientists also labor under this constraint, designing tests and experiments to try to understand aspects of human cognition from stimuli and responses. But they have no choice in the matter: we are very far away from having experimental access to the higher-level functioning of the human brain, so those scientists use the tools that are available. Computer scientists have no such excuse: they have complete access to and control over their models. Nonetheless, the tests for intelligence that they adopt are essentially stylized versions of the cognitive science tests, with stimulus and response replaced by prompt and response. There is no effort to describe what aspect of transformers (or of the chained, augmented transformers in the “reasoning” models of OpenAI and others) is supposed to give rise to reasoning. There is only complacent satisfaction that some combination of pre-training, fine-tuning, distillation, computational scaling, iteration, etc. produces improved performance on “reasoning” benchmarks. Sure, that’s very nice, although “improved” does not mean “adequate”, according to ARC-AGI-2 testing. But excuse me, what is this “reasoning” of which you speak?

I’ll have more to say about reasoning next week. For now, I just want to point out that whatever reasoning is, it is certainly a distinct cognitive process from learning. So the assertion that reason can “emerge” from what are pure statistical learning systems is a huge claim, one whose justification would require mountains of really impressive scientific evidence, including a detailed explanation of the mechanism by which it arises in LLMs or chains of LLMs.

The Implausibility of “Learning To AGI”

In order to break down the claim into intelligible pieces, it is useful to adopt the “model-agnostic” outlook on machine learning that I discussed last week. Recall that in that outlook, we draw a veil over the details of the machine learning implementation, and focus on learned distributional structure of training data and on optimality of decision choice. In this case, the training data is vast amounts of text distilled and cleaned and curated, from large-scale Internet scrapes, from large libraries of scanned books, from academic journals, and so on. The decisions are responses to prompts. Whatever the thing behind the veil is, what it does is learn an approximation to the distribution of texts, and approximately optimal responses to prompts.

I need to introduce a concept here that is familiar to most scientists: it is the idea of an inverse problem. The problem is this: given some data resulting from observations of some process, infer certain attributes of that process. A simple example is weather prediction: given a time-series of observations of weather conditions at thousands of weather stations, and radar and other remote observations, recover an approximation of the current full state of the atmosphere, so as to evolve it using a numerical weather model to predict whether it will rain tomorrow. Another famous (and essentially unsolved) example is from epidemiology: given some time-series of data on infections, hospitalizations, and deaths due to COVID-19, say, infer the current state of the epidemic (how many people are susceptible, exposed, infected, recovered, immune, on a county-by-county basis), and use a numerical epidemiological model to predict the epidemic’s future course.

Note the essential elements of such problems: we have a principled model of the process (a numerical weather model, or an epidemiological model) whose state we would like to infer (the atmospheric state, or the state of the epidemic) using data (weather observations, clinical data) so as to make predictions (will it rain during my picnic, is there a new epidemic wave in progress). There is always an assumed “forward model” that describes how the observed data arises, given the state of the process. But that state is unknown, and to estimate it from data one must in some sense “invert” the forward model. Hence “Inverse Problem”.

The process model plays a key role. You need to have some idea of how the process works—a set of equations that governs the process, for example, depending on unknown parameters that you need to infer—for there to even be a well-posed inverse problem. That’s not a sufficient condition, but it is certainly a necessary one.
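To make the structure concrete, here is a minimal sketch of an inverse problem in Python. Everything in it is an illustrative assumption rather than a real epidemiological model: the forward model is plain exponential growth, the hidden state is the pair (i0, r), and the names forward_model and invert are made up for this example. The "observations" are synthetic, generated from a known state plus noise, so that the recovered state can be compared with the true one.

```python
import math
import random

def forward_model(i0, r, times):
    """Forward model: predicted case counts given the hidden state (i0, r)."""
    return [i0 * math.exp(r * t) for t in times]

def invert(times, observed):
    """Inverse step: least-squares fit of log(observed) = log(i0) + r * t."""
    n = len(times)
    logs = [math.log(y) for y in observed]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tl = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    r_hat = s_tl / s_tt
    i0_hat = math.exp(l_mean - r_hat * t_mean)
    return i0_hat, r_hat

# Synthetic data: a known state pushed through the forward model, plus noise.
random.seed(0)
true_i0, true_r = 10.0, 0.2
times = list(range(30))
clean = forward_model(true_i0, true_r, times)
observed = [y * math.exp(random.gauss(0.0, 0.05)) for y in clean]

# Recover the state from the data, then use it to predict beyond the data.
i0_hat, r_hat = invert(times, observed)
forecast = forward_model(i0_hat, r_hat, [35])[0]
print(f"estimated i0={i0_hat:.2f}, r={r_hat:.3f}, forecast at t=35: {forecast:.0f}")
```

Note that invert only works because it knows the functional form used in forward_model. Take away the assumed exponential law and there is nothing left to fit: the data alone do not say what the "state" even is.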

Inverse problems are ubiquitous in science. In fact, one could, after a few beers, make the claim that most of the daily activities of scientists revolve around solving inverse problems. This is not completely true (where did the principled process models come from, in the first place?) but it is not a grotesque caricature either.

We can view the training of an “AGI” in inverse problem terms: the data is the oceans of text that these things ingest. The process model is the transformer-based “reasoning” model. The “state” to be inferred is the parameter configuration of that model that closely corresponds to a representation of the mental state of a reasoning human. The predictions are reasoned responses to prompts.

OK that’s all I need. Here is the problem: in order to believe that LLMs are achieving “reason” (the minimum requirement for any definition of AGI), we need to accept two big claims:

(1) Whatever a reasoning process may be, it leaves a sufficiently informative imprint of its internal state in text data, such that the state may be in some sense recovered and exploited, given a sufficiently large corpus of text, by solving the corresponding inverse problem.

(2) Transformer-based LLMs, in some sense, play the role of the process model in this inverse problem, and training such an LLM is tantamount to solving the inverse problem. Moreover, the trained LLM embodies the resulting reasoning entity to the point that at inference time it actually reasons.

Let’s take these in order:

In my opinion, claim (1) is barely sane. Perform any sort of introspection, and I think it is likely that you will find that your spoken or written utterances embody only the most superficial layers of your reason and other cognitive processes. That’s why we all struggle to put our thoughts into words when the occasion arises. We often are not even clear about what our thoughts are, and find, after putting them into words, that they have changed, possibly getting clearer, but also often becoming murkier and less certain as we are forced to articulate our meaning.

I simply cannot understand how such subrational processes might embed any interpretable information in our utterances. It is analogous to believing that, given a full, principled model of human physiology, and a data corpus of human footprints together with clinical observations of the humans leaving the footprints, one could train a model that could observe a new footprint and predict the health of the corresponding human. That would be mad: there is not enough information embedded in a footprint to back out a person’s gastric health, or vision acuity, or state of infection from a disease, etc. Similarly, I do not believe that there is enough information impressed in text about the subrational processes whose surface manifestations we call “reason”. I could be wrong about this, but I don’t think so, and in any event the burden of proof is on those researchers who make this kind of claim. Where is that information? How is it encoded?

Claim (2) is actually much worse: it is in the category that physicist Wolfgang Pauli called “not even wrong”—a statement so detached from scientific discourse that classifying it as correct or incorrect is simply a waste of time.

Let’s pull back the curtain concealing the LLM model for a moment. If you read any of the many online descriptions of how a transformer works (The Illustrated Transformer is pretty good, and Wikipedia’s is quite detailed, but Google has many hits for “How does a transformer LLM work”), you may find the level of computational detail off-putting at first. But if you zoom out a bit, what you realize is that it is mostly a giant chain of linear-algebraic operations, interspersed with a few nonlinear “activations”, sandwiched between a linear encoding layer and a nonlinear decoding layer. In this sense it is not different from any DL method. There’s more layers and parameter arrays than most, but not much more structure. It’s a system that grew out of a lot of trial and error, with a pile of late, unlamented errors filling a large dumpster in the back of the lab, and only what more-or-less worked left in.
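For readers who want to see what "a giant chain of linear-algebraic operations" looks like in practice, here is a toy sketch in plain Python. It is not any real model's code: the weights are random rather than trained, the sizes are tiny, and it leaves out standard ingredients such as layer normalization, multiple attention heads, and positional information. All the names (toy_block, next_token_probs, and so on) are invented for this illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    """Nonlinear normalization of a score vector into probabilities."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def toy_block(x, w):
    """One simplified block: causal self-attention, then a two-layer MLP."""
    q, k, v = matmul(x, w["q"]), matmul(x, w["k"]), matmul(x, w["v"])
    scores = matmul(q, transpose(k))
    scale = math.sqrt(len(q[0]))
    attn = []
    for i, row in enumerate(scores):
        # Causal mask: position i may only attend to positions 0..i.
        visible = softmax([s / scale for s in row[: i + 1]])
        attn.append(visible + [0.0] * (len(row) - i - 1))
    x = add(x, matmul(attn, v))  # residual connection
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w["up"])]  # ReLU
    return add(x, matmul(hidden, w["down"]))

def next_token_probs(tokens, embed, blocks, unembed):
    x = [embed[t] for t in tokens]        # linear encoding: table lookup
    for w in blocks:
        x = toy_block(x, w)               # the chain of linear algebra
    logits = matmul([x[-1]], unembed)[0]  # linear map to vocabulary scores
    return softmax(logits)                # nonlinear decoding

rng = random.Random(0)
VOCAB, DIM = 8, 4  # toy sizes, chosen arbitrarily
embed = rand_matrix(VOCAB, DIM, rng)
blocks = [
    {"q": rand_matrix(DIM, DIM, rng), "k": rand_matrix(DIM, DIM, rng),
     "v": rand_matrix(DIM, DIM, rng), "up": rand_matrix(DIM, 2 * DIM, rng),
     "down": rand_matrix(2 * DIM, DIM, rng)}
    for _ in range(2)
]
unembed = rand_matrix(DIM, VOCAB, rng)
probs = next_token_probs([1, 5, 2], embed, blocks, unembed)
print([round(p, 3) for p in probs])
```

Apart from the softmax and the ReLU, every step is a matrix product or an addition. Production models differ in scale and in the pieces omitted here, but the overall shape of the computation is the same.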

There is nothing special in that model that is analogous, say, to the model of human physiology that one would need to even attempt to back out a human’s health from that human’s footprint. There isn’t a scrap of theory to motivate the claim that transformer-based models could furnish the basis for solving this inverse problem. Which is to say, a key element of the inverse problem (the principled model embodying actual knowledge of the process under study) is simply not there. Instead there are chains of linear algebra mingled with other ad-hockery, not purporting to model anything. Which means that Claim (2) is, in effect, not only that this Rube Goldberg device is capable of inverting the forward model to recover the reasoning process state, but also that it is somehow capable of reconstructing the principled model of the reasoning process of which that state is an attribute. That chain of linear algebra is, in effect, a Nobel-caliber cognitive scientist, because the first reasoning task that it carries out is to create a working model of reason itself, a task that still eludes the discipline of cognitive science!

That is just magical thinking. It is literally impossible that this bodged-together system should have accidentally succeeded in modeling reason—an unsolved scientific problem—and then solving the related, probably impossible inverse problem of recovering the model’s state from text input, so as to boot up a reasoning entity. It’s a thoroughgoingly stupid claim.

“AGI” Is A Scientific Scandal

I find it disgraceful and shameful that an entire category of scientists has been moved by enthusiasms and Tech industry funding to lower its intellectual standards to the point that this sort of bullshit floods the journal and conference literature. It’s a scientific scandal, unfolding in plain view. Nothing in the Replication Crisis that afflicts the social sciences comes remotely close to this level of corrupted science.

I can’t emphasize strongly enough that this hubristic nonsense is taken very seriously by the “AI” research community. Sublimely unfazed by the absence of any fundamental explicit understanding of what reason is, and positively glorying in the inscrutable inner complexity of LLMs (“Explainability” is itself a topic for funded research, after all, as we saw last week), this community crows about achieving the “emergence” of intelligence from the models at large scales of data and computation, secure in the knowledge that the models are too unanalyzably complex for any model developer to be expected to explain how this miracle comes about. They just claim that it’s “self-organization” at work. The intellectual laziness of this outlook is simply shocking to me.

At this point, the technical jargon of this discipline has escaped all bounds of propriety. “AI” was bad enough, given the limited amount of “I” in ML (basically, only learning). But now we have “chain of thought”, “knowledge representations”, “mixture of experts”, “agents”, “reasoning models” and “General Intelligence” as well as many other similar allusions to human cognition polluting the technical discourse. Shame is dead in this discipline.

In a sense it’s kind of funny: Silicon Valley Masters of the Universe are directing trillions of dollars in investments to build hundreds of data centers, buy stupefying amounts of computing hardware, and add an estimated 60 GW of electrical power generation to the U.S. grid, all for the purpose of achieving something that literally cannot be achieved. There is no pot of gold marked “AGI” at the end of this rainbow. But it will take an infinite amount of data, compute, power, effort, and money to get there and find out. What could possibly go wrong?