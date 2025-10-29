Guest post series from *Carlo Graziani.

(Editor’s note: It’s Carlo, no S.)

On Artificial Intelligence

Hello, Jackals. Welcome, and thank you for this opportunity. What follows is the first part of a seven-installment series on Artificial Intelligence (AI).

Who Am I, And Why Do I Think I Have Something To Say About AI?

I am a computational scientist, applied mathematician, and statistician working at a U.S. National Laboratory. As such, I must of course make an obligatory disclaimer: The views discussed in these essays are my own, and in no way represent the views of my employer, or of any part of the U.S. government.

I work on many projects, quite a few of which have been intimately connected with the subject of machine learning (ML). ML is the technical term for the set of computational/statistical techniques that underlie the subject of AI. Over the course of the past several years, I have been looking into some issues at the mathematical foundations of the subject. In the process, I have developed a somewhat peculiar outlook on AI—at least, some of my colleagues regard it as somewhat peculiar, but often, at the same time, refreshing.

I’ve come to a set of conclusions about AI that have gelled into a relatively coherent story. It is to some degree a technical story, but I believe that it is very relevant to the moment that we are living through, because of the sudden onset of public AI tools, and because of the ubiquity of AI in public discourse and policymaking. I would like to tell that story in a manner that is as accessible as possible to non-specialists and non-technical people, because I believe that many wrong—even unscientific—claims are being injected into our societal discourse and peddled as facts. The subject is simply too inscrutable to many otherwise intelligent and literate people for those claims to be gainsaid. Nonetheless, those claims should be gainsaid, because they are technically wrong, and their wrongness has very real implications for where we are heading with this technology.

These essays are my attempt to tell that story.

Overview of the Series

Here is the general plan for this series:

Part 1: What Is AI, and How Did We Get Here
An introduction to the subject, and a review of its recent history.

Part 2: AI State of Play
A framework for reasoning about AI, and an examination of the scientific culture of AI research.

Part 3: There Is No Artificial General Intelligence Down This Road
Claims of AI cognition: on the implausibility of most, and on the scientific impossibility of the others.

Part 4: If AGI Were Possible, What Would It Look Like?
Wherein we address the questions that AI researchers ought to confront to make claims of AGI scientifically defensible.

Part 5: Hallucinations
Why do large language models (LLMs) frequently produce crazy output? A likely explanation that bodes poorly for prospects of eliminating hallucinations.

Part 6: The Pathology At The Heart Of Hyperscaling
What is it that drives AI development to ocean-boiling scales of compute and power consumption? Should it really be necessary to scale models out in this way?

Part 7: AI Winter III in 5…4…3…
Pulling together conclusions from the previous essays and some deeper history of AI, we look at the current state of expectations for industrial-scale AI development, and try to understand what the future holds.



The plan is to release one of these per week, on Wednesdays (skipping Thanksgiving week), with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

OK, those are the preliminaries. Without further ado, then:

Part 1: What Is AI, and How Did We Get Here

The subject of AI is very clearly topical and important. It is also, necessarily, a nearly opaque topic to many people, not least because AI services for the general public arrived so suddenly that they caught most people by surprise. Many of you, probably the majority, have at least messed around with a chatbot, and quite a few of you have used one to help you write documents or even computer code, and been surprised at how useful these tools can be. It is natural to wonder whether the entity at the other end of the prompt-response dialog is, in fact, an intelligent interlocutor. How is this possible, and how did it happen so quickly?

Another difficulty for coming to grips with AI is that many bold, almost incredible claims have been made on behalf of the capabilities of AI models. The principal such claim is embodied in the now-ubiquitous term “AI” itself: the “I” stands for “Intelligence”, and every time we use the term “AI” we are, in a way, conceding a point: that we now have machine systems that are in some sense “intelligent”, and hence may be said to engage in some form of cognition. The technical literature on AI is replete with anthropomorphizing terms such as “general intelligence”, “chain of thought”, “mixture of experts”, “agents”, and so on, apparently describing recognizable cognitive features and abilities of these machine systems.

This idea, that the age of thinking machines is upon us, is at the heart of much public writing on AI, some celebratory, some bemused, some frankly anxious. AI systems have been touted as labor-replacing devices in a variety of sectors: administration, entertainment, engineering, programming, even science. Every military in the world is looking at AI-guided autonomous weaponry and decision-making systems. And in Silicon Valley, many Captains of Industry are frankly gleeful at the disruption their new intelligent machines are bringing to various economic enterprises, anticipating that the time is nigh when each disrupted sector of the economy will reconfigure itself to direct part of its profits to those supplying the AI tools.

On Learning

Whether these machines can in fact be said to be “intelligent” in even the most rudimentary ways is a question that has received remarkably little serious scientific investigation. The claims that one often encounters in the AI literature (such as those made in this egregiously-titled paper) on behalf of the proposition that large language models (LLMs) can “reason” don’t really meet any rigorous standard of scientific inquiry: they amount to circular reasoning, as I will discuss in Parts 3 and 4 of this series.

In fact, there is only one aspect of cognition that “AI” systems can be said to model: Learning. The very term “AI” is essentially a marketing cover for the subject of machine learning (ML), whose most prominent part nowadays is deep learning (DL), a variant of ML that employs models consisting of artificial neural networks. To be clear, ML also has other important modeling strategies besides DL, but it was the advent of DL in 2007 that set off the technical revolution whose consequences we see today.

Machine Learning is the proper technical term for “AI”, and from here on I will attempt to use that term in preference to “AI”, using the latter only in scare quotes. “AI” is a technically-obnoxious term, because that “I” introduces improper associations with general cognition that, I hope to persuade you, we should view as inadmissible.

Statistical Learning

OK, so what is machine learning then? ML is a field within an even older subject called statistical learning, which is concerned with using data to structure optimal decisions. To be more explicit, here is the program of statistical learning:

1. Take a set of data, and infer an approximation to the statistical distribution from which the data was sampled. Data could be images, weather states, protein structures, text…
2. At the same time, optimize some decision-choosing rule in a space of such decisions, exploiting the learned distribution. A decision could be the forecast of a temperature, or a label assignment to a picture, or the next move of a robot in a biochemistry lab, or a policy, or a response to a text prompt…
3. Now, when presented with a new set of data samples, produce the decisions appropriate to each one, using facts about the data distribution and about the decision optimality criteria inferred in parts 1. and 2.

These three steps capture everything that “AI” is really about, from the humblest convolutional neural network to the mightiest chatbot. You may want to re-read them, because they structure much of the story to follow.
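For readers who like to see the moving parts, here is a deliberately tiny Python sketch of the three-step program. It is an illustration under stated assumptions, not how any real “AI” system is built: the data are hypothetical one-dimensional numbers from two made-up classes labeled "A" and "B", step 1 assumes each class follows a Gaussian (bell-curve) distribution, and step 2 uses the standard rule of picking the most probable label, which is the optimal decision when every mistake costs the same. All function names are invented for this example.

```python
import math
import random
from statistics import mean, pstdev


def fit_distribution(samples):
    """Step 1 (training): approximate the distribution the data came from.

    Assumes, hypothetically, that each class is a 1-D Gaussian.
    samples: dict mapping label -> list of numbers.
    Returns: dict mapping label -> (mean, standard deviation, prior probability).
    """
    total = sum(len(xs) for xs in samples.values())
    return {label: (mean(xs), pstdev(xs), len(xs) / total)
            for label, xs in samples.items()}


def log_score(x, params):
    """log(Gaussian density at x * prior): the log posterior plus a term shared by all labels."""
    mu, sigma, prior = params
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi))
            + math.log(prior))


def make_decision_rule(model):
    """Step 2 (training): build the decision rule from the learned distribution.

    With equal cost for every mistake, the rule that minimizes expected error
    picks the label with the highest posterior probability.
    """
    def rule(x):
        return max(model, key=lambda label: log_score(x, model[label]))
    return rule


def infer(rule, new_samples):
    """Step 3 (inference): apply the learned rule to unseen data."""
    return [rule(x) for x in new_samples]


if __name__ == "__main__":
    random.seed(0)
    training_data = {
        "A": [random.gauss(0.0, 1.0) for _ in range(200)],
        "B": [random.gauss(5.0, 1.0) for _ in range(200)],
    }
    rule = make_decision_rule(fit_distribution(training_data))
    print(infer(rule, [-1.0, 0.3, 4.7, 6.2]))  # ['A', 'A', 'B', 'B']
```

In the framing above, replacing the Gaussians with a deep neural network, the numbers with images or text, and the labels with any other kind of decision leaves this skeleton unchanged.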

Learning vs. Cognition

Parts 1. and 2. of this program are referred to as Training. Part 3. is called Inference. The entire process is described using the term Learning. This is supposed to be an evocative term: in a sense, a system that accomplishes this program “learns” from the data how to make reasonable decisions in the presence of new data. The analogy to human “learning” (or to animal “learning”, for that matter) seems sufficiently well-motivated to justify using the term here.

Note, however, that learning is a very limited part of cognition, and in particular, statistical learning does not embody anything that one might analogize to, say, reasoning. This is an important reason for being wary of the term “Intelligence” in this connection: there is a great deal more to intelligence than learning. We will return to this point with some force in future essays.

Machine Learning

How does ML differ from statistical learning? Not in any essentials. The only difference is that ML arose in conjunction with the advent of powerful computing tools that could hoover up and process massive amounts of data. ML is basically “Statistical Learning Meets High-Performance Computing”.

I don’t want the obviously “AI”-skeptical take that I am developing here to obscure an important truth: ML techniques have turned out to be an unbelievably powerful family of methods for assimilating massive amounts of data, and structuring reasonable (if not always optimal) decisions based on that data. The ability to do this at large scales has been transformative to many aspects of our lives, as well as to the scientific enterprise itself. When that marriage of statistical learning and computing occurred, something deep and important changed in the world.

A Brief History of Machine Learning

The history of ML methods, including that of neural networks and other efforts to bring about machine intelligence, has antecedents that go back to the 1940s. For now, I’m going to start the story quite a bit later than that, because going back that far would take us too far afield. For present purposes I need to describe the beginning of a revolution that really got underway around 2007.

This was a curious time in the academic disciplines of applied mathematics and statistics, because it was already clear that high-performance computing was having a transformative effect on science, but at the same time there were many problems still regarded as very hard, possibly insoluble even with the new computational tools.

Manifold-Finding

One example of such problems can serve to illustrate the broader situation. Suppose that we have many data samples, each consisting of a long list of numbers (in mathspeak, such a list is a “vector”). Suppose, for example, that each such vector contains 1000 entries. The problem is this: is it possible to identify a shorter list of, say, 30 entries, together with 1000 functions of those 30 entries, sufficient to largely recover the original structure of the data, and (importantly) to predict future data? And are there efficient ways to accomplish this?
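To make the formulation concrete, the Python sketch below runs the problem in the easy direction. It invents 30 hidden numbers and 1000 functions of them (here, hypothetically, each function is a tanh of a random weighted sum of the 30 numbers) and uses them to generate 1000-entry data vectors. Everything in it, including the function form, the weights, and the names `embed` and `sample_data`, is an assumption chosen for illustration.

```python
import math
import random

LATENT_DIM = 30    # length of the short list (known here; unknown in practice)
DATA_DIM = 1000    # length of each observed data vector

rng = random.Random(0)

# Hypothetical choice of the 1000 functions: f_i(z) = tanh(w_i . z),
# where w_i is a fixed random weight vector for output entry i.
WEIGHTS = [[rng.gauss(0.0, 1.0 / math.sqrt(LATENT_DIM)) for _ in range(LATENT_DIM)]
           for _ in range(DATA_DIM)]


def embed(z):
    """Map a list of 30 latent numbers to a 1000-entry data vector."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in WEIGHTS]


def sample_data(n):
    """Draw n data vectors, all lying on one 30-dimensional submanifold."""
    return [embed([rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)])
            for _ in range(n)]


data = sample_data(5)
print(len(data), "vectors of length", len(data[0]))  # 5 vectors of length 1000
```

The manifold-finding problem is the reverse: given only vectors like `data`, recover both the 30 hidden numbers and the 1000 functions, without even being told that 30 is the right count.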

If that formulation seems confusing to you, perhaps the following analogy is helpful: think of an aircraft condensation trail, one of those visible vapor trails left behind by passenger jets. Each location in the trail can be described by three numbers, its Cartesian coordinates (x,y,z). The full trail is described by a connected series of such triplets.

But that is not an efficient representation of the trail, because we know that it is really a curve: a one-dimensional object embedded in a three-dimensional space. It would be much better to know three functions x(t), y(t), z(t) of a one-dimensional parameter t, which together embed the curve into the three-dimensional space. Such a representation is much more compact, and moreover it is more informative about the structure of the trail than the series of triplets (x,y,z). The three functions provide a dimensional reduction of the structure of the trail. We can now reason about its structure in a lower-dimensional space. We have also discovered an important feature of the trail: it is really just a line, warped by the embedding functions into a three-dimensional structure.
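The bookkeeping in the contrail analogy can be sketched in a few lines of Python. The spiral shape and the spacing of the points are assumptions chosen for illustration, not a model of a real contrail; the point is only that one parameter t per point, plus three functions, carries everything the list of triplets did.

```python
import math


# Hypothetical trail shape: a gently rising spiral (any smooth curve would do).
def x_of(t):
    return math.cos(t)


def y_of(t):
    return math.sin(t)


def z_of(t):
    return 0.1 * t


# Representation 1: a list of (x, y, z) triplets, three numbers per point.
ts = [0.05 * k for k in range(200)]
triplets = [(x_of(t), y_of(t), z_of(t)) for t in ts]


# Representation 2: one number t per point, plus the three embedding functions.
def reconstruct(params):
    """Rebuild the 3-D points from the 1-D parameter values."""
    return [(x_of(t), y_of(t), z_of(t)) for t in params]


assert reconstruct(ts) == triplets  # nothing is lost
print(3 * len(triplets), "numbers vs", len(ts), "numbers plus three functions")
# 600 numbers vs 200 numbers plus three functions
```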

The case of a one-dimensional structure embedded in a three-dimensional space is not too difficult a problem to visualize or solve. By contrast, the case of the 1000-dimensional vectors to be summarized by a 30-dimensional embedding (where, in particular, we don’t even know a priori whether 30 is the right embedding dimension) is very hard. This problem, called submanifold finding, is important and ubiquitous. For example, images can be viewed as vectors, with the entries representing the brightness values of the pixels. Viewed in this way, the space of images is almost entirely empty of “natural” images: if you selected random values for the pixel brightnesses and displayed the result as an image, you would essentially always get oatmeal, without ever producing such features of natural images as an edge or a contrast gradient, let alone images such as a stop sign or a cat. The space of natural images is a low-dimensional submanifold of the space of images. And because the relevant dimensions are so high, that submanifold is essentially impossible to locate using classical methods. And if you can’t find that submanifold, you can’t really distinguish images from non-images, so image classification (for example) is basically impossible.
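One crude way to see the “oatmeal” point numerically is sketched below in Python. It measures how strongly each pixel’s brightness tracks that of its right-hand neighbor. Natural images, with their smooth regions and edges, tend to show strong neighbor correlation; an image whose pixels are chosen independently at random shows essentially none. The 28×28 size, the bright-square test image, and the use of neighbor correlation as a stand-in for “natural image structure” are all assumptions of this illustration; real natural images have far richer structure than this one statistic captures.

```python
import math
import random
from statistics import mean

SIDE = 28  # a 28 x 28 image is a vector with 784 entries


def random_image(rng):
    """Choose every pixel brightness independently and uniformly in [0, 1)."""
    return [[rng.random() for _ in range(SIDE)] for _ in range(SIDE)]


def square_image():
    """A crude 'natural' feature: a bright square with sharp edges on a dark field."""
    return [[1.0 if 8 <= r < 20 and 8 <= c < 20 else 0.0 for c in range(SIDE)]
            for r in range(SIDE)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def neighbor_correlation(img):
    """Correlation between each pixel and the pixel to its right."""
    left = [row[c] for row in img for c in range(SIDE - 1)]
    right = [row[c + 1] for row in img for c in range(SIDE - 1)]
    return pearson(left, right)


rng = random.Random(0)
print(round(neighbor_correlation(random_image(rng)), 2))  # close to 0
print(round(neighbor_correlation(square_image()), 2))     # 0.9
```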

Around 2007, there was not a lot of optimism among applied mathematicians that this problem would be solved anytime soon. I remember reading a review article on submanifold-finding in 2009, which essentially declared defeat: advanced methods were capable of finding such immersed structure for very artificial examples of data in dimensions as risibly low as 50, say, but were helpless to do anything useful with natural data such as the MNIST handwritten digits, which are 28×28 grayscale images of 784 pixels each.

Enter Deep Learning

This is not a little ironic, because at about this time, Deep Learning methods were making inroads into such problems at amazing rates of progress. The introduction of the convolutional neural network made image analysis almost magically tractable, directly identifying features of images at various scales and exploiting those features for tasks such as classification. By 2012, researchers using a DL architecture called an autoencoder had demonstrated convincingly that the 784-dimensional MNIST data could in fact be represented using a 30-dimensional submanifold. It was a tour-de-force.
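The sketch below shows the shape of an autoencoder in plain Python: an encoder squeezes a 784-pixel image down to a 30-number code, and a decoder tries to rebuild the image from that code. The layer widths, the tanh and sigmoid activations, and the random stand-in image are assumptions for illustration, not the architecture behind the results described above. The network is left untrained; training would adjust the weights to make the reconstruction error small across a whole data set such as MNIST, and if that succeeds, the 30-number code acts as coordinates on the submanifold.

```python
import math
import random

# Illustrative layer widths (assumed, not taken from any specific paper):
# 784 pixels -> 128 -> 30-number code -> 128 -> 784 reconstructed pixels.
ENCODER_SIZES = [784, 128, 30]
DECODER_SIZES = [30, 128, 784]


def init_layers(sizes, rng):
    """Random weights and zero biases for a stack of fully connected layers."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, v, last_activation):
    """Apply activation(W v + b) layer by layer: tanh inside, last_activation at the end."""
    for i, (weights, biases) in enumerate(layers):
        pre = [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, biases)]
        act = last_activation if i == len(layers) - 1 else math.tanh
        v = [act(p) for p in pre]
    return v


def identity(p):
    return p


def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))


rng = random.Random(0)
encoder = init_layers(ENCODER_SIZES, rng)
decoder = init_layers(DECODER_SIZES, rng)

image = [rng.random() for _ in range(784)]  # random stand-in for one MNIST digit
code = forward(encoder, image, identity)    # the 30-number bottleneck
reconstruction = forward(decoder, code, sigmoid)

# Training (not shown) would adjust every weight to make this error small
# across a whole data set; this network is untrained, so the output is meaningless.
loss = sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)
print(len(code), len(reconstruction))  # 30 784
```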

Oddly enough, this feat had not been pulled off by applied mathematicians or by statisticians, the academic tribes traditionally most concerned with modeling data. Instead, members of a completely different tribe were responsible: computer scientists. Academic computer science was quick to recognize that the advent of high-performance computing tools enabled rapid processing of data volumes on heretofore unheard-of scales, and that this new capability could transform some older ML techniques from academic toys into useful tools for turning data into decisions. In 2007, certain technical breakthroughs occurred that made the training of large neural network models possible for the first time. The term “Deep Learning” dates to this period, and can serve as a marker for the birth of the revolution.

In the decade-and-a-half that followed, the subject of Deep Learning advanced from strength to strength, furnishing solutions to many problems previously regarded as intractable. Image processing problems such as classification (“Is this a cat?”) or image segmentation (“Which pixels in this image belong to the cat?”) became easy. Empirical weather forecasting based on DL methods is so effective that the European Centre for Medium-Range Weather Forecasts (ECMWF) now uses such a model for some of its 10-day forecast products. Predicting the folded structure of a protein, a key to understanding its biochemical action, is now a solvable problem thanks to DL. By 2017, Natural Language Processing (NLP) had already advanced to the point that there was no reason for companies to employ hundreds of customer service call-center operators. Those people could be replaced by a cheap appliance that routed callers around voicemail menus: annoyingly, but admittedly with reasonable accuracy. And 2017 saw the invention of the Transformer architecture (the “T” in “GPT”), which eventually would wind up eclipsing all other forms of “AI” in the public mind.

The Academic Politics of Deep Learning

There is an interesting question raised by this capsule history: why was it that the revolution was led by computer science, rather than by academic applied math and statistics? After all, the new field that arose is often referred to as “Data Science”, so why is it that the traditional academic disciplines concerned with data were so outmatched in this revolution?

In my opinion, the answer in the case of statistics is that, unfortunately, at the time of the birth of this subject, the majority of academic statisticians were computationally illiterate. Most of them barely knew how to use their computers to send email and edit documents. With very few exceptions, they were entirely innocent of the acceleration of computing capabilities, and hence not in a position to consider the meaning of this development for their own professional activities.

The case of applied mathematics is, in my opinion, more interesting and informative. Applied mathematicians have been computationally literate for many decades, and very much attuned to developments in computing. But they labored under a handicap: mathematical principle.

By and large, the subject of computational applied math is algorithmic. This term has a specific and important meaning: An algorithm is a piece of rigorous mathematics that is realized as a computational model. As such, it makes very precise and validatable predictions. If you implement, say, a linear algebra solver algorithm, you know that its output solutions necessarily contain inaccuracies (from floating-point rounding, for example) that have certain predictable properties. If you run the algorithm and you don’t observe those properties, you know that your algorithm has a problem, and professional canon binds you to go find it and fix it.
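Here is what such a validatable prediction can look like in practice, as a Python sketch. It solves a linear system by Gaussian elimination with partial pivoting (a textbook method, used here only as an example of a linear algebra solver algorithm) and then checks the relative residual ||b − Ax|| / (||A|| ||x||), which theory says should be a modest multiple of machine precision ε for this method. The tolerance of 100·n·ε is a deliberately loose rule of thumb assumed for this sketch; the rigorous backward-error bounds also involve a pivot growth factor, which is usually small. A residual far above the tolerance would be exactly the kind of discrepancy that obliges you to go find the bug.

```python
import random
import sys


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy; inputs untouched
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # largest pivot candidate
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def relative_residual(A, x, b):
    """||b - A x|| / (||A|| ||x||) in the infinity norm."""
    r = max(abs(bi - sum(a * xj for a, xj in zip(row, x))) for row, bi in zip(A, b))
    norm_a = max(sum(abs(a) for a in row) for row in A)
    norm_x = max(abs(xj) for xj in x)
    return r / (norm_a * norm_x)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x = solve(A, b)
    tolerance = 100 * n * sys.float_info.epsilon  # loose rule of thumb (assumed)
    print(relative_residual(A, x, b) <= tolerance)  # True for a correct solver
```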

And that was the applied math folks’ problem: there isn’t a single algorithm in all of deep learning.

On Model Correctness

I can hear the cries of outrage already. What? Are you nuts? Google Corporate swears by “the Algorithm”! Even avowed foes of AI put the word “Algorithm” in the titles of their tracts! Why are you playing these language games?

Stay with me. There is a Yang to the Yin of “Algorithm”. It is called a heuristic. A heuristic is a bit of semi-mathematical intuition that is embodied in a computer program. Elements of that program can certainly be described by mathematical equations, but the entire structure is not based on any theoretical understanding, and, as a consequence, there can be no a priori expectations concerning the program’s output. A heuristic model is considered “valid” if it seems qualitatively acceptable by some performance metrics, and especially if it appears to outperform other heuristics that attack the same problem set.
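The heuristic notion of validity can be sketched in Python as well. Below, two hypothetical prediction rules are built from made-up data and scored on held-out samples. In this sketch neither rule comes with any prediction of what its score ought to be; the only verdict available is the ranking. The data-generating function, the noise level, and both rules are assumptions chosen for illustration.

```python
import math
import random

rng = random.Random(0)


def make_data(n):
    """Made-up data: y = sin(x) plus Gaussian noise, with x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.1)) for x in xs]


train, held_out = make_data(200), make_data(100)
train_mean = sum(y for _, y in train) / len(train)


# Two hypothetical heuristics, each mapping an input x to a predicted y.
def constant_mean(x):
    return train_mean


def nearest_neighbor(x):
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def mean_absolute_error(predict, data):
    return sum(abs(predict(x) - y) for x, y in data) / len(data)


# "Validation" in the heuristic sense: score each rule on a metric, then rank.
scores = {f.__name__: mean_absolute_error(f, held_out)
          for f in (constant_mean, nearest_neighbor)}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
```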

All DL methods are heuristics. There is not a single exception. That was the key to progress. Researchers in DL simply dropped the epistemological standards of normal computational science, wherein a model is considered “correct” if it correctly implements a rigorous mathematical concept purporting to represent some data-generating process, and if its predictions are successfully validated by data from that process. In their place, they installed a weaker epistemology, in which a model implemented in computer code no longer needs a mathematical model of the data-generating process at its foundation, and is “valid” if its output seems reasonable compared to some data, with no rigorous prior statement as to what counts as “reasonable”.

As we’ve seen, this strategy was wildly successful, and of necessity it largely excluded applied math and statistics (statisticians also prize rigor) from real participation in the revolution. Oddly enough, there is an analogy here with the development of Financial Mathematics. This is a subject whose practitioners attempt to predict future movements in prices of financial assets, based on time series of past prices. In this endeavor, they make no effort to actually model markets as ensembles of financial actors. Instead, they use completely empirical sets of equations that have no justification beyond their purported ability to predict the price time series. A model is considered “correct” if it can be shown that decisions based on its predictions are profitable. A second model would be “more correct” than the first one if it is more profitable. “Correctness”, in this sense, is very contingent, because changes in market conditions can turn a “correct” model into an “incorrect” model if it starts consistently losing money.
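The contingency of this kind of “correctness” fits in a few lines of Python. The trading rule and both price series below are hypothetical and synthetic, invented for illustration only: a simple momentum rule (bet that the last price move continues) makes money in a steadily rising market and loses exactly as much in a choppy one.

```python
def momentum_profit(prices):
    """Hypothetical rule: after an up move hold +1 unit, otherwise hold -1 (short).

    Returns the total profit from applying the rule at every step.
    """
    profit = 0.0
    for t in range(1, len(prices) - 1):
        position = 1 if prices[t] > prices[t - 1] else -1
        profit += position * (prices[t + 1] - prices[t])
    return profit


trending = [100 + t for t in range(20)]       # steady rise: 100, 101, 102, ...
choppy = [100 + (t % 2) for t in range(20)]   # 100, 101, 100, 101, ...

print(momentum_profit(trending))  # 18.0: "correct" in this market
print(momentum_profit(choppy))    # -18.0: the same rule, now "incorrect"
```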

This is more or less how DL practitioners assess their models’ correctness: not on how well they embody some mathematical principle that is believed to describe the data-generating process, but simply on how well they appear to match the data at all.

Drawing Conclusions: Costs and Benefits

As I said, adopting this sort of epistemology was key to the progress made in DL. But it should be clear that this progress came at a cost, and that there might be pigeons in the air waiting to come home to roost. The cost is that no DL model can ever be said to be “wrong”, because we do not have any language to describe what it would mean for a DL model to be “correct”.

Not having a criterion for correctness might be an acceptable situation if the model is making low-stakes decisions. It is not acceptable in high-stakes situations such as ML-driven surgery, or ML-driven air-traffic control, or ML-controlled weaponized drones, or even ML-driven science modeling. If it is not possible, even in principle, to verify and validate such models, how can they be trusted in critical applications?

Viewed in an even broader context, this choice made by DL practitioners almost 20 years ago is now, in my opinion, the underlying source of a Kuhnian scientific crisis. The crisis is best represented by the issue of “AI Hallucinations”, which is widely recognized to be a problem. This is, without a doubt, an issue of inaccurate model output, from models whose design criteria never specify what counts as accurate output. The latter fact is directly traceable to the new epistemology of DL.

But most practitioners of DL don’t even realize that their epistemology was in fact a choice, or that this choice might be near the origin of their hallucination problem. As an academic discipline, the field is nowhere close to even acknowledging that it has a problem, much less that it might be confronting a genuine crisis.

I believe not only that there is a crisis in progress, but that it will culminate very soon. I’ll be building up that story over the next few weeks.

Next Week: The “AI” State of Play

All 7 parts, once published, can be found here: Artificial Intelligence