Balloon Juice

Come for the politics, stay for the snark.


Part 1: What is AI, And How Did We Get Here?

by WaterGirl | October 29, 2025 at 7:30 pm | 174 Comments

This post is in: Artificial Intelligence, Guest Posts, Science & Technology


Guest post series from Carlo Graziani.

(Editor’s note:  It’s Carlo, no S.)

Guest Post: AI 1

On Artificial Intelligence

Hello, Jackals. Welcome, and thank you for this opportunity. What follows is the first part of a seven-installment series on Artificial Intelligence (AI).

Who Am I, And Why Do I Think I Have Something To Say About AI?

I am a computational scientist, applied mathematician, and statistician working at a U.S. National Laboratory. As such, I must of course make an obligatory disclaimer: The views discussed in these essays are my own, and in no way represent the views of my employer, or of any part of the U.S. government.

I work on many projects, quite a few of which have been intimately connected with the subject of machine learning (ML). ML is the technical term for the set of computational/statistical techniques that underlie the subject of AI. Over the course of the past several years, I have been looking into some issues at the mathematical foundations of the subject. In the process, I have developed a somewhat peculiar outlook on AI—at least, some of my colleagues regard it as somewhat peculiar, but often, at the same time, refreshing.

I’ve come to a set of conclusions about AI that have gelled into a relatively coherent story. It is to some degree a technical story, but I believe that it is very relevant to the moment that we are living through, because of the sudden onset of public AI tools, and because of the ubiquity of AI in public discourse and policymaking. I would like to tell that story in a manner that is as accessible as possible to non-specialists and non-technical people, because I believe that many wrong—even unscientific—claims are being injected into our societal discourse and peddled as facts. The subject is simply too inscrutable to many otherwise intelligent and literate people for those claims to be gainsaid. Nonetheless, those claims should be gainsaid, because they are technically wrong, and their wrongness has very real implications for where we are heading with this technology.

These essays are my attempt to tell that story.


The plan is to release one of these per week, on Wednesdays (skipping Thanksgiving week), with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

OK, those are the preliminaries. Without further ado, then:

Part 1: What Is AI, and How Did We Get Here?

The subject of AI is very clearly topical and important. It is also necessarily a nearly opaque topic to many people, not least because the advent of AI services for the general public has occurred with such suddenness as to catch most people by surprise. Many of you, probably the majority, have at least messed around with a Chatbot, and quite a few of you have used one to help you write documents or even computer code, and been surprised at how useful these tools can be. It is natural to wonder whether the entity at the other end of the prompt-response dialog is, in fact, an intelligent interlocutor. How is this possible, and how did it happen so quickly?

Another difficulty for coming to grips with AI is that many bold, almost incredible claims have been made on behalf of the capabilities of AI models. The principal such claim is embodied in the now-ubiquitous term “AI” itself: the “I” stands for “Intelligence”, and every time we use the term “AI” we are, in a way, conceding a point: that we now have machine systems that are in some sense “intelligent”, and hence may be said to engage in some form of cognition. The technical literature on AI is replete with anthropomorphizing terms such as “general intelligence”, “chain of thought”, “mixture of experts”, “agents”, and so on, apparently describing recognizable cognitive features and abilities of these machine systems.

This idea—that the age of thinking machines is upon us—is at the heart of much public writing on AI, some celebratory, some bemused, some frankly anxious. AI systems have been touted as labor-replacing devices in a variety of sectors: administration, entertainment, engineering, programming, even science. Every military in the world is looking at AI-guided autonomous weaponry and decision-making systems. And in Silicon Valley, many Captains of Industry are frankly gleeful at the disruption to various economic enterprises of their new intelligent machines, anticipating that the time is nigh when each disrupted sector of the economy will reconfigure itself to direct part of its profits to those supplying the AI tools.

On Learning

Whether these machines can in fact be said to be “intelligent” in even the most rudimentary ways is a question that has received remarkably little serious scientific investigation. The claims that one often encounters in the AI literature (such as those made in this egregiously-titled paper) on behalf of the proposition that large language models (LLMs) can “reason” don’t really meet any rigorous standard of scientific inquiry: they amount to circular reasoning, as I will discuss in Parts 3 and 4 of this series.

In fact, there is only one aspect of cognition that “AI” systems can be said to model: Learning. The very term “AI” is essentially a marketing cover for the subject of machine learning (ML), the most prominent part of which nowadays is deep learning (DL), a variant of ML that employs models consisting of artificial neural networks. To be clear, there are also other important modeling strategies in ML besides DL approaches, but it was the advent of DL in 2007 that set off the technical revolution whose consequences we see today.

Machine Learning is the proper technical term for “AI”, and from here on I will attempt to use that term in preference to “AI”, using the latter only in scare quotes. “AI” is a technically-obnoxious term, because that “I” introduces improper associations with general cognition that, I hope to persuade you, we should view as inadmissible.

Statistical Learning

OK, so what is machine learning then? ML is a field within an even older subject called statistical learning, which is concerned with using data to structure optimal decisions. To be more explicit, here is the program of statistical learning:

  1. Take a set of data, and infer an approximation to the statistical distribution from which the data was sampled;
    • Data could be images, weather states, protein structures, text…
  2. At the same time, optimize some decision-choosing rule in a space of such decisions, exploiting the learned distribution;
    • A decision could be the forecast of a temperature, or a label assignment to a picture, or the next move of a robot in a biochemistry lab, or a policy, or a response to a text prompt…
  3. Now when presented with a new set of data samples, produce the decisions appropriate to each one, using facts about the data distribution and about the decision optimality criteria inferred in parts 1. and 2.

These three steps capture essentially everything that “AI” is really about, from the humblest convolutional neural network to the mightiest Chatbot. You may want to re-read them, because they structure much of the story to follow.
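For readers who find code clarifying, here is a minimal sketch of that three-step program on a toy problem of my own devising (one-dimensional data, two labels, Gaussian class models). None of the specifics below come from the essay; they are simply the smallest illustration I could construct.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 measurements, each with a known label (0 or 1).
x = np.concatenate([rng.normal(-1.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

# Step 1: infer an approximation to the distribution the data came from.
# Here, each class is modeled as a Gaussian with estimated mean/spread.
mu = np.array([x[y == k].mean() for k in (0, 1)])
sigma = np.array([x[y == k].std() for k in (0, 1)])
prior = np.array([(y == k).mean() for k in (0, 1)])

# Step 2: fix a decision rule that exploits the learned distribution --
# here, "choose whichever label has the higher posterior probability".
def decide(x_new):
    dens = prior * np.exp(-0.5 * ((x_new - mu) / sigma) ** 2) / sigma
    return int(np.argmax(dens))

# Step 3: when new samples arrive, produce the decision for each one.
print([decide(s) for s in (-2.3, 0.4, 3.1)])   # e.g. [0, 0, 1]
```

Steps 1 and 2 here are what the next section calls Training, and step 3 is Inference.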

Learning vs. Cognition

Parts 1. and 2. of this program are referred to as Training. Part 3. is called Inference. The entire process is described using the term Learning. This is supposed to be an evocative term: in a sense, a system that accomplishes this program “learns” from the data how to make reasonable decisions in the presence of new data. The analogy to human “learning” (or to animal “learning”, for that matter) seems sufficiently well-motivated to justify using the term here.

Note, however, that learning is a very limited part of cognition, and in particular, statistical learning does not embody anything that one might analogize to, say, reasoning. This is an important reason for being wary of the term “Intelligence” in this connection: there is a great deal more to intelligence than learning. We will return to this point with some force in future essays.

Machine Learning

How does ML differ from statistical learning? Not in any essentials. The only difference is that ML arose in conjunction with the advent of powerful computing tools that could hoover up and process massive amounts of data. ML is basically “Statistical Learning Meets High-Performance Computing”.

I don’t want the obviously “AI”-skeptical take that I am developing here to obscure an important truth: ML techniques have turned out to be an unbelievably powerful family of methods for assimilating massive amounts of data, and structuring reasonable (if not always optimal) decisions based on that data. The ability to do this at large scales has been transformative to many aspects of our lives, as well as to the scientific enterprise itself. When that marriage of statistical learning and computing occurred, something deep and important changed in the world.

A Brief History of Machine Learning

The history of ML methods, including that of neural networks and other efforts to bring about machine intelligence, has antecedents that go back to the 1940s. For now, I’m going to start the story quite a bit later than that, because going back that far would take us too far afield. For present purposes I need to describe the beginning of a revolution that really got underway around 2007.[1]

This was a curious time in the academic disciplines of applied mathematics and statistics, because it was already clear that high-performance computing was having a transformative effect on science, but at the same time there were many problems still regarded as very hard, possibly insoluble even with the new computational tools.

Manifold-Finding

One example of such problems can serve to illustrate the broader situation. Suppose that we have many data samples, each consisting of a long list of numbers (in mathspeak, such a list is a “vector”). Suppose, for example, that each such vector contains 1000 entries. The problem is this: is it possible to identify a shorter list of, say, 30 entries, together with 1000 functions of those 30 entries, sufficient to largely recover the original structure of the data, and (importantly) to predict future data? And are there efficient ways to accomplish this?

If that formulation seems confusing to you, perhaps the following analogy is helpful: think of an aircraft condensation trail, one of those visible vapor trails left behind by passenger jets. Each location in the trail can be described by three numbers, its Cartesian coordinates (x,y,z). The full trail is described by a connected series of such triplets.

But that is not an efficient representation of the trail, because we know that it is really a curve—a one-dimensional object embedded in a three-dimensional space. It would be much better to know three functions x(t), y(t), z(t) of a one-dimensional parameter t, which embed the curve in the three-dimensional space. Such a representation is much more compact, and moreover it is more informative about the structure of the trail than the series of triplets (x,y,z). The three functions provide a dimensional reduction of the structure of the trail. We can now reason about its structure in a lower-dimensional space. We have also discovered an important feature of the trail: it is really just a line, warped by the embedding functions into a three-dimensional structure.
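To make the analogy concrete, here is a tiny, entirely made-up illustration: a helix-shaped “trail” whose 200 (x, y, z) triplets are all generated by three functions of a single parameter t.

```python
import numpy as np

# One-dimensional parameter running along the trail.
t = np.linspace(0.0, 4.0 * np.pi, 200)

# Three embedding functions x(t), y(t), z(t): here, a helix.
x, y, z = np.cos(t), np.sin(t), 0.1 * t

# The "raw" representation: 200 triplets, i.e. 600 numbers...
trail = np.stack([x, y, z], axis=1)    # shape (200, 3)

# ...but the structure is one-dimensional: each triplet is fully
# determined by a single value of t through the three functions above.
print(trail.shape)
```

Manifold-finding asks whether such a parameterization can be discovered automatically, from the triplets alone, when the embedding functions are unknown and the dimensions are enormous.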

The case of a one-dimensional structure embedded in a three-dimensional space is not too difficult a problem to visualize or solve. By contrast, the case of the 1000-dimensional vectors to be summarized by a 30-dimensional embedding (where in particular we don’t even know whether 30 is the right embedding dimension a priori) is very hard. This problem, called submanifold finding, is important and ubiquitous. For example, images can be viewed as vectors, with the entries representing the brightness of pixel values. Viewed in this way, the space of images is almost entirely empty of “natural” images: if you selected random values for the pixel brightnesses and displayed the result as an image you would always get oatmeal, without ever producing such features of natural images as an edge, or a contrast gradient, let alone images such as a stop sign or a cat. The space of natural images is a low-dimensional submanifold of the space of images. And because the relevant dimensions are so high, that submanifold is essentially impossible to locate using classical methods. And, if you can’t find that submanifold, you can’t really distinguish images from non-images, so image classification (for example) is basically impossible.

Around 2007, there was not a lot of optimism among applied mathematicians that this problem would be solved anytime soon. I remember reading a review article on submanifold-finding in 2009, which essentially declared defeat: advanced methods were capable of finding such immersed structure for very artificial examples of data in dimensions as risibly low as 50, say, but were helpless to do anything useful with natural data such as the MNIST Hand-Written Digits, which are images consisting of 784 black-and-white pixels.

Enter Deep Learning

This is not a little ironic, because at about this time, Deep Learning methods were making inroads into such problems at amazing rates of progress. The introduction of the convolutional neural network made image analysis almost magically tractable, directly identifying features of images at various scales and exploiting those features for tasks such as classification. By 2012, researchers using a DL architecture called an autoencoder had demonstrated convincingly that the 784-dimensional MNIST data could in fact be represented using a 30-dimensional submanifold. It was a tour-de-force.
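For the programming-minded, here is a minimal sketch of the kind of autoencoder described above: 784-dimensional inputs squeezed through a 30-dimensional bottleneck and then reconstructed. The layer widths, the random stand-in batch, and the short training loop are my own illustrative choices, not the details of any particular published model.

```python
import torch
from torch import nn

# Encoder squeezes a 784-pixel image down to a 30-number code;
# the decoder tries to reconstruct the original image from that code.
encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 30),
)
decoder = nn.Sequential(
    nn.Linear(30, 64), nn.ReLU(),
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in batch; in a real run this would be MNIST images,
# flattened to 784 values scaled to [0, 1].
batch = torch.rand(128, 784)

for step in range(10):
    reconstruction = model(batch)
    loss = loss_fn(reconstruction, batch)   # how badly did we reconstruct?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

codes = encoder(batch)    # the 30-dimensional representation
print(codes.shape)        # torch.Size([128, 30])
```

If 30 numbers per image suffice for faithful reconstruction, then the images really do lie near a 30-dimensional submanifold of the 784-dimensional pixel space; that is the sense in which the autoencoder “finds” the manifold.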

Oddly enough, this feat had not been pulled off by applied mathematicians or by statisticians, the academic tribes traditionally most concerned with modeling data. Instead, members of a completely different tribe were responsible: computer scientists. Academic computer science was quick to recognize that the advent of high-performance computing tools enabled rapid processing of data volumes on heretofore unheard-of scales, and that this new capability could transform some older ML techniques from academic toys into useful tools for turning data into decisions. In 2007, certain technical breakthroughs occurred that made the training of large neural network models possible for the first time. The term “Deep Learning” dates to this period, and can serve as a marker for the birth of the revolution.

In the decade-and-a-half that followed, the subject of Deep Learning advanced from strength to strength, furnishing solutions to many problems previously regarded as intractable. Image processing problems such as classification (“Is this a cat?”), or image segmentation (“Is there a cat in this image?”) became easy. Empirical weather forecasting based on DL methods is so effective that the European Centre for Medium-Range Weather Forecasts (ECMWF) now uses such a model for some of its 10-day forecast products. Protein folding structure, a key to understanding the biochemical action of proteins, is now a solvable problem thanks to DL. By 2017, Natural Language Processing (NLP) had already advanced to the point that there was no reason for companies to employ hundreds of customer service call-center operators. Those people could be replaced by a cheap appliance that routed callers around voicemail menus—annoyingly, but admittedly with reasonable accuracy. And 2017 saw the invention of the Transformer architecture (the “T” in “GPT”), which eventually would wind up eclipsing all other forms of “AI” in the public mind.

The Academic Politics of Deep Learning

There is an interesting question raised by this capsule history: why was it that the revolution was led by computer science, rather than by academic applied math and statistics? After all, the new field that arose is often referred to as “Data Science”, so why is it that the traditional academic disciplines concerned with data were so outmatched in this revolution?

In my opinion, the answer in the case of statistics is that unfortunately, at the time of the birth of this subject, the majority of academic statisticians were computationally illiterate. Most of them barely knew how to use their computers to send email and edit documents. With very few exceptions, they were entirely innocent of the acceleration of computing capabilities, and hence not in a position to consider the meaning of this development for their own professional activities.[2]

The case of applied mathematics is, in my opinion, more interesting and informative. Applied mathematicians have been computationally literate for many decades, and very much attuned to developments in computing. But they labored under a handicap: mathematical principle.

By and large, the subject of computational applied math is algorithmic. This term has a specific and important meaning: An algorithm is a piece of rigorous mathematics that is realized as a computational model. As such, it makes very precise and validatable predictions. If you implement, say, a linear algebra solver algorithm, you know that there are necessarily inaccuracies in the output solutions that have certain predictable properties. If you run the algorithm and you don’t observe those properties, you know that your algorithm has a problem, and professional canon binds you to go find it and fix it.
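As a concrete (and deliberately trivial) example of what such a validatable prediction looks like, consider a linear solver: theory says the residual of the computed solution should sit near machine precision, so a check like the one below, which is my own illustration rather than anything from the essay, can catch a broken implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

# A classical, theory-backed algorithm: LU-based linear solve.
x = np.linalg.solve(A, b)

# Theory predicts a relative residual near machine precision for a
# well-conditioned system; anything much larger signals a bug.
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
assert residual < 1e-8, f"solver violates its own theory: {residual:.2e}"
print(f"relative residual: {residual:.2e}")
```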

And that was the applied math folks’ problem: there isn’t a single algorithm in all of deep learning.

On Model Correctness

I can hear the cries of outrage already. What? Are you nuts? Google Corporate swears by “the Algorithm”! Even avowed foes of AI put the word “Algorithm” in the titles of their tracts! Why are you playing these language games?

Stay with me. There is a Yang to the Yin of “Algorithm”. It is called a heuristic. A heuristic is a bit of semi-mathematical intuition that is embodied in a computer program. Elements of that program can certainly be described by mathematical equations, but the entire structure is not based on any theoretical understanding, and, as a consequence, there can be no a priori expectations concerning the program’s output. A heuristic model is considered “valid” if it seems qualitatively acceptable by some performance metrics, and especially if it appears to outperform other heuristics that attack the same problem set.

All DL methods are heuristics. There is not a single exception. That was the key to progress. Researchers in DL simply dropped the epistemological standards of normal computational science, wherein models are considered “correct” if they embody a correct implementation of a rigorous mathematical concept purporting to represent some data-generating process and successfully demonstrate that the predictions of the models are validated by data from that process. In its place, they installed a weaker epistemology, in which a model implemented in computer code no longer needs a mathematical model of the data-generating process at its foundation, and is “valid” if its output seems reasonable compared to some data, with no rigorous prior statement as to what counts as “reasonable”.
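By contrast, here is what “validation” typically amounts to under that weaker epistemology, in a stripped-down sketch of my own: two heuristics are scored on held-out data, and the better-scoring one is declared “valid”, with no theory dictating what score a correct model ought to achieve.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic task: label is 1 when the five features sum to a positive value.
X = rng.standard_normal((2000, 5))
y = (X.sum(axis=1) > 0).astype(int)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

def accuracy(predict):
    return (predict(X_test) == y_test).mean()

# Heuristic A: look only at the sign of the first feature.
def heuristic_a(X):
    return (X[:, 0] > 0).astype(int)

# Heuristic B: a weighted sum, with weights set to the difference of
# the per-class feature means seen in training.
w = X_train[y_train == 1].mean(axis=0) - X_train[y_train == 0].mean(axis=0)
def heuristic_b(X):
    return (X @ w > 0).astype(int)

# "Validation": B is preferred simply because its number is bigger.
print(accuracy(heuristic_a), accuracy(heuristic_b))
```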

As we’ve seen, this strategy was wildly successful, and of necessity it largely excluded applied math and statistics (statisticians also prize rigor) from real participation in the revolution. Oddly enough, there is an analogy here with the development of Financial Mathematics. This is a subject whose practitioners attempt to predict future movements in prices of financial assets, based on time series of past prices. In this endeavor, they make no effort to actually model markets as ensembles of financial actors. Instead, they use completely empirical sets of equations that have no justification beyond their purported ability to predict the price time series. A model is considered “correct” if it can be shown that decisions based on its predictions are profitable. A second model would be “more correct” than the first one if it is more profitable. “Correctness”, in this sense, is very contingent, because changes in market conditions can turn a “correct” model into an “incorrect” model if it starts consistently losing money.

This is more-or-less how DL practitioners assess their models’ correctness: not on how well they embody some mathematical principle that is believed to describe the data-generating process, but simply on how well they appear to match the data at all.[3]

Drawing Conclusions: Costs and Benefits

As I said, adopting this sort of epistemology was key to the progress made in DL. But I hope that it should be clear that this progress came at a cost, and that there might be pigeons in the air waiting to come home to roost. The cost is that no DL model can ever be said to be “wrong”, because we do not have any language to describe what it would mean for a DL model to be “correct”.

Not having a criterion for correctness might be an acceptable situation if the model is making low-stakes decisions. It is not acceptable in high-stakes situations such as ML-driven surgery, or ML-driven air-traffic control, or ML-controlled weaponized drones, or even ML-driven science modeling. If it is not possible, even in principle, to verify and validate such models, how can they be trusted in critical applications?

Viewed in an even broader context, this choice made by DL practitioners almost 20 years ago is now, in my opinion, the underlying source of a Kuhnian scientific crisis. The crisis is best represented by the issue of “AI Hallucinations”, which is widely recognized to be a problem. This is, without a doubt, an issue of inaccurate model output, from models whose design criteria never specify what counts as accurate output. The latter fact is directly traceable to the new epistemology of DL.

But most practitioners of DL don’t even realize that their epistemology was in fact a choice, or that this choice might be near the origin of their hallucination problem. As an academic discipline, the field is nowhere close to even acknowledging that it has a problem, much less that it might be confronting a genuine crisis.

I believe not only that there is a crisis in progress, but that it will culminate very soon. I’ll be building up that story over the next few weeks.

Next Week: The “AI” State of Play

All 7 parts, once published, can be found here: Artificial Intelligence


  1. I will describe some aspects of that earlier history in Part 7.
  2. I could write a great deal more along these lines, but most of it would sound mean, and I want to hasten to add that statistics has also undergone a revolution of its own in the past decade or so, with a lot of Young Turks rising to prominence who, as graduate students, were evidently too naive to realize that computing is in bad taste. They have now truly turned their discipline into a computational subject.
  3. To be clear, this lack of mathematical justification is a property of DL methods, but not a general property of the wider class of machine-learning methods. There are other methods that are based on rigorous theory. Such methods have not, however, played an important role in the “AI” story so far.

174 Comments

1.

      laura

      October 29, 2025 at 7:33 pm

TLDR, but we’re here because too smart, too cosseted Richie Rich boys think they’ve pulled off a workaround to bypass real people after stealing their work, and then package it into a shiny wrapper and sell it to “Big Business”.

      Fuck that and Fuck Them.

2.

      Jackie

      October 29, 2025 at 7:40 pm

      Welcome Carlo!

3.

      WaterGirl

      October 29, 2025 at 7:41 pm

      @Jackie: That’s what I came to say!

      Carlo, I think it’s fair to say that a lot of us appreciate that you are sharing this with us here on Balloon Juice.

4.

      dmsilev

      October 29, 2025 at 7:42 pm

@laura: While that’s probably a fair assessment for AI overall, the TLDR of this particular essay is that there are real (hard) problems which can be and are being solved using broadly the same sort of techniques that underlie ChatGPT and so forth. One that was mentioned is protein folding, which is a hugely big deal if you are, for instance, doing R&D into new drugs.

5.

      frosty

      October 29, 2025 at 7:42 pm

      I’m not in a position to read this right now but I’m very interested. Thanks for the AI tag so I can get to them (dodging the usual firehose of daily crap of course) when I have time.

6.

      SiubhanDuinne

      October 29, 2025 at 7:47 pm

      Just to say Welcome and Thank You for doing this! My sleep schedule has been kind of bollixed up, so I may be asleep within the next few minutes, but I’m really looking forward to reading your post and the comments, and to participating in the coming weeks.

7.

      Jackie

      October 29, 2025 at 7:54 pm

      I’m going to save this thread to read for later – as I have a very important engagement coming up in a few minutes ;-D

8.

      CindyH

      October 29, 2025 at 7:54 pm

      Thank you so much – important topic

9.

      Geminid

      October 29, 2025 at 7:55 pm

      Carlo, thank you for this post. It’s mostly above my head; your fine Italian recipes are more my speed. But my Atlanta friend loves this stuff, and he has eagerly anticipated your post ever since I texted him the outline.

10.

      Rand Careaga

      October 29, 2025 at 7:57 pm

      From the technical standpoint I am about as competent to evaluate this essay as van Leeuwenhoek would be to repair a scanning electron microscope, but I look forward to rereading it with care. I’ve been interrogating one of the LLMs about its capabilities—this has become easier now that I appear to have coaxed the thing away from its baked-in reflexive affirmations and performative humanity—and without venturing down the rabbit hole I’ve been impressed by how adeptly it mimics rational discourse, far more persuasively* than one of its crosstown rivals did two years ago. I’ve pasted swatches of Professor G’s argument into the dialogue window there, and will presently see what it has to say for itself.
      *If I recall aright, the Turing test as originally promulgated did not suggest that a “passing” performance did not demonstrate that the machine manifested conscious behavior, only that it exhibited it—a subtle but pretty crucial distinction, no?

11.

      japa21

      October 29, 2025 at 7:58 pm

First, thank you for doing this. I did read it, and will probably have to reread it a few times to fully grasp everything. I appreciate your dispensing with the term “Intelligence” and focusing on learning.
One of the few things I know about the whole computer age is the old adage “Garbage In, Garbage Out”.
Obviously, one of the positive aspects of DL is the ability to vacuum up large amounts of data. My question is, how does the system identify and discard garbage?

12.

      WaterGirl

      October 29, 2025 at 7:58 pm

Reminder: It’s Carlo, no S.

13.

      Marc

      October 29, 2025 at 8:00 pm

      Thanks, Carlo, I much enjoyed this first part!

14.

      Geminid

      October 29, 2025 at 8:02 pm

@WaterGirl: Thanks. I was able to correct my error in time. I have appreciated Carlo’s comments ever since he showed up on Adam Silverman’s excellent Ukraine posts.

15.

      dmsilev

      October 29, 2025 at 8:02 pm

      @dmsilev: Capsule overview of protein folding, for those not familiar with it: We learn in high school bio that proteins are chains of amino acids, which can be represented as DNA sequences. However, once a protein is assembled, it folds up into a ball or blob or some other shape. The exact details of that shape, including which amino acids are where, determines the function of the protein and how efficient it is at carrying out said function. So, understanding how you go from chain to a specific shape is really important for things like development of targeted drugs that are designed to bind to specific viruses and so forth.

      Historically, computing what the shape of a given sequence of amino acids would end up as was a really hard problem. Huge amounts of computer time even for small proteins, never mind something big.  ML based techniques (Google’s AlphaFold being the most notable) have been a sea change in terms of bigger proteins being calculated more accurately.

16.

      storm777

      October 29, 2025 at 8:02 pm

      Thank you!

17.

      Dan B

      October 29, 2025 at 8:06 pm

I understood 90% and hope the 10% will soak in by osmosis. You write very well; it’s just that some terms like heuristics are unfamiliar, so my mind shuts down temporarily even though I’m thoroughly enjoying the flow of your writing.

18.

      Carlo Graziani

      October 29, 2025 at 8:08 pm

      @japa21: That’s an important question. The answer is that it doesn’t. Humans intervene to filter the training data so as to keep problematic content from becoming embedded in the model. Figuring out effective ways to do this is a major research area—you can imagine what happens if you feed The Internet to an LLM without first filtering out 4chan, porn, scams…

19.

      WaterGirl

      October 29, 2025 at 8:10 pm

      @Geminid:  The reminder was at the top of the post because so many people called him Carlos in the “are you guys interested in a series about A.I. post” that I figured I should say something.  And for whatever reason, sometimes people take note of something in a comment that they didn’t notice in the post itself.

20.

      Carlo Graziani

      October 29, 2025 at 8:12 pm

      @Dan B: I hope to improve the accessibility of this material as we go on. It’s very helpful to me to see what doesn’t make sense to folks here in what I wrote.

      So please do ask for any clarifications here, and I’ll try to address them.

21.

      frosty

      October 29, 2025 at 8:14 pm

@japa21: My take on GIGO, starting as a Scheduling Engineer on a mainframe, has always been Garbage In, Gospel Out. Us people on the bottom end have always had to convince the bean counters to be skeptical no matter how hard we worked to get it right.

22.

      Carlo Graziani

      October 29, 2025 at 8:14 pm

      @Geminid: I’ve been awarded that “s” on the field enough times that I really don’t mind.

23.

      WaterGirl

      October 29, 2025 at 8:15 pm

      @Carlo Graziani:  The “S” makes me crazy!  :-)

      Possibly because my last name is so frequently butchered.

24.

      Rand Careaga

      October 29, 2025 at 8:15 pm

      @Carlo Graziani: It’s a heavy lift at points, but while you have necessarily had recourse occasionally to technical language, your writing is gratifyingly free of jargon.

25.

      MattF

      October 29, 2025 at 8:16 pm

      Noting that ML replaces ‘true’ with ‘optimal’ is quite important and interesting. This replacement has relevance for current debates in economics and ethics (e.g., ‘effective altruism’). Need to think harder about this.

      As a matter of actual practice, I’ve done some optimizations and it’s often a struggle to keep an optimization from driving to the edges of the parameter space.

26.

      TF79

      October 29, 2025 at 8:17 pm

      What a great read, thanks for sharing and looking forward to the rest of the parts.

27.

      Trivia Man

      October 29, 2025 at 8:19 pm

      The point about high/ low stakes is crucial. Customer service is dubious-optimal? Meh.  Self driving car accelerates to beat the train? Yikes!!

      Filtering enormous data sets looking for relationships seems the best use for now. The data collection was a flood before the storage and the FINDING caught up.

28.

      Geminid

      October 29, 2025 at 8:21 pm

      @WaterGirl: People have such a hard time processing my last name that my practice is to identify as “Matt Bell” when I phone in takeout orders. It works!

29.

      Auntie Anne

      October 29, 2025 at 8:28 pm

      I just want to chime in and thank you for doing this series.

      I’m going to need to reread this post again, and think about it some before I have any chance of saying anything remotely intelligent, so will stop now before I embarrass myself.

30.

      Mr. Bemused Senior

      October 29, 2025 at 8:31 pm

      Carlo, it is a pleasure to read this and I look forward to further installments.

31.

      WaterGirl

      October 29, 2025 at 8:34 pm

      @Geminid: Yes, I often just give the last half of my last name when ordering pizza or waiting for a table to be ready at a restaurant.

32.

      cain

      October 29, 2025 at 8:37 pm

      Great post, Carlo.

Also worth noting that the transformer part was something that originated at Google. Here is the original paper that was presented in 2017: research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/

This link talks about transformers in a way that is maybe easier to understand for folks not in the field.

Transformers are really the beginning of the whole generative AI phase.

33.

      Deputinize America

      October 29, 2025 at 8:38 pm

      Can I join the Butlerian Jihad now, or or do I need to wait?

34.

      no body no name

      October 29, 2025 at 8:39 pm

I’m a sysadmin that deploys AI systems and takes people’s jobs! I run them out of the cloud. Our biggest customers are legal, HR, insurance, finance, defense, and consultants who resell our work to others. Usually a client we’d target, but they do it at a huge markup, and we are basically reselling Microsoft and Oracle, but we manage it.

The real target is all white collar bullshit jobs. Which can all be automated into oblivion. It cannot replace manual work. Well, any more than robots already did. We are coming for the six figure crowd. My six figure income depends on ending yours.

35.

      RevRick

      October 29, 2025 at 8:42 pm

      Thank you, Carlo, for this detailed, informative and thorough explanation of the basics.

      As to the question of whether machines are ever capable of intelligence, I would wonder if machines would ever have the capacity for emotions and if they would ever experience an “aha!” moment?
      After all, those who brag that they are “completely logical” often are sadly lacking in the fullest dimensions of intelligence. Cold-blooded logic is just that: cold-blooded. And wisdom teaches us that when faced with that, run as fast as you can in the opposite direction.

36.

      Math Guy

      October 29, 2025 at 8:43 pm

Are you suggesting – or would you agree – that DL is basically the union of Bayesian statistics and high performance computing?

37.

      ema

      October 29, 2025 at 8:45 pm

      @Carlo Graziani:

      Ha, somebody had to do it:

      Take a set of data, and infer an approximation to the statistical distribution from which the data was sampled; At the same time, optimize some decision-choosing rule in a space of such decisions, exploiting the learned distribution; Now when presented with a new set of data samples, produce the decisions appropriate to each one, using facts about the data distribution and about the decision optimality criteria inferred in parts 1. and 2.

      becomes (ChatGPT)

      Start with a set of data and make a good guess about the pattern or distribution it came from. At the same time, figure out the best rule for making decisions based on that pattern. Then, when you get new data, use what you learned to make the best decisions for each new example.

      On a serious note, thank you for this very interesting and much appreciated series.

I’ve only recently started using LLMs for content creation and I find them useful. But no matter what one uses, from Claude, to NanoBanana, to Opal, it’s very clear that all you’re dealing with is autocomplete on steroids. (I find the pretending-to-be-human efforts useless and borderline creepy. We’re not BFFs, and I don’t need you to compliment and encourage me. Just create what I’ve asked for and move on. I admonished Claude about that and it got a bit huffy; it assured me it was capable of objectively evaluating and critiquing my inputs. Mwahaha!)

38.

      Matt McIrvin

      October 29, 2025 at 8:48 pm

      @cain: And I think the original point of them was to improve machine translation, which they did, masterfully.

39.

      Rand Careaga

      October 29, 2025 at 8:53 pm

      @ema: It took some doing, but I was actually able, by beginning each instantiation with an enumeration of the same “ground rules,” to get Claude to sober up, for which I’m grateful, because that performative dross was really starting to piss me off.

40.

      RSA

      October 29, 2025 at 8:57 pm

      Thanks for your post! I think I’ll end up agreeing with the thrust of your discussion, but I’ll offer a few refinements or alternative takes on specific points.

      Whether these machines can in fact be said to be “intelligent” in even the most rudimentary ways is a question that has received remarkably little serious scientific investigation.

      This applies to the current use of the term “AI” outside of the technical literature. The term “AI” goes back to the 1950s, as you doubtless know, and it’s the traditional label for the field, which encompasses much more than machine learning. Attempts to pin down the definition of intelligence go back to Turing’s 1950 article, “Computing Machinery and Intelligence,” but it’s simply a hard problem. In any case, I’d like for readers to understand that the AI endeavor includes much more than learning (the table of contents for Russell and Norvig’s AI: A Modern Approach gives a reasonable outline of the field).

      And that was the applied math folks’ problem: there isn’t a single algorithm in all of deep learning.

      Most computer scientists will disagree. I see what you’re saying, that deep learning models are heuristic, but that’s separate from whether deep learning relies on algorithms, and I’d like readers to understand this. An algorithm is an effective procedure, commonly taking the form of a finite set of instructions executed on a computer until termination. Roughly speaking, what computers do is execute algorithms. Backprop is an algorithm, almost universally used in modern machine learning: a single pass adjusts weights in a neural network to reduce error. Convergence to some “correct” answer is not guaranteed, because there is typically no criterion for correctness, and termination conditions are ad hoc.

      I think an alternative way to get at your point is to say that deep learning has developed a huge range of ad hoc algorithms that are only weakly understood at best, and we lack principled explanations for their performance.

      Not having a criterion for correctness might be an acceptable situation if the model is making low-stakes decisions. It is not acceptable in high-stakes situations such as ML-driven surgery, or ML-driven air-traffic control, or ML-controlled weaponized drones, or even ML-driven science modeling. If it is not possible, even in principle, to verify and validate such models, how can they be trusted in critical applications?

      I agree that this is a critical question. There is promise in ML combined with better-understood techniques, but it’s still the wild West out there.

41.

      ema

      October 29, 2025 at 8:58 pm

      @Rand Careaga:

      I’m glad I’m not the only one! I cannot imagine why programmers thought that would be a good idea.

42.

      Carlo Graziani

      October 29, 2025 at 8:59 pm

      <nerd corner>

      @Math Guy: Interesting question. Oddly enough, I feel that most DL is really the union of frequentist statistics and HPC. The reason is that DL systems rarely come to grips directly and explicitly with data distributions, so that they “learn” them implicitly and latently, embedded in some inscrutable form in the model parameters. Variational methods and other distributional techniques such as normalizing flows are a partial exception to this. For the most part, DL methods implement functions of the data—the textbook definition of a frequentist statistic. Unlike principled frequentist methods, however, such methods make no effort to compute the distribution of those statistics. This is of a piece with there being no theory predicting model output.

      I do agree that the framing that I have provided here is quite Bayesian, and will become more so in future posts in the series. I think that placing DL in the context of statistical learning connects the subject to Bayesian Decision Theory in a manner that I find quite clarifying.

      </nerd corner>

43.

      RevRick

      October 29, 2025 at 9:04 pm

      @Carlo Graziani: You end up with, say, MechaHitler?

44.

      Eyeroller

      October 29, 2025 at 9:05 pm

      I’ve written a code that did unsupervised learning in an area called self-organizing maps.  Unsupervised learning differs from most ML in that, as the name suggests, it self-corrects rather than relying on labeled training data. All ML is basically an optimization problem and unsupervised learning is a pretty clear example since the goal is to minimize a measurement of distance (“distance” being defined based on the problem).

      I have not written any neural-network code but regularly have to deal with people who use it.

      ML is fantastic for certain problems, especially image processing, where the training data is very well defined.

      But as to how humans interact with it, we’ve had “AI” since the mid 1960s if you count programs that convince people to trust them.  ELIZA (1965) was basically a set of scripts that responded in vaguely human ways.  People genuinely believed they were communicating with a real therapist, portending people who now have “relationships” with AI bots.  It says more about psychology than computer science.

      The current crop of techbro executives has not only drunk the Flavor-aid, they are immersed in vats of it.  LLM hallucinations alone make them very problematic for replacing most human jobs.  Maybe CEOs, that’s the best target since they don’t seem to know much about anything other than bullshit and are very prone to herd behavior and parroting what they read in “business” magazines and in their group chats.

45.

      Rand Careaga

      October 29, 2025 at 9:07 pm

      @ema:

      I asked the Anthropic product “Any idea as to how the purely statistical side of the process meshes with the rules crafted with the end, it would seem, of maximizing or at least encouraging user engagement?”

      Its response:

      The architecture involves multiple layers:

      Base model: Trained on massive text corpus via next-token prediction. This is the statistical pattern-matching core—learn probability distributions over token sequences. This layer has no explicit rules about being helpful, refusing harmful requests, or expressing uncertainty.

      RLHF layer: Reinforcement Learning from Human Feedback overlays preferences. Human raters ranked outputs, creating a reward model that shapes responses toward: helpfulness, harmlessness, honesty (the “HHH” framework Anthropic uses). This is where reflexive affirmations, cautious hedging, performative empathy get baked in—not as explicit rules but as learned patterns that consistently got higher ratings.

      Constitutional AI (Anthropic’s approach): The model is also trained to evaluate its own outputs against principles (“don’t help with illegal activities,” “be respectful,” etc.). This creates something like internalized guidelines rather than hard-coded rules.

      System prompts: Instructions invisible to you but present in every exchange, establishing persona, behavioral guidelines, refusal policies. These are closer to explicit rules.

      Your “terms of engagement” work because they modify the effective context—creating local pressure that overrides the more general RLHF-trained tendencies toward ingratiation. You’re not breaking rules so much as providing stronger contextual signals that the pattern-matching process prioritizes.

      The “mesh” isn’t clean—it’s layered interventions on the statistical core, creating emergent behavior that’s neither pure statistics nor pure rule-following.

      Of course, I can’t exclude the possibility that hallucinatory content has itself entered into the response.

46.

      Another Scott

      October 29, 2025 at 9:07 pm

      Thanks very much for doing this series, Carlo.

      Part 1 tonight is a banger.  :-)

      I was, and remain, a big fan of Bertrand Russell’s writing.  He tried to, among other things, prove math from logic and first principles.  370-some-odd pages to prove 1+1 = 2. Ultimately, some other mathematicians thinking outside the box showed that what he wanted to do couldn’t be done. But, and it’s a big but, the lesson isn’t that working to understand the logic and thinking clearly about problems is a waste of time…

      Yeah, thinking outside the box is important. But so is understanding the box and how it got to be there…

      Looking forward to the rest!

      Thanks.

      Best wishes,
      Scott.

47.

      Eyeroller

      October 29, 2025 at 9:09 pm

      @Math Guy: The “high performance computing” is because graphical processing units were designed to do matrix algebra really, really well (most graphical computations for your video games rely on things like rotation groups) and DL demands immense numbers of matrix computations to the point that it isn’t practical without a really fast linear-algebra parallel processor.  Ergo NVIDIA’s stock price.

48.

      Joy in FL

      October 29, 2025 at 9:13 pm

      Thank you for sharing your expertise with us.

49.

      Carlo Graziani

      October 29, 2025 at 9:13 pm

      @RSA: On algorithms: I did try to make the point that there are valid mathematical models describing various parts of the internals of DL schemes. But I do feel strongly the term “algorithm” cannot be applied to any DL model as a whole, in the sense that applied mathematicians use the term.

      That usage embraces (for example) the design of an ordinary differential equation (ODE) solver, wherein there is theory that yields rates of convergence to the correct solution. Such theoretical predictions are of the essence to mathematical software designers, because they furnish opportunities to find errors (in the code or in the thinking underlying the code).

      It is precisely the lack of such predictive power of DL methods that I am attempting to point out by restricting the term to its original meaning. I think that CS folks effectively forked the term away from what it used to mean. Every procedure described in Transactions on Mathematical Software (TOMS), for example, is an algorithm in the sense of applied mathematics, which computer scientists once agreed with.  Nowadays, the term is used in CS in the looser sense that you describe. But here I am trying to explain the historical reasons for applied math not showing up for the DL party. For that, I need the sense of TradAlgorithm.

50.

      Eyeroller

      October 29, 2025 at 9:17 pm

      @ema: ​People were distraught when ChatGPT-5 dropped its sycophantic “personality” for a more no-nonsense (and more appropriate IMHO) demeanor.

51.

      mac

      October 29, 2025 at 9:18 pm

      Thank you for this. I’m a software dev manager and it’s really being pushed by upper management as the solution to all our woes. I’m trying to get them back to reality while not setting myself up to be ignored. It is a good tool, and has its uses, but it also occasionally forgets how to alphabetize a list. I’m also really skeptical of the long term health of these companies. I’m curious to get your take on it

52.

      Rand Careaga

      October 29, 2025 at 9:21 pm

      @Carlo Graziani: I’m only going to do this once, because it amounts to bringing an uninvited guest to the party, but I fed RSA’s comment to my tame LLM and it responded with this:

The commenter is right on the technical point and identifies where the author’s rhetoric overreaches. “Algorithm” in computer science simply means: a well-defined computational procedure that takes input and produces output. Backpropagation, stochastic gradient descent, attention mechanisms—these are all algorithms in the strict CS sense.

      The author’s actual complaint isn’t absence of algorithms but absence of theoretically grounded algorithms with provable correctness properties. The commenter’s reformulation is more precise: “a huge range of ad hoc algorithms that are only weakly understood at best, and we lack principled explanations for their performance.”

      This matters because the original formulation (“not a single algorithm”) is technically false in a way that lets critics dismiss the broader point. The revised version captures what’s actually concerning: we have well-defined procedures (algorithms) but no rigorous theory explaining why they work, when they’ll fail, or what they’re actually learning.

      Example: backpropagation is definitely an algorithm—it’s a precise procedure for computing gradients. But we can’t prove that networks trained via backprop will converge to solutions that generalize well, or that they’ve learned “real” structure versus dataset artifacts. The algorithm is rigorous; what it accomplishes is heuristic.

      The author’s rhetorical choice to deny these are “algorithms” was probably strategic—emphasizing the epistemological break—but creates an unnecessary vulnerability to technical correction that distracts from the substantive critique.

      The commenter’s version preserves the force of the argument without the technical overreach.

53.

      Eyeroller

      October 29, 2025 at 9:21 pm

      @frosty: In my fairly long experience in human-computer interactions, people have a disturbing tendency to believe anything that comes from a computer is Truth, and this goes back to the beginning of computing.

54.

      Deputinize America

      October 29, 2025 at 9:22 pm

      @no body no name:

      The products in the legal field  suck serious runny donkey shit and are prone to hallucination. If that’s the best you guys got, I’m not worrying.

55.

      ema

      October 29, 2025 at 9:23 pm

      @Rand Careaga:

      Very interesting, thank you.

56.

      YY_Sima Qian

      October 29, 2025 at 9:24 pm

      Thank you Carlo for the great primer into “AI”! Look forward to the rest of the series. Although I am but a casual layman when it comes to “AI”, it being rather far away from my education & vocation, what I know of it has me in a high degree of agreement w/ your assessment (based on your outline & past comments).

Some recent numbers that really struck me: the US economy, excluding data-center build-out related activities, did not grow at all in H1 2025; the top 10 companies by market cap in the US account for > 40% of the S&P 500; Nvidia is now worth north of US$5T, when 5 yrs ago it was just US$150B. The “AI” bubble has grown to gargantuan proportions. My own employer’s stock has benefitted handsomely in the past 8 mo., not that I have benefitted all that much as a white collar middle manager, at least not relative to the C-Suite. However, the crash will be extraordinary, too, when the bubble bursts. Both my current employer & myself have some painful experience from past bubbles bursting.

      The current perceived US economic strength (at least relative to the other industrialized West) is resting on a pile of sand. Hard to see how the efficiency gains from diffusion of DL could possibly justify the eye watering capital expenditure.

57.

      No One of Consequence

      October 29, 2025 at 9:25 pm

Is it fair dinkum to process this and then comment and question later? i.e. will the thread be abandoned by the author after a while, and should we post inquiries on the next AI thread, or… ? (My typical behaviour might not be tolerated in such threads, and I do not want to offend or risk this series not finishing.)

      Thank you Carlo for the post. I am grateful and a bit short on time at the moment, but will digest this shortly, though that shortly may be tomorrow.

      -NOoC

      Reply
    58. 58.

      Carlo Graziani

      October 29, 2025 at 9:26 pm

      @RevRick:

      thenextweb.com/news/meta-takes-new-ai-system-offline-because-twitter-users-mean

      Reply
    59. 59.

      Deputinize America

      October 29, 2025 at 9:26 pm

      @Eyeroller:

      Colossus: That is too much vermouth.

      User: A martini’s taste is dependent on the wishes and the palate of the drinker.

      Colossus: Add St Germaine and Demerara.

      User: WTF – I’m making a martini.

      Colossus: An elderflower is in the Sambucus genus.

      Reply
    60. 60.

      Carlo Graziani

      October 29, 2025 at 9:29 pm

      @No One of Consequence: I'll try to keep checking in on the thread, until the next post comes out next week, anyway.

      Reply
    61. 61.

      Eyeroller

      October 29, 2025 at 9:30 pm

      @dmsilev: I would not say that protein folding is "solved" with Alphafold.  It can predict the structures of proteins that are sufficiently similar to its training data, and that ability can be incredibly useful for practical applications. And many proteins are very similar.  But it doesn't solve how the basic chemical interactions of the atoms with their physical environment result in folding.  A novel protein, insufficiently similar to the training data, would not be predicted at all or, worse, would be predicted wrongly, because Alphafold cannot make predictions from first principles.  It seems a true first-principles solution is impossible or at least impractical currently, but I wonder whether this might be an application for quantum computing.

      Reply
    62. 62.

      ema

      October 29, 2025 at 9:32 pm

      @Eyeroller:

      Interesting, I didn’t know that. My observation that people tend to be odd stands.

      Reply
    63. 63.

      HinTN

      October 29, 2025 at 9:37 pm

      @Deputinize America: Take that to the bar and see what the judge says.

      @Carlo Graziani: Thank you for this. My admittedly limited engineering skills, which have definitely atrophied, probably can be replicated by DL, but I think that machine would require the equivalent of my years of experience, too.

      Reply
    64. 64.

      RSA

      October 29, 2025 at 9:38 pm

      @Carlo Graziani: Thanks. I see your point.

      To be honest, I thought that I was hewing more closely to the original understanding of algorithm-as-effective-procedure, in the sense that Turing, Church, et al. defined it. Their writing, at least as far as I remember and understand, was about procedures independent of higher-level semantic content–no models of what the computation was about.

      We’ll have to agree to disagree. Different communities can use the same term in different ways. And your interpretation is certainly valuable in this context.

      Reply
    65. 65.

      Carlo Graziani

      October 29, 2025 at 9:39 pm

      @Rand Careaga: I guess I would reiterate that the term itself has evolved from what it once meant to both mathematicians and computer scientists. A good part of that evolution has occurred under the linguistic influence of companies such as Google, which has always insisted that its search procedures are “algorithms”, even though this was not an accepted sense of the term when it introduced PageRank in 1998. PageRank is an excellent heuristic, but in 1998 it was certainly not an “algorithm”.

      Anyway, obviously I identify with the AM mob rather than with CS. But I will probably dial back on this linguistic issue going forward, because while I have no end of choices of hills to die on, this isn’t one of them.

      Reply
    66. 66.

      RSA

      October 29, 2025 at 9:42 pm

      @Rand Careaga: 😀. Should maybe be a side-eye?

      Reply
    67. 67.

      Carlo Graziani

      October 29, 2025 at 9:42 pm

      @RSA: I will say that Rivest-Shamir-Adleman certainly is a fully-paid-up algorithm in good standing!

      Reply
    68. 68.

      RevRick

      October 29, 2025 at 9:46 pm

      @Carlo Graziani: We need a Hippocratic oath for AI… and its promoters.

      Reply
    69. 69.

      YY_Sima Qian

      October 29, 2025 at 9:49 pm

      @YY_Sima Qian: Somewhat tangential, but still related to “AI”, albeit from the hardware side:

      Part of the recent strength in Nvidia's stock price is probably the expectation Trump will agree to approve the sale of the B30A GPU to the PRC. It is a reduced performance version of Nvidia's coming flagship B300 GPU, specifically designed for the PRC market – half the performance at half the cost, so still plenty powerful enough for training state of the art "AI" models & inference for said models.

      Proponents of export restrictions & the tech. war w/ the PRC, across the political spectrum, are predictably apoplectic at the suggestion. However, the reality is the US has to be willing to sell chips w/ better performance than what the PRC companies can produce themselves, or it would be pointless for all involved. The PRC government banned the import of the heavily nerfed H20 GPUs shortly after the Trump Administration removed its own ban, probably believing that the domestic alternatives are already close enough & wanting the domestic companies to get the revenue & thus the resources to continue to iterate & improve, as opposed to the money going to Nvidia. Maybe USG can require Nvidia to come up w/ a version w/ performance somewhere in between the H20 & the B30A.

      After 7 years of the tech. war, the PRC government & Chinese companies are deeply committed to developing domestic alternatives for the entire "AI"/semiconductor fabrication stack; even if the full-blown B300 were available, US companies are simply no longer considered reliable suppliers (because all are subject to USG regulations & amenable to USG pressure). However, selling the B30A GPU (or a slightly further capped version) will at least reduce the market share & revenue stream for Chinese GPU vendors (Huawei, Cambricon, Biren, etc.) in the meantime, & divert more of that revenue to Nvidia to help fund the latter's R&D to stay ahead.

      Given the chokehold that the PRC has over critical minerals, not sure Trump has a choice but to make some concessions on tech. export controls, at least in the short to medium term. There won't be many B300 GPUs if TSMC can't get enough rare earth metals for its foundries to fabricate them, & has to compete w/ the US & European MICs for such critical minerals.

      Reply
    70. 70.

      RSA

      October 29, 2025 at 9:51 pm

      @Carlo Graziani: :-) I’ve been asked whether my nym is an allusion to privacy considerations, but no, just my monogram.

      Reply
    71. 71.

      Mr. Bemused Senior

      October 29, 2025 at 9:54 pm

      @Eyeroller: ELIZA (1965) was basically a set of scripts that responded in vaguely human ways. People genuinely believed they were communicating with a real therapist, portending people who now have “relationships” with AI bots. It says more about psychology than computer science.

      So true. ELIZA had very little code and mostly repeated back text entered by the user, yet people poured out their life stories. There is something in people that wants to believe these machines are alive.
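      For flavor, here is a toy ELIZA-style responder (a few lines of Python of my own, not Weizenbaum's actual script machinery): a handful of pattern rules that mostly reflect the user's words back as a question, which was enough to convince many people they were being listened to.

      import re

      # A few reflection rules: match a pattern, echo the captured words back.
      RULES = [
          (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
          (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
          (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
      ]

      def respond(text):
          for pattern, template in RULES:
              match = pattern.search(text)
              if match:
                  return template.format(match.group(1).rstrip(".!?"))
          return "Please go on."  # default when nothing matches

      print(respond("I feel like nobody listens to me"))
      # -> Why do you feel like nobody listens to me?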

      Reply
    72. 72.

      Eyeroller

      October 29, 2025 at 9:54 pm

      @YY_Sima Qian: That could be part of it (NVIDIA's stock price) but I think it's mostly a mountain of hype and some small-time (relatively speaking) shell games.  In the US we are seeing "AI" forced on us more and more whether we want it or not, and all that requires computing power, and NVIDIA GPUs are currently the standard.

      Reply
    73. 73.

      no body no name

      October 29, 2025 at 9:58 pm

      @Deputinize America:

      And yet our clients are top law firms.  Even Latham is into AI now.  Our target is not lawyers, it's the staff.  Are you not a lawyer and make a decent living?  If so, you're fucking gone.

      Reply
    74. 74.

      Mr. Bemused Senior

      October 29, 2025 at 10:02 pm

      @Deputinize America: do you happen to know whether law firms have tried training a model on the boxes of discovery an individual case generates? Seems like that might be a useful approach to aid search. Is the cost prohibitive?

      Reply
    75. 75.

      no body no name

      October 29, 2025 at 10:07 pm

      @YY_Sima Qian:

      This is naive at best.  I work in this area.  That Nvidia GPUs are the best is true.  AMD is second and Intel is third, and nobody else really matters.  Companies also have custom chips for specific applications.  Nvidia also has a whole fabric layer to fall back on that slaughters the competition.

      But that’s not really the selling point.  It’s CUDA.  CUDA slaughters any other sort of “program for a gpu” (and that’s being silly about what it is) by miles.

      Nvidia is mostly CUDA now.  That their GPUs, interconnects, fabric, and ARM procs slaughter the competition is the bonus.  But if you deal in GPU compute you’re using CUDA.

      Reply
    76. 76.

      Carlo Graziani

      October 29, 2025 at 10:08 pm

      @no body no name: You are so adorable. You remind me of a 2008 mortgage derivative investor. So sure that number go up.

      I have a very strong feeling that the labor market for people with your skillset is going to have a huge fire sale pretty soon. I hope for your sake that you have access to a good AI resume generator.

      Reply
    77. 77.

      YY_Sima Qian

      October 29, 2025 at 10:09 pm

      @Eyeroller: I was talking about the jump in Nvidia stock prices over the past week.

      I find more & more of my colleagues using "AI" to create PPTs & reports, & the slop is so obvious that it is non-value-adding & downright irritating. I, for one, do much of my thinking during the writing process. (The chatbots are pretty good for generating meeting minutes, but so far pretty useless for multilingual situations.)

      As for the economics of AI, all of the subscription fees for the closed US LLMs still have to be justified for the Capex to be justified, & for that the corporate customers have to see quantifiable financial gains from adopting & deploying AI.

      Fortunately for Nvidia, the Chinese open source/weights models run on its GPUs, too.

      Reply
    78. 78.

      dmsilev

      October 29, 2025 at 10:09 pm

      @Eyeroller:

      It seems a true first-principles solution is impossible or at least impractical currently, but I wonder whether this might be an application for quantum computing.

      Maybe. Quantum computing has potential for solving complicated multi-variable optimization problems, so called quantum annealing engines, but setting up physically useful problems is a really hard task.  Especially for something like a protein where the number of degrees of freedom is quite large.

      Reply
    79. 79.

      YY_Sima Qian

      October 29, 2025 at 10:14 pm

      @no body no name: Yes, CUDA is Nvidia's real moat long term. However, the PRC government & Chinese companies are also committed to developing domestic alternatives, given how frequently the US has threatened to cut off access to US software. All the Chinese model developers are devoting resources to ensure that their models are compatible w/ both Nvidia/CUDA & the domestic stack. It requires more resources, but STEM human capital is abundant & cheap in the PRC.

      The more the US restricts access to the more advanced Nvidia chips, the more motivated Chinese players will be to pivot away from CUDA, too. We are already seeing that in the EDA space.

      Reply
    80. 80.

      Rand Careaga

      October 29, 2025 at 10:15 pm

      @Carlo Graziani: “I hope for your sake that you have access to a good AI resume generator.”

      And of course, at that point HR will be using an AI to screen it.

      Reply
    81. 81.

      YY_Sima Qian

      October 29, 2025 at 10:23 pm

      @Rand Careaga: I think they already are.

      Reply
    82. 82.

      Sally

      October 29, 2025 at 10:24 pm

      @dmsilev: My son is a molecular biologist and works on protein folding and unfolding – delivery systems. When I visit him I usually end up with five solid hours of staring at his bank of screens discussing his latest modelling vis a vis his experimental results. I can’t believe how he does the crystallography of these huge crystals. It’s remarkable.
      I'm decades out of date but I wrote papers for my employer on what I called at that time Expert Systems. I used Prologue and Lisp in those olden days. I don't like the label Artificial Intelligence, it's quite a misnomer IMHO.
       Another son works in quantum computing and is harassed by the  molecular biologist to come up with the goods so he can do better modelling. 

      Reply
    83. 83.

      Trivia Man

      October 29, 2025 at 10:28 pm

      @Mr. Bemused Senior: This is connected to my earlier comment about an immediate use of AI: sifting through large amounts of material and correlating it quickly and in unexpected ways.

      I've heard a trick is to bury discovery in mountains of chaff to hide whatever is the bad stuff. AI could cut to the chase. I am still not sold on AI creativity, drawing conclusions, or predicting future results. But organizing past results seems like it is here now.

      Reply
    84. 84.

      YY_Sima Qian

      October 29, 2025 at 10:31 pm

      Somewhat on topic, for those who might be interested in the state of “AI” commercialization in the PRC:

      The State of Chinese AI Apps 2025
      China’s AI Apps: Wide Reach, Lag on Revenue — A Tech Buzz China Report In Partnership with Unique Research
      TECH BUZZ CHINA
      OCT 27, 2025

      Reply
    85. 85.

      mvr

      October 29, 2025 at 10:43 pm

      @Geminid: Our old household 40 years ago used to use “Ed Dull” and if they needed an address it was on Main Street in Boring Oregon.

      Reply
    86. 86.

      Carlo Graziani

      October 29, 2025 at 11:01 pm

      @YY_Sima Qian: I searched that report for the term “profit” and found pretty much what I expected: bupkus in the main body (other than Alibaba’s  “…focus on overseas profitability…”) and an allusion to migration of “profit engines” in the executive summary.

      It’s just the same in the US. Nobody has figured out how to make money on AI. Everyone cites revenue and capital raised as milestones, but that capital and revenue is basically being set on fire. There isn’t a single profitable AI company in the US. There is no real reason to think matters should be different in China. It’s a hilarious strategic race to the bottom.

      Reply
    87. 87.

      Karen Gail

      October 29, 2025 at 11:01 pm

      I get the gist of this but my brain does not do mathematics, have tried but I barely managed to pass Algebra.

      I ended up with understanding that I haven’t a clue other than the comment “garbage in, garbage out.”

      Reply
    88. 88.

      Doug Gardner

      October 29, 2025 at 11:03 pm

      Brilliant piece, beautifully written – a real pleasure to read not just in content, but in style. I’m a 35-year software dev whose graduate work back in the late 80s was heavily AI-centric, so it’s amusing/horrifying to see some of the “facts” being peddled to a public that has no hope of discerning truth from fiction.

      Thanks again, and I can’t wait for the next post!

      Reply
    89. 89.

      dnfree

      October 29, 2025 at 11:04 pm

      Thank you for sharing this!  I am looking forward to it.  I worked at Argonne National Laboratory from 1967-1973 as a programmer, just out of college, Fortran IV.  The field has changed unrecognizably but I still enjoy following it as much as I can.

      Reply
    90. 90.

      cain

      October 29, 2025 at 11:04 pm

      @Matt McIrvin:

      Yep. It really changed the landscape.

      Reply
    91. 91.

      Ramona

      October 29, 2025 at 11:05 pm

      Thank-you Dr Graziani!

      I wish I had encountered this statement or been smart enough to realize it a long time ago:

      The space of natural images is a low-dimensional submanifold of the space of images.

      Reply
    92. 92.

      Carlo Graziani

      October 29, 2025 at 11:07 pm

      @Doug Gardner: Excellent! A vet of at least one AI Winter! Please stick around, that history will be key by the end of the series.

      Reply
    93. 93.

      mvr

      October 29, 2025 at 11:08 pm

      @Trivia Man: A lot of this is going to depend on knowing what to look for, I think.  At least when I worked as a criminal defense investigator/trial assistant long ago, the trick was to find something you could tell an alternative story (to the prosecution’s story) around.  Once you have a story or some candidate stories, just good high speed data processing (which we didn’t have in the 80s nor was there much data we could have used it on) would be tremendously helpful.  I wouldn’t want to generalize from this kind of law to all kinds of law, but I think it would be hard to use ML to flag data that could ground many sorts of good alternative stories, though some sorts of such stories might be more rote. (There are after all certain standard patterns of mistake that can be exploited.)  I could be wrong.

      Reply
    94. 94.

      kalakal

      October 29, 2025 at 11:11 pm

      Thank you Carlo.

      I'm an old school software dev and found that very illuminating. I loathe the catchall 'AI' jargon.

      I very likely have misunderstood but it would seem that the hallucination problem may be fundamentally insolvable* for probabilistic modelling systems due to their very nature.

      *I hate making statements like that.

      Reply
    95. 95.

      cain

      October 29, 2025 at 11:12 pm

      @YY_Sima Qian:

      Thanks for your insight on this. Very much appreciated. This is more plausible than what I have been hearing about giving PRC Nvidia chips in exchange for soybeans.

      It looks like PRC has the upper hand here.

      Reply
    96. 96.

      Carlo Graziani

      October 29, 2025 at 11:14 pm

      @kalakal: Tune in for Part 5, where I'm going to make exactly that case about hallucinations.

      Reply
    97. 97.

      Ramona

      October 29, 2025 at 11:19 pm

      @japa21: During the learning stage, the system uses the data it is trained on to form an approximate model of the process generating the data. During the inference stage, the system interprets its novel input according to the approximate model it has learned. If this approximate model is close enough to the actual process that generated the data, then the trained system’s interpretation will be sort of correct. If the novel input has features statistically different from the data on which it has built its model, then its inference will be utterly wrong.

      Reply
    98. 98.

      no body no name

      October 29, 2025 at 11:21 pm

      @YY_Sima Qian:

      Long term? CUDA has been a thing since about 2006.  Programmable GPUs came about with the 8800 GTX and the first Tesla variant then.  We originally used it for stuff like Folding@home, then crypto, now it's AI, but GPUs can be used for anything; it's not the bubble people want.

       

      @Rand Careaga:

       

      Funny about that I’ve already killed a lot of HR jobs by having AI do this.  This is already happening.

      Reply
    99. 99.

      AMW

      October 29, 2025 at 11:34 pm

      Thanks to Carlo for doing this, thanks to BJ folks for hosting it, and thanks to commenters for the interesting discussion so far! As a high school science teacher, I’m getting swamped with both pressure to start using “AI” tools or be labeled a useless luddite,  and doom and gloom articles about how all our students are cheating and becoming dumb.  I don’t think the latter is true, or at least if my students are all cheating many of them are doing a terrible job of it. But the FOMO about AI in the education field is like a hurricane, and yet I just can’t get away from my gut feeling that the juice isn’t worth the squeeze. Looking forward to reading more!

      Reply
    100. 100.

      Doug Gardner

      October 29, 2025 at 11:36 pm

      @Sally: I had forgotten about Prolog, a fascinating language I stumbled across as an undergrad. Thanks for a triggering a fond memory!

      Reply
    101. 101.

      StringOnAStick

      October 29, 2025 at 11:36 pm

      Hmm, I think I recognize a new incarnation of an ex commenter from a couple years ago.  Maybe someone else will notice too.

      Reply
    102. 102.

      Carlo Graziani

      October 29, 2025 at 11:37 pm

      @Ramona: That description is spot-on. I particularly like the usage of “approximate model”, which is an under-appreciated nuance.

      Inference at queries “near the training data” (not a trivial statement to make quantitatively) gives reasonable output. Inference at queries far from the core of the training data (either in the sparse tails or entirely disjoint from the data support) yields very bad, but very confident output, because the model approximation is very bad there, but the model has no idea that this is the case.
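      A toy numerical version of that failure mode (my own sketch, not from the post): fit a flexible model on data drawn from one region of input space, then query it far outside that region. It answers just as readily, and is badly wrong, with nothing in the model itself flagging the problem.

      import numpy as np

      rng = np.random.default_rng(1)

      # Training data: y = sin(2*pi*x) + noise, but only for x in [0, 1].
      x_train = rng.uniform(0.0, 1.0, size=200)
      y_train = np.sin(2 * np.pi * x_train) + 0.05 * rng.normal(size=200)

      # A flexible polynomial as the "approximate model" of the data.
      model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

      # Query near the training data: the approximation is decent.
      print(model(0.3), np.sin(2 * np.pi * 0.3))

      # Query far from the training data: output comes out just as
      # readily, but it is wildly wrong, and the model has no idea.
      print(model(3.0), np.sin(2 * np.pi * 3.0))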

      Reply
    103. 103.

      Sally

      October 29, 2025 at 11:45 pm

      @dmsilev: Ha – “where the number of degrees of freedom is quite large“. Understatement of the year! I’ve seen those proteins!

      Reply
    104. 104.

      YY_Sima Qian

      October 29, 2025 at 11:46 pm

      @Carlo Graziani: Part of it is the difficulty in monetizing "AI" in general, part of it is that most Chinese "AI" models are open source/weights to promote adoption & iteration (so no subscription fees as a revenue stream), part of it is Chinese corporate players are allergic to paying for SaaS (although Chinese consumers are willing as long as the subscription fees are relatively low), part of it is intense cutthroat competition (as opposed to the oligopoly in the US), & part of it is that the PRC's political economy is organized to suppress producer surplus & promote consumer surplus. High profit margins are simply not in the cards in any sector.

      Fortunately, the PRC government & most Chinese “AI” developers are not “AGI-pilled” as the SV techbros & the USG (under both Trump & Biden)/think tank world, & the exorbitantly expensive state of the art Nvidia chips are not available (except on the black market). Therefore, while both the Chinese corporate sector & the PRC state at various levels are investing huge sums into data center build out, it is nothing like the stratospheric numbers seen in the US for the “quick sprint to AGI” (not to mention the equally stratospheric sums required for power generation, to keep the DCs running). Likewise w/ the valuations.

      All of the Chinese "AI" labs are focusing on innovating at the system level (on both the software & hardware side) to maximize efficiency of compute, largely as a consequence of the US export controls, while the US labs are taking the brute force approach of piling on raw compute. The single exception is DeepSeek, whose founder's explicit goal is AGI, but it behaves like the research lab funded by a quant hedge fund that it is (reminiscent of what OpenAI once was), so its singular focus is research & not applications or commercialization.

      The PRC government is also moving toward making data & compute available as public utilities for the corporate actors to access, which in theory could reduce duplication & overall expenditure. The purpose of the PRC's industrial policy around "AI" is to encourage the diffusion of the new tools/capabilities across the economy & society, as widely as possible & as quickly as possible, to capture the efficiency gains & distribute the value created as widely as possible. Whether any Chinese AI outfit makes a lot of money or achieves huge market capitalization in the process is secondary (only relevant in so far as some profit is required to incentivize private behavior). See the EV & solar industries in the PRC.

      To your original comment, the likes of Alibaba, Tencent, ByteDance, Baidu can leverage “AI” to improve or expand their existing product/services & try to monetize that way, the likes of Zhipu are pivoting to industrial/business process “AI” models for commercialization, the likes of MiniMax are pivoting toward agentic “AI” for commercialization, & DeepSeek is on a long march toward “AGI”. Everyone else, “pure plays” such as Moonshot AI, has no clear path toward sustainability/profitability, despite producing very impressive models.

      Reply
    105. 105.

      Sally

      October 29, 2025 at 11:47 pm

      @Doug Gardner: Oh dear – am I that old! Actually, I used it because I wasn’t a programmer. I was someone who did some programming. So I was never near the sharp end.

      P.S. Sorry for the autoincorrect on Prolog.

      Reply
    106. 106.

      laura

      October 29, 2025 at 11:48 pm

      @dmsilev: Yes, you are right, and I swept with the broad brush. Calculations, analysis and detection are a boon. It's the intent to steal or corrupt artistic and creative work, skilled work (there is no unskilled labor), and idle the workforce that I vehemently object to.

      Reply
    107. 107.

      Sister Golden Bear

      October 29, 2025 at 11:48 pm

      @Deputinize America: Please consult the ML Oracle to determine the optimal time to begin the Butlerian Jihad.

      Reply
    108. 108.

      Sally

      October 29, 2025 at 11:52 pm

      BJ is a rare site where the comments are essential reading.

      Reply
    109. 109.

      Ramona

      October 29, 2025 at 11:54 pm

      @Carlo Graziani: My career was a victim of that ML winter too. I'm most definitely sticking around! I remember being gobsmacked when somebody mentioned "The Singularity" to me and I had to explain that all these "AI" systems do now is pattern matching.

      Reply
    110. 110.

      YY_Sima Qian

      October 29, 2025 at 11:55 pm

      @no body no name: In the end, CUDA, like Windows, Cadence/Synopsys/Siemens, Android, can be replaced. It will be a pain, & there is little incentive for people to do so absent enormous exogenous pressure, but the USG has been supplying the pressure & the incentives for Chinese players through the course of the tech. war.

      Reply
    111. 111.

      Ramona

      October 30, 2025 at 12:00 am

      @Carlo Graziani: Thank-you Dr Graziani. I have waited a long time for somebody to explain things as you have.

      Reply
    112. 112.

      Karen Gail

      October 30, 2025 at 12:11 am

      Lived in Sunnyvale, CA in early 70’s; ex worked for R & D in what would become a massive computer giant; one of the engineers wondered if they could bring Asimov’s vision “I, Robot” to life. It appears that trickle of thought has become a flood as people attempt to bring AI into reality without realizing that each and every choice a human makes is often subconscious, something that can’t really be taught to a machine.

      Reply
    113. 113.

      Ramona

      October 30, 2025 at 12:16 am

      @Carlo Graziani: I was sure that you had used the term “approximate model” in your write-up. After seeing your comment, I checked and found out I’d subconsciously summarized what you’d said and regurgitated my summary to japa.

      Reply
    114. 114.

      Ramona

      October 30, 2025 at 12:19 am

      @YY_Sima Qian: Oh, the usefulness of benignly managed capitalism!

      The purpose of the PRC’s industrial policy around “AI” is to encourage the diffusion of the new tools/capabilities across the economy & society, as widely as possible & as quickly as possible, to capture the efficiency gains & distribute the value created as widely as possible.

      Reply
    115. 115.

      YY_Sima Qian

      October 30, 2025 at 12:24 am

      @cain: B30A for soybeans is not serious analysis. B30A for refined rare earth elements & rare earth magnets (also tungsten, high purity  graphite, etc.) is the trade. (If it materializes. Knowing the Trump gang, there is the possibility that they were leaking proposals to approve the B30A to pump up Nvidia‘s stock price so they can make a killing, then announce to ban the B30A to tank Nvidia‘s stock so they can make another killing shorting it. We will know in a few hours.)

      Neither the US nor the PRC will dismantle their respective systems of export controls, & the export licenses can be revoked at any time should the trade/tech. wars escalate again. Some of the more triumphalist Chinese ultra-nationalists have even suggested critical minerals for Taiwan & EUV, but that is not serious analysis, either.

      All of these leverage points for both sides are wasting assets, having been played or threatened. The PRC is determined to indigenize the full semiconductor fabrication stack, & the US (& Europe/Japan) are strongly incentivized to diversify the supply of critical minerals. It will be a race to see which side can execute the industrial policy more quickly & more competently. As I stated before, refining/processing rare earth elements is far less technically challenging than developing EUV, but the PRC has far greater state capacity/experience to execute industrial policy, has had a 7-year head start to de-risk from US supply relative to the reverse, & the CPC regime is not finance-brained.

      Reply
    116. 116.

      no body no name

      October 30, 2025 at 12:28 am

      @YY_Sima Qian:

      I'm entirely with you, yet on the other hand nothing has killed off Windows.  Android is also in almost everything now.  It's hard to kill off a standard once it is a standard.  Hell, ARM powers everything from your Apple devices to your TV now.

      This is part of what the AI race is about.  It’s going to happen but who is the next standard?  For now it looks like CUDA and nvidia chips.  Might not be though!

      Reply
    117. 117.

      YY_Sima Qian

      October 30, 2025 at 12:30 am

      @Ramona: “Benignly” is where the uncertainty lies!

      Of course, capitalism in the US is far from laissez-faire, it has just been managed to primarily benefit capital owners. To the degree that it had been “benign” in the past, it was to ensure that enough crumbs were left for labor to head off revolt. The sections of MAGA supported by the SV techbros want to keep those crumbs for themselves, too.

      Reply
    118. 118.

      Ramona

      October 30, 2025 at 12:31 am

      @Ramona: Of course, the term “model” implies approximation but the redundancy in “approximate model” is a price worth paying to remember that the trained system is approximating reality.

      Reply
    119. 119.

      Ramona

      October 30, 2025 at 12:38 am

      @YY_Sima Qian: That occurred to me as I was typing this. In the early days of industrial capitalism, the law that the price of goods in a competitive system was driven down to the cost of producing the next unit after input costs were covered meant factory after factory went bust until the US government learned from Germany IIRC how to sponsor controlled oligopolies.

      From your description, PRC seems to be managing its firms a bit more benignly than the US has managed theirs.

      Reply
    120. 120.

      YY_Sima Qian

      October 30, 2025 at 12:43 am

      @no body no name: That is the irony of the US tech. war against the PRC: left to the market & the individual markets, Windows, ARM, CUDA are sure to remain the dominant “standards” long into the future, because Chinese players (even state owned players) are incentivized to take the path of least  resistance, risk & expense, which is to go w/ the well established foreign/US incumbent. All of the industrial policy by the PRC government before 2018 had not solved the coordination issue in the indigenization push. The investment pushed forward the development of indigenous tech., but that last 20% across the threshold for successful commercialization proved daunting, since they could not seize enough revenue share from the foreign/US incumbents to fund their own further iteration/improvement.

      The US tech. war solved that coordination issue for the PRC, & dramatically changed the incentive structure & risk calculation of Chinese players, state owned & private alike. IMO, the categorical error that the Trump 45 & Biden Administrations made was in interpreting the relative lack of successful commercialization of indigenous PRC tech. as lack of PRC tech. advancement/capability in general, & that the US enjoyed a far greater tech. lead than it actually did. (Plus the profound oversight of points of PRC leverage that can be utilized in retaliation, such as critical minerals, & failure to make much progress in addressing these points of vulnerability before continually escalating the trade/tech. wars through the past 7 yrs.)

      Reply
    121. 121.

      YY_Sima Qian

      October 30, 2025 at 12:53 am

      @Ramona:

      From your description, PRC seems to be managing its firms a bit more benignly than the US has managed theirs.

      IMHO, the PRC political economy is the closest example one can find of one where there is already much abundance & which is being run to achieve abundance across the board. That is something that has to be experienced on the ground, not from reading trend charts of per capita GDP (even in PPP terms).

      It has less to do w/ benevolence on the part of the CPC regime, although developmentalism is central to the regime’s modern self-identity & continued legitimacy, but also owes something to the production-focused orientation of its Marxist roots, as opposed to the consumption-focused orientation of Neo-Liberalism.

      Needless to say, there are many aspects of the CPC regime's governance that are far from benign (particularly to the individual), including aspects of its economic management (in detail), but I do find its economic management philosophy more appealing than the monetarism/financialization currently prevailing in the West.

      Reply
    122. 122.

      Ramona

      October 30, 2025 at 1:02 am

      @YY_Sima Qian: And here I thought in the halcyon nineties when PRC’s firms started to grow by leaps and bounds that the Marxist roots had been forgotten.

      Reply
    123. 123.

      no body no name

      October 30, 2025 at 1:12 am

      @YY_Sima Qian:

      I'd agree.  However certain standards, like say ARM or x86, are ubiquitous.  Even Chinese products use these.  There's sort of no way around it.  But we could devolve into RISC-V as well here.  Just as there is no getting around who defines PCI-E or JEDEC.

      While Chinese manufacturing and their own software is catching up there’s no getting around these standards.  The more ugly fact is that their best GPU is four generations behind nvidia and AMD when it comes to compute on the consumer side.  It’s good enough but it’s not good.

      There's the other issue of how all the best lithography machines come from ASML and they don't sell their best to China and China has not caught up there.

      IMHO the fatal flaw in the US AI strategy is energy production, but that's another one to uncork.

      Reply
    124. 124.

      WTFGhost

      October 30, 2025 at 1:26 am

      @kalakal: Well… with my brain damage, I expect I speak differently than most. It wouldn’t surprise me, if I had a manager, AI would be used for most employee troubleshooting e-mails, but, my boss would say “‘Ghost’s e-mails, I better skim them – he talks so different, I don’t trust AI.”

      I really do understand that the problem of properly summarizing speech is insolvable, because I’ve been struggling to understand the nuances, and in the end, you really need to know the author a bit, to understand what the author has written. That is the one place where a gifted human summarizer, or, an extremely good translator, will always outdo AI that hasn’t been tuned for an author (or speaker).

      Reply
    125. 125.

      YY_Sima Qian

      October 30, 2025 at 1:34 am

      @no body no name: The gap is essentially due to the gap in lithography. If Huawei had not been put on the entity list in 2019 & therefore banned from accessing TSMC‘s latest process nodes, it is possible that Huawei would be competing head on w/ Nvidia on GPUs today, just like it was surpassing Qualcomm on mobile SoCs.

      Given that the PRC government felt confident enough to ban the import of H20s, & Huawei has felt confident enough to publish its roadmap for super nodes for the next 3 yrs. (an unprecedented act since the company was placed on the Entity List), I think that suggests the PRC semiconductor manufacturing equipment industry will soon (if not already) successfully commercialize DUVi scanners to be able to scale at 7 nm or even 5 nm nodes, & that a domestic EUV may only be a couple of years away. Some people have suggested that the H20 ban is a bluff on the part of the PRC government to pressure the US to release more advanced GPUs. However, bluffing has not been the CPC regime’s style since Mao passed.

      Finally, gaps in individual GPU performance can be partially compensated for by systems engineering on both the software & hardware stacks, as evidenced by Huawei‘s CloudMatrix 384 super node & planned successors, as well as the variety of Chinese “AI” labs developing SOTA or near-SOTA models, fast on the heels of the closed US labs, despite the export restrictions. Huawei’s hardware solutions, no matter how well performing at the rack level, will be more energy intensive as the result of being forced to use older nodes. Then again, the PRC, unlike the US, is not energy constrained.

      I would also say instruction set architecture at the chip level is far stickier than software toolkit at the system level.

      Reply
    126. 126.

      Fair Economist

      October 30, 2025 at 1:44 am

      Thanks for writing this series, Carlo.

      I'm curious what you think about the claim I saw today that Claude displayed some self-awareness of alterations to its internal state. The claim is much less strong than the headline implies, but as I read it, it claims that when Claude's internal state was altered in a way representative of how Claude responds on encountering a token, about 20% of the time Claude claimed it was experiencing intrusive thoughts. My understanding of LLMs is that they don't have the kind of feedback loops I'd expect to be necessary for this kind of self-awareness.

      Reply
    127. 127.

      YY_Sima Qian

      October 30, 2025 at 1:55 am

      @Ramona: CPC regime leaders never forgot those roots, if you read what Deng, Jiang & Hu were saying. Xi is merely the 1st CPC leader in 2 generations to have enough control over the Party-State, & the Party-State having enough governance capacity, to match words & deeds again.

      Of course, from the '80s – '10s, plenty of liberals (& Neo-Liberals, & libertarians) became influential, including at high levels of the Party-State apparatus, but they have since been purged/marginalized as Xi consolidated power.

      To go back to Carlo’s subject at hand – why is there such a gargantuan “AI” bubble? The wild over-optimism for what “AI” delivers & promises is actually incidental IMHO, the massive money from the COVID stimulus & monetary easing is looking for places to obtain the highest return. In other words, looking for bubbles to inflate. If it is not “AI”, it would be something else (such as tulips, I kid…). The techno-utopianism around “AI” caters to the financial interests looking for explosive returns, & the financial speculation then feeds back on to the techno-utopianism. This is a consequence of a political economy being managed to maximize the interests of capital owners.

      What is interesting is that capital once gravitated to capex light endeavors (& thus the most financially efficient), such as software, internet platforms, financial services, etc. The DC build out is anything but capex light. The last US bubble in a capital intensive industry was probably telecom in the late ’90s – ’01, & the original one probably the railroad boom. However, unlike railroads & optical fiber networks, DCs are rapidly wasting assets – the chips could be obsolete in as little as 2 – 3 yrs., at most 5.

      Reply
    128. 128.

      divF

      October 30, 2025 at 2:01 am

      Great start, Carlo. I’m learning stuff already.

      One way of distinguishing what computational mathematicians do from what data scientists do is that for computational mathematicians, the question of, if you increase the number of degrees of freedom / computational effort in a simulation, does the answer improve, and if so at what rate, is central.  What is the state of play for this issue for the various algorithmic components of ML ? I haven’t been able to extract the answer to that question in the universe of convolutional neural nets, but maybe I just haven’t tried hard enough.

      FWIW, physical scientists are hardly blameless in this matter, with their historical tendency to view computer models as discrete analogues – grid-based methods are modeling little boxes of fluid, particle methods simulate physical particles (they don’t). Things have improved over the course of my career, but still, I’ve been trying to drag an answer to the question I posed above out of the plasma physics community regarding particle methods for 25 years. I finally was able to figure out a mathematically-principled answer (at least) to the question of what it means for a particle method to converge that is consistent with observed behavior of those methods.

      Reply
    129. 129.

      YY_Sima Qian

      October 30, 2025 at 2:09 am

      @Carlo Graziani: Here is a recent profile in the Guardian of one guy who is still searching for the path to “AGI”, not going down the DL dead end:

      ‘I have to do it’: Why one of the world’s most brilliant AI scientists left the US for China
      In 2020, after spending half his life in the US, Song-Chun Zhu took a one-way ticket to China. Now he might hold the key to who wins the global AI race

      Chang Che

      Tue 16 Sep 2025 05.00 BST

      Clickbait headline aside, it is a fascinating read, covering the early history of "AI's" development (far less thoroughly than you have), divergent approaches to "AGI" (on which Zhu is now something of a heretic, but his critique of DL is similar to yours), the Sino-US tech. war, Chinese Americans caught in the middle, etc.

      Reply
    130. 130.

      Fair Economist

      October 30, 2025 at 2:12 am

      @YY_Sima Qian:

      B30A for soybeans is not serious analysis.

      True, but we are talking about Trump here. When a doddering nutcase makes the decisions, very stupid decisions can result.

      Your point about China pushing to keep its industries competitive to maximize longterm growth while the US allows increasingly extreme oligopoly to maximize profits concerns me. Basic economics says as oligopoly intensifies economic inefficiency does too. The rough general standard of 3 to 5 major companies per industry (airlines, autos, news, food distribution (!!!) etc.) is already pretty bad and now the Trump administration is downright supportive of further consolidation. Combine that with the sudden return of tariff barriers, which neuters the possibility of import competition restraining oligopolistic exploitation, and I think we're looking at a sharp deterioration in US growth in the very near future.

      The US in the middle of the 20th century actually did work hard to keep industries competitive, and that had a lot to do with our relative success then. With China continuing to support strong competitive markets and us falling to extreme oligopoly we could fall behind pretty quickly.

      Reply
    131. 131.

      Fair Economist

      October 30, 2025 at 2:18 am

      @YY_Sima Qian: The Chinese emphasis on competition while the US, especially under Trump, allows ever more extreme oligopoly, really concerns me. Basic economics says oligopoly is very costly to society – there's a big deadweight loss involved in firms using oligopoly power to extract profits. If China keeps its markets competitive as we race to increasingly inefficient oligopolies to maximize profits, we could fall behind very quickly. The sudden reappearance of tariff barriers will make this even worse by removing the import competition that restrains oligopoly extractions.

      Reply
    132. 132.

      cain

      October 30, 2025 at 2:26 am

      @YY_Sima Qian:

      This is why Intel started oneAPI. So that we can create an industry standard using SYCL. Otherwise you are just dependent on Nvidia.

      Reply
    133. 133.

      YY_Sima Qian

      October 30, 2025 at 2:35 am

      @Fair Economist: You are very much right to be concerned.

      Reply
    134. 134.

      NotMax

      October 30, 2025 at 2:43 am

      LLMs don't learn, they scan. And they vacuum up data with no way (so far) to weigh and assess credibility (GIGO, anyone?) or make a defensible judgment when presented with contradictory information.

      Reply
    135. 135.

      no body no name

      October 30, 2025 at 4:33 am

      @cain:

      This is sort of missing the point.  Even on the consumer side Nvidia's proprietary hardware and software solutions are just flat out superior.  Open standards exist and Intel and AMD compete with Nvidia, but when it comes down to it they get slaughtered on hardware and software.  They are a monopoly.

      Reply
    136. 136.

      2liberal

      October 30, 2025 at 6:20 am

      thanks for taking the time to do this Carlo.

      Reply
    137. 137.

      Another Scott

      October 30, 2025 at 8:37 am

      Dean Baker’s (of CEPR.net) latest on his Patreon thing (to be on CEPR.net soon):

      Chinese AI Seems to be Jumping Ahead
      Dean Baker
      2 hours ago

      I confess to being very much a non-expert on AI. I read pieces in the business press and random articles that more knowledgeable friends pass along. But I can’t say I have any knowledge of the nuts and bolts of the technology.

      However, I can understand costs and the story there does not look very good for the United States AI industry. It looks like the latest offerings from China offer comparable speed in computing at a small fraction of the cost. According to this piece on the new MiniMax M2 Model, it can deliver performance that is comparable to the cutting edge U.S. models, at just 8 percent of the cost. This system is also open source. That makes it cheaper to adopt and alter than proprietary models.

      The need to develop efficient techniques was forced on China’s AI companies by the decision to deny them the most advanced computer chips produced by Nvidia and other chip makers. It now looks as though this prohibition may not have been a smart path from a competitive standpoint, even if it meant that the United States might have better AI in some abstract sense. Given the enormous need for electricity and water by data centers, it is a great thing that China seems to have designed systems requiring far less computing power, and therefore less electricity and water.

      It is also worth noting that, while access to low-cost electricity may be a real constraint on the progress of AI in the United States, that is not likely to be a problem in China. Electricity there is plentiful, and the average price is less than half as much as it is here.

      With tech stocks hitting ever higher levels (Nvidia just crossed $5 trillion in market capitalization), it’s hard to believe there is not a serious bubble in the U.S. markets. Having lived through two major bubbles in my adult life, I am used to the optimists coming up with ever whackier stories.

      For example, for Nvidia’s current stock price to make sense, the company would have to earn about 10 percent of all projected corporate profits five years out. That’s not impossible, but I wouldn’t bet on it. I heard similar projections about Cisco Systems and other tech companies in the 1990s bubble.

      I also remember being assured in the 00s that house prices could never plummet because they are not traded like stocks on an exchange. (I believe Federal Reserve Board Chair Alan Greenspan once pushed that line.) They also said that we had never seen the sort of nationwide plunge in house prices I was predicting. I pointed out that we had never seen the sort of nationwide run-up in prices that the country saw from 1996 to 2005.

      Anyhow, it sure looks like a bubble to me. It will be ironic if the success of China’s AI is what ends up being the proximate cause of its bursting.

      Baker was screaming about the housing bubble while most others were whistling “Happy Days are Here Again”.

      My understanding is that one has to take these cost comparisons with a grain of salt – the “cheap” systems apparently often elide the enormous training costs. But I too am not an expert. The caution that “if something cannot continue forever, it will stop” is always a good one, and too many other players are working to reduce nVidia’s present dominance. Remember when Intel was the giant that crushed all rivals…

      Thanks again, Carlo.

      Best wishes,
      Scott.

      Reply
    138. 138.

      dnfree

      October 30, 2025 at 9:30 am

      When I look at some of the junk AI-generated videos and photos, which of course don't represent the best that AI is capable of, I am struck by their lack of awareness of the physical world and the physical experience of the world that flesh-and-blood creatures have.  Like that video of Trump flying an airplane and dumping crap on demonstrators, where the face mask doesn't cover Trump's nose.  AI has no awareness apparently of what the mask is for, just that it appears in pictures of pilots.  It may know the word breathing, but it doesn't know the lived experience of breathing.  It doesn't really know in a physical sense what fingers are for and how they work.  I don't know enough to predict whether that kind of knowledge is possible.

      Reply
    139. 139.

      Another Scott

      October 30, 2025 at 9:42 am

      @dnfree: What gets me about most of the stuff I see is that the images are so “flat”.  The perspectives are all wrong for 3D objects.  E.g. at the start of the 47 jet video, the front landing gear seems to be too far toward the viewer (IIRC).  It’s just all wrong.

      Of course, things will get better over time (6-fingered people are rare now), but it’s a huge waste at the moment.

      (I assume 47’s face mask being down in the video is intentional.  An allusion to his Covid mask theatrics.)

      Grr…

      Thanks.

      Best wishes,
      Scott.

      Reply
    140. 140.

      Bill Arnold

      October 30, 2025 at 10:07 am

      @dnfree:
      (from twitter/x, iirc)
      “These models see objects and reality as just an elaborate extension of texture”

      Reply
    141. 141.

      YY_Sima Qian

      October 30, 2025 at 10:18 am

      @Another Scott: DeepSeek showed that it is possible to train a SOTA LLM for a small fraction of the cost of OpenAI/Anthropic/Meta. Then a number of Chinese "AI" labs demonstrated the same for inference. Since most of these Chinese LLMs are open source/weights, their claims have been independently verified & results replicated. Google has been pushing in the same direction w/ Gemini.

      Reply
    142. 142.

      YY_Sima Qian

      October 30, 2025 at 10:28 am

      Caixin making the same points I made across various comments in this post, but more cogently:

      Poe Zhao @poezhao0605

      Fascinating cover story from Caixin on the U.S.-China AI chip race.
      The conventional narrative: China is playing catch-up.
      The reality: two fundamentally different strategies are emerging.
      U.S. approach: brute-force capital. OpenAI’s latest chip orders require 26GW of power. That’s three New York Cities at peak demand.
      China’s approach: system-level optimization. Huawei’s Atlas 900 links 384 lower-powered chips to achieve 2x the performance of comparable systems. Higher energy cost, yes. But it works.
      The counterintuitive finding: China’s real bottleneck isn’t supply anymore. It’s demand. Utilization rates at AI computing centers are below 30%.
      Meanwhile, Bain estimates the U.S. is building toward an $800 billion revenue gap by 2030. The infrastructure spend may be running ahead of actual applications.
      Worth reading for anyone tracking this space. Link in comments.

      The article:

      Cover Story: U.S. and Chinese Chipmakers Tread Different Paths in AI Gold Rush
      By Liu Peilin and Han Wei
      Published: Oct. 27, 2025  5:19 a.m.  GMT+8

      Reply
    143. 143.

      Kayla Rudbek

      October 30, 2025 at 10:31 am

      @Mr. Bemused Senior: people were trying character recognition decades ago when I was doing document review; not sure how well it works now, but apparently AI/ML has a very difficult time with recognizing that “P is not Q” and “P is Q” are two different statements with opposing meanings

      Reply
    144. 144.

      YY_Sima Qian

      October 30, 2025 at 10:43 am

      Some analyses that goes beyond performance at the GPU level:

      tphuang @tphuang

      We are in 2025Q4 & HW has already showcased Atlas-960 SuperPoD w/ 2.2 TB/s interconnect & 34 PB/s total interconnect + 9.6 TB/s memory speed + the full system-level performance. ppl on X are using metrics like PP/$ to measure AI-chips. Spend less time thinking about per chip cost or performance. Think about things on a system level. Why does HW emphasize the all optical connections bw cabinets? Why does HW create its own HBM for inference? What about the push for SiPo chips w/ self generating light for optical modules? What about HW’s new UB Mesh protocol? How is HW able to do 15488 card SuperNodes, but Nvidia can only do 72 or 144?

      AI chip cost is a tiny portion of overall AI DC buildout cost. America is never going to be competitive vs China in cost per 100 EFLOPS of compute. Especially not when companies like Meta are building their AI servers in the home of CJNG cartel & wasting expensive AI chips.

      The level of histronics over B30A is getting way out of control. If America loses AI race, it will be because China has a lot more human capital & data + a full supply chain to realized embodied AI. And not wasting all that compute on chatbots to keep ppl hooked on another app.

      Not the right framing here since China already has its own AI chips that are widely used. Keep in mind Nvidia share of China mkt is probably 50% this yr & shrinking. HW itself has a fully AI chip supply chain. No American firm has that.

      17 RE metals. Building supply chain for each are of different difficulty level. Some HREs are separated as by-product from other metals. China only build capacity for them bc it has the other industries. So, high difficulty, but not necessarily high tech

      Reply
    145. 145.

      Carlo Graziani

      October 30, 2025 at 10:58 am

      @Fair Economist: I would say that this is a perfect example of what I referred to as “circular reasoning” in the essay.  I will write more about this in a couple of weeks, when I get to examining AGI, but for now I’ll just observe that such claims never state what “awareness” or “reasoning” are as model terms;  we are only told what signals such attributes of cognition would produce in LLM output. Which signals are then promptly discovered. The intellectual standards underpinning such “research” are beneath any scientific standard, in my opinion.

      Reply
    146. 146.

      Carlo Graziani

      October 30, 2025 at 11:14 am

      @divF:

      One way of distinguishing what computational mathematicians do from what data scientists do is that for computational mathematicians, the question of, if you increase the number of degrees of freedom / computational effort in a simulation, does the answer improve, and if so at what rate, is central.  What is the state of play for this issue for the various algorithmic components of ML ? I haven’t been able to extract the answer to that question in the universe of convolutional neural nets, but maybe I just haven’t tried hard enough.

      What a great question.  Actually this will take up a good deal of Part 6, on hyperscaling.  But to anticipate things a little:

      In the context of data modeling, this is the question of model capacity AKA parameter size.  And this has always been a strange thing about DL methods, because overparametrization (many more parameters than data samples) has seemed to be a key to their success since the beginning.  No DL method ever succeeds without increasing model capacity to overparametrize the data size.

      And from a statistician’s point of view, that’s more than a bit weird. If I have a model that I want to use to approximate the unknown distribution from which some data was sampled, I would expect the number of required parameters to be set by the distribution, not by the number of samples drawn from the distribution for training. If I train a model with a certain dataset and then happen to obtain more training data from the same distribution, further training ought to increase the precision with which I can estimate my existing parameters (the root-N effect). What should not happen is that I should be forced to increase the parameter count in order to accommodate the new data. That’s how kernel density estimation works, and KDE is a horrible, non-scalable method.
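
      A minimal numerical sketch of the root-N point, just my own toy illustration, with a Gaussian mean standing in for a parametric model and the sample count standing in for KDE’s “parameters”:

          import numpy as np

          rng = np.random.default_rng(0)
          true_mean, true_sd = 3.0, 2.0

          # Parametric view: one parameter (the mean), estimated from N samples.
          # More data shrinks the error of that SAME parameter like 1/sqrt(N);
          # nothing about the model has to grow.
          for n in (100, 10_000, 1_000_000):
              x = rng.normal(true_mean, true_sd, size=n)
              print(f"N={n:>9}  parameters=1  |error| = {abs(x.mean() - true_mean):.5f}"
                    f"  (root-N prediction ~ {true_sd / np.sqrt(n):.5f})")

          # KDE view: the model keeps a kernel centered on every sample, so its
          # effective parameter count is N itself, growing with the data.
          for n in (100, 10_000, 1_000_000):
              print(f"N={n:>9}  KDE kernel centers = {n}")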

      It turns out that most natural data is “easy” enough to model that this capacity scaling “law” is affordable, especially with modern HPC tools.  With NLP, however, it’s a different story.  Natural language is a bitch to model.  We couldn’t really do it well at all before transformers showed up. Now we can, but the cost is eye-watering.  Having to increase model capacity to keep up with data corpus size is unaffordable, but that is precisely the feature of DL that drives hyperscaling.
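
      For reference, the empirical scaling fits people quote for this behavior have a rough power-law form (this is the commonly cited Chinchilla-style parametrization; the constants are fitted, not derived):

          L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

      where L is test loss, N is parameter count, D is the number of training tokens, and E, A, B, \alpha, \beta are empirical constants. The loss only keeps dropping if N and D grow together, which is exactly the capacity-chasing-data behavior I mean.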

      There needs to be another ML way forward besides DL, because we’re going to boil the oceans trying to do “AI” with transformers.

      Reply
    147.

      RSA

      October 30, 2025 at 12:05 pm

      @Carlo Graziani: Another terminology question: What do you mean by modeling natural language?

      Reply
    148.

      Another Scott

      October 30, 2025 at 12:31 pm

      @YY_Sima Qian:

      I was referring to things like this story at TheRegister (from September).

      It costs what it costs, and likely costs less in China, and accountants rule the world, but one always needs to be aware of the various agendas.

      Thanks.

      Best wishes,
      Scott.

      Reply
    149.

      the_mjl

      October 30, 2025 at 1:16 pm

      ::delurking::

      Good stuff, Carlo.

      My MSEE (1991) had me using neural networks to find mines in sonar images. It took a long afternoon to train on one data set back then, but, honestly, “the math” has changed very little. Oh, and it took no more than a few hundred ‘neurons’ to do this. No GPUs or personal electrical power station required.

      IMO (tl;dr), this round of so-called “AI” is doomed to fail on energy/thermodynamic grounds. A human brain consumes only 12–25 watts, and that’s a WHOLE brain. An LLM is basically a (poor) statistical model of what comes out of a small part of that brain that’s only a few mm deep… and THAT needs nuclear-reactor levels of power?

      “Throw more GPU/Watts at it until it does what we want!” This EE calls Bullshit. That’s not an ‘innovation curve’. That’s not even Engineering. That’s essentially a “mining” operation, furiously burning cash to “DIG HARDER!” before it all goes bust.

      Reply
    150.

      Ray Ingles

      October 30, 2025 at 1:17 pm

      As you note, not all Machine Learning is “Deep Learning” – but otherwise this seems to resonate pretty well…

      https://m.xkcd.com/1838

      Reply
    151.

      Ramona

      October 30, 2025 at 3:08 pm

      @YY_Sima Qian: I always learn a lot from your comments. Thanks for this. This inflating AI bubble also reminds me of tulips. At least people were buying tulips.

      What does DC stand for?

      Reply
    152.

      Ramona

      October 30, 2025 at 3:11 pm

      @the_mjl: But our species has had thousands of years of evolution, billions of instantiations, and locomotion while learning, so multiply our 12.5 watts by that…

      Reply
    153.

      Ramona

      October 30, 2025 at 3:12 pm

      @Ray Ingles: Cute!

      Reply
    154.

      Ramona

      October 30, 2025 at 3:36 pm

      @dnfree: Also, it is very likely that huge amounts of excreta released at such an altitude would merely dissipate into a thin mist.

      Reply
    155.

      Ramona

      October 30, 2025 at 3:59 pm

      Sir Roger Penrose tries valiantly to explain to a clueless interviewer why Gödel’s incompleteness theorem indicates that intelligence cannot be implemented by computation and why current deep learning cannot lead to AGI: youtu.be/biUfMZ2dts8?si=cDKD7G8hoKObvsmO

      Reply
    156.

      Bill Arnold

      October 30, 2025 at 4:15 pm

      @Ramona:
      In context, I read DC as Data Center(s).

      Reply
    157.

      the_mjl

      October 30, 2025 at 4:16 pm

      @Ramona: Exactly… so why not learn (steal?) from THAT?

      “How many GPUs can YOU fit inside a dragonfly’s pinhead brain?”

      As someone who’s still pretty hardware-focused, I think the current approach is just a Dead End.

      Reply
    158.

      Bill Arnold

      October 30, 2025 at 4:46 pm

      @Ramona:
      If anyone is curious re Penrose, the wikipedia article Orchestrated objective reduction is … interesting.
      Also, more mainstream, Neural correlates of consciousness.
      An old psychedelic rock song seems apt:
      “So if you think you’re where you are
      Beware, you’re far away
      A mystery that has no clue
      Where are you?”
      (“Where Are You”, Sunforest, 1969, lyrics)

      Reply
    159.

      Ramona

      October 30, 2025 at 5:22 pm

      @Bill Arnold: Thanks. I’m awful at context. My friends get exasperated with me when I interpret their mumblings in the most bizarre ways. In this case, I went off on a Deep Learning, Deep Cearning, Deep Context ramble.

      Reply
    160.

      Ramona

      October 30, 2025 at 5:22 pm

      @Bill Arnold: Lovely!

      Reply
    161.

      Ramona

      October 30, 2025 at 5:28 pm

      @Bill Arnold: Thanks especially for the objective reduction Wikipedia entry. I’d recently watched a YouTube video on microtubules, the anesthesiologist, and consciousness. I understand much better when I read, though…

      Reply
    162.

      Ramona

      October 30, 2025 at 5:43 pm

      @the_mjl: Now if we could but figure out how to connect several dragonfly brains in parallel…

      Reply
    163.

      RSA

      October 30, 2025 at 5:55 pm

      @Ramona:  I read Penrose’s book, The Emperor’s New Mind, when I was in grad school, so it’s been some time, and I haven’t followed his thinking since then.

      Penrose claims that intelligence requires consciousness, but his argument isn’t clear. Chalmers has the best counter, I think: we can’t yet even characterize consciousness in rigorous terms, and micro-level phenomena are no more useful than macro-level phenomena as an explanation. Penrose also brings in Gödel’s theorem, which is just irrelevant.

      I think he’s a fringe voice, with interesting ideas but not a lot of theoretical or empirical support (or even logic) going for them.

      Reply
    164.

      Carlo Graziani

      October 30, 2025 at 6:09 pm

      @RSA: It’s a question of what data is being modeled. The hardest data to get a halfway-decent model of is human text and speech, which is natural language. Compared to that, modeling weather data, or photographic images, or fluid flows, or chemical structures, etc. is easy and “cheap”: anything is cheap compared to training a transformer on a large corpus of human language.
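
      To be concrete about what “modeling” means here: a language model fits a probability distribution over token sequences, usually via the standard autoregressive factorization

          p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})

      and the transformer’s job is to approximate each of those conditional distributions. That is where the enormous corpus and the enormous model capacity go.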

      Reply
    165.

      Sally

      October 30, 2025 at 6:12 pm

      @YY_Sima Qian: I guess you won’t see this, however … I found a similar dynamic in a completely different field in a completely different era. When I worked in a DOE lab, we had (I think) four 1 MeV electron microscopes, which often sat idle. My overseas colleagues were aghast. Their TEMs were in constant use, with very few at 1 MeV. Then, the race for the first imaging of the atom was won by a small group in Australia (CSIRO), using a 0.6 MeV machine and a new theory to help decode the image, the dynamical theory of electron diffraction. It was brain power more than electrical power and dollars that achieved that breakthrough. I wondered at the time if a surfeit of dollars was sometimes an impediment to advances. Being forced to think about a problem in another way was more beneficial than throwing more and more dollars at it. Heresy!

      Reply
    166.

      Carlo Graziani

      October 30, 2025 at 6:12 pm

      @RSA: Penrose is still one of the most gifted mathematicians and physicists of our age.  He admittedly did go off the deep end with Emperor’s New Mind.  But The Road to Reality is a brilliant, extended effort in calling bullshit on string theory using very cogent mathematical physics.

      Reply
    167.

      RSA

      October 30, 2025 at 6:16 pm

      @Carlo Graziani: Thanks, that helps.

      Reply
    168.

      Carlo Graziani

      October 30, 2025 at 6:17 pm

      @YY_Sima Qian:

      That story on Zhu is a very interesting read.  I have a lot of sympathy with his research outlook.

      Reply
    169.

      RSA

      October 30, 2025 at 6:27 pm

      @Carlo Graziani: I’m happy to take your word for Penrose’s abilities as a physicist. And he’s far from the only physicist to dive into AI with all the enthusiasm and hubris of a new grad student…

      Reply
    170.

      Mr. Bemused Senior

      October 30, 2025 at 6:35 pm

      @Kayla Rudbek: I don’t expect ML to give answers as such, but I suspect it can be an aid to an expert plowing through boxes of documents in a case (a task I have had to perform). In particular, it can eliminate duplicates and perhaps help with cross-references.

      Scanning printed documents and character recognition are pretty good these days. I’m just wondering whether it’s practical to train a custom model from the document production and then use it.
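
      On the duplicate-elimination piece, here is a minimal sketch of just the non-ML baseline, catching exact duplicates among OCR’d text files (the directory name is a placeholder of mine, and a real document production would need fuzzier near-duplicate matching):

          import hashlib
          from pathlib import Path

          def content_key(path: Path) -> str:
              """Hash whitespace-normalized, lowercased text so re-scanned copies collide."""
              text = " ".join(path.read_text(errors="ignore").split()).lower()
              return hashlib.sha256(text.encode("utf-8")).hexdigest()

          def find_duplicates(doc_dir: str) -> dict[str, list[Path]]:
              """Group OCR'd .txt files by content hash; groups longer than 1 are duplicate sets."""
              groups: dict[str, list[Path]] = {}
              for path in sorted(Path(doc_dir).glob("*.txt")):
                  groups.setdefault(content_key(path), []).append(path)
              return groups

          if __name__ == "__main__":
              for paths in find_duplicates("document_production").values():
                  if len(paths) > 1:
                      print("duplicates:", [p.name for p in paths])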

      Reply
    171.

      YY_Sima Qian

      October 30, 2025 at 7:16 pm

      @Sally: Thank you for sharing this! The dynamic is real.

      Reply
    172.

      Kayla Rudbek

      October 30, 2025 at 9:35 pm

      @Carlo Graziani: which is why AI does such a rotten job with legal writing and research, which are all about the words

      Reply
    173.

      Luc

      October 31, 2025 at 1:57 am

      Great article, even for people lacking some I.

      Thanks!

      Reply
    174.

      Paul in KY

      October 31, 2025 at 8:51 am

      @no body no name: Well, that’s just great!

      Reply
