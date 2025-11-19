Come for the politics, stay for the snark.


Part 4: If There Were AGI, What Would It Look Like?



Guest post series from Carlo Graziani.


On Artificial Intelligence

Hello, Jackals. Welcome back, and thank you again for this opportunity. What follows is the fourth part of a seven-installment series on Artificial Intelligence (AI).

The plan is to release one of these per week, on Wednesdays, with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

The original plan was to skip Thanksgiving week. However, I’ve been talking to WaterGirl about the technical level of these posts, and I’ve come to realize that it’s been a bit off-putting to some readers. So I think that during the turkey-day break, I’ll try to provide a high-level summary of where the series has been with an eye to keeping the nerd-babble under control.

That said…

Part 4: If There Were AGI, What Would It Look Like?

Part 3 ended with a bit of a rant, because I felt the need to express outrage at the very loose and lazy intellectual standards prevailing in much contemporary “AI” research, at least insofar as discussion of Artificial General Intelligence (AGI) goes. My perspective on the subject is by no means a majority view, and I feel a little like Diogenes, shaking his fist at the corrupt world from the austerity of his barrel.

The thing is, I don’t really enjoy the role of Diogenes, because “burn it all down” is a fundamentally destructive outlook on such things. I happen to feel that the scientific accomplishments of modern machine learning, while often oversold, are very real. I don’t want to give the impression that I think the entire subject is worthless, just because the current scientific discussions of AGI are so fundamentally wrong.

As to AGI itself, I think there is something else I need to clarify: I do not intend to say that it is impossible to achieve some version of AGI: I am simply saying that AGI is impossible along our current technological path, which is to say, based purely on machine learning techniques.

I am philosophically a materialist. I do not believe in souls. I think that consciousness is something that physical brains do, a phenomenon that arises from the electrical activities of billions of grey cells. And that being the case, I cannot in good conscience believe that it is surely impossible to bring about some kind of entity, in software running on computer hardware, that recognizably emulates aspects of human cognition, including reason. I do expect that this feat will be far more challenging to accomplish than the chatbot parlor tricks we currently call “AI”. Even if true AGI is possible at all, we might not see it happen for many decades. Nonetheless, fundamentally, some AGI technology should be possible in principle.

What I want to attempt today is to describe what the scientific basis for such a technology might look like. I base this discussion on an article that I have written that is currently under review (those of you who would like to take a deeper dive will find the draft article here).

This is a purely speculative venture, and what I write here, however well-motivated, could easily turn out to be wrong. Nevertheless I think this is a useful exercise, for two reasons: it is useful to at least try to point to a possible exit from the stagnant state of current research on AGI; and, it is useful to at least try to illustrate what type of research concerns ought to replace those currently occupying scientists working on AGI.

What Should We Require Of A Theory Of Artificial Reason?

I want to narrow down these considerations, from AGI (a term for which no accepted scientific definition exists) to artificial reason, which is at least amenable to some specific discussion. What I would like is a model of what we mean by the term “reason” that is specific and detailed, to the point of being amenable, at least in theory, to implementation as software. Such a model would at least get us away from the territory of bullshit claims such as “self-organization” and “emergence” of AGI.

Last week, I discussed human reason in the context of what sort of traces it might leave in natural language text, to examine the plausibility of claims that reasoning states can be recovered from large text corpuses. I pointed out that our own reason rests on a foundation of subrational processes which almost certainly leave no such trace in text. Cognitive scientists have only the vaguest notions of how those processes work, and they can certainly not exhibit any models for them that are sufficiently specific to be represented as software. So trying to build a principled “bottom-up” model that mimics how reason emerges in a human mind is probably hopeless, at least for now.

What is left, then, is a “top-down” approach. What I mean by that is that we must work at an abstract level rather than at a mechanistic one. We must state what we mean by “reason” in general terms, in a way that we cannot directly show to be connected to the mechanisms of human reasoning, but which is motivated by the structure of reasoned thought. Also, we would like a model that can be expressed as mathematically as possible, because the point here is to come up with something that we could imagine translating into computer code.

Oddly enough, we already have one aspect of human cognition that can be represented this way: learning. We have seen that there is a subject called statistical learning, wherein by some method one learns an approximation to the statistical distribution from which some dataset was sampled, and one concomitantly learns to structure reasonable decisions based on that distribution. I’ve been a little vague about how this works, but it is a process that can be represented quite generally by the kind of model that I have in mind here.

So one possible approach (certainly not the only one!) is to take that representation of learning and generalize it, to represent reasoning. This approach has two advantages: it allows us to get a free ride on the existing model, which appears to work for learning; and, it allows us to connect and contrast “reasoning” to “learning”, so that we can begin to see what the relationship might be between the two.

A Cast Of Characters

This is all very abstract, and it will be helpful to provide concrete examples of reasoners (or alleged reasoners) to consult as we go along. I have three such examples for you:

  • The astrophysicists who were trying to puzzle out the nature of Gamma-Ray Bursts (GRBs) between 1973 and 1998. The GRB phenomenon consists of bursts of gamma rays (duh!) that arrive at the Earth from random directions on the sky, never repeating. When they were discovered, and for the quarter-century that followed, their nature remained mysterious, because they seemed unconnected to any other astronomical phenomenon. The available data consisted of gamma-ray “light curves” (time traces of gamma-ray intensity), spectra (distributions of gamma-ray energies in the burst), event durations (fractions of a second to hours), and locations in the sky. The latter were only known very inaccurately: the so-called “error boxes”, regions of the sky from which the events might have arrived, were very large by astronomical standards, many degrees across, because it is difficult to create direction-resolving instruments for photons at gamma-ray energies. We will use the story of how the mystery of GRBs was solved to illustrate an aspect of our model of reasoning.
  • A DIY home electrician (name redacted to protect the guilty) attempting to install a light fixture into an electrical box. He is following very standard procedures, using techniques, tools, and materials that he has been trained to use and understand, and is moderately skilled. However, for some unknown reason, the fixture installation is failing, because of a persistent short-circuit that only manifests itself when the fixture is finally secured to the wall, and the circuit breaker is turned back on. When he turns on the circuit breaker with the fixture not secured to the wall, there is no short circuit, and the fixture works correctly. He is trying to figure out why by inspecting wire nut connections and checking for crimped wires. We will use this story to illustrate another aspect of our model of reasoning.
  • An LLM undergoing training, or a trained LLM making new inferences. It doesn’t reason: it’s just along for the ride.

Bayesian Updating As A Model Of Learning

Let’s get started with learning.

We can exhibit an abstract model of learning using Bayesian statistical theory. I’ll describe how this works without writing down any equations (there aren’t that many equations, and you will find them in that draft article if you care about that sort of thing). There are two elements to consider: a parameterized model, and an evidence stream.

The role of the evidence stream is to provide new information to be assimilated. The evidence is presented sequentially, one discrete piece at a time. It comes from a fixed set of possible pieces of evidence. There may be infinitely-many such pieces, but they are related by some structural relationships.

Examples of such evidence streams are GRB light curves, spectra, durations, and arrival directions; or the results of the DIY electrician’s inspections for faulty wire connections or crimped wires; or pages of text presented to the LLM in training.

The role of the parameterized model is to provide a description of the structure of the evidence. “Parameterized” simply means that the provided description is controlled by a set of numbers (the parameters) that act as control knobs on the model. Twist those knobs, and the model’s description of the evidence structure changes. There may be a half-dozen such knobs, or there may be billions, depending on the model and the evidence. The model is fixed, but we may set the knobs any way we choose.

The model might contain statements such as “the source of the GRB is a neutron star in our galaxy” and the corresponding knobs could be the star’s spin rate and magnetic field intensity, and its distance from Earth; or the model could contain the statement “one of the wires is getting crimped against the box’s mounting strap” and the corresponding knob would be the identity of the offending wire; or the model might be the LLM itself, and the corresponding knobs would be the billions of parameters that must be set in training.

We do not initially know which settings of the knobs provide the highest-fidelity description of the evidence structure, i.e. which settings are most predictive of the evidence. However, once we start viewing evidence, we have a procedure for weighting the knob settings. “Weighting” means that we may view some settings to which we have ascribed higher weights as being more likely than other settings with lower weights, because the higher-weight settings provide better descriptions of the evidence.

This weighting procedure is called Bayesian updating. As the model views each new piece of evidence, this (fairly simple) mathematical procedure describes how the weights shift among the knob settings. Generally speaking, a single piece of evidence produces a relatively small adjustment of the weights. Over time, as evidence accumulates, what may happen in the ideal case is that a small set of knob settings will hog most of the weight while remaining settings will have essentially zero weight, and we will conclude that those highly weighted settings are “preferred” by the evidence (in the sense that they give the most satisfactory predictions of the evidence).

That, in a nutshell, is our model of learning.
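
For readers who prefer code to prose, here is a minimal sketch of the updating loop just described. Everything specific in it is an assumption made for illustration rather than something taken from the draft article: the single knob theta (the model's claimed chance that a yes/no piece of evidence comes up "yes"), the eleven-point grid of knob settings, and the made-up evidence stream.

```python
# Toy Bayesian updating over a grid of knob settings (illustrative assumptions only).

# One knob, theta: the model's claimed probability that a piece of evidence is a 1.
knob_settings = [i / 10 for i in range(11)]               # 0.0, 0.1, ..., 1.0
weights = [1 / len(knob_settings)] * len(knob_settings)   # start with no preference

# Hypothetical evidence stream: 50 binary observations, 80% of them 1s.
evidence = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 5


def likelihood(theta, observation):
    """How strongly the setting theta predicted this observation."""
    return theta if observation == 1 else 1 - theta


def bayes_update(weights, observation):
    """Reweight every knob setting by how well it predicted the observation."""
    unnormalized = [w * likelihood(t, observation)
                    for w, t in zip(weights, knob_settings)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


for observation in evidence:   # one small adjustment per piece of evidence
    weights = bayes_update(weights, observation)

best_weight, best_setting = max(zip(weights, knob_settings))
print(f"preferred setting: theta = {best_setting}, weight = {best_weight:.2f}")
```

In this toy run the weights end up concentrated on the settings near the observed frequency of 1s, which is the "ideal case" described above. Note that nothing in the loop can change the grid of knob settings or the kind of evidence being fed in.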

When Learning Stalls

One problem with statistical learning is that the happy circumstance where the weights contract to a small set of knob settings can be difficult to obtain. There are two possible problems with it:

  1. The evidence may not shed enough light on the model. In this case, we would say that the evidence is not informative about the model.
  2. The model may not be sufficiently descriptive of the evidence. In this case, we would say that the model is not explanatory of the evidence.

If either of these circumstances holds, the Bayesian updating process will stall, and the weights will not decisively concentrate on a winning set of knob settings.
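
The two kinds of stall can be imitated in a toy setting. The sketch below is my own illustration under loudly stated assumptions: three knob settings, hypothetical likelihood numbers, and a crude "best fit per observation" score that is not part of the model described in this post. In the first stream every setting predicts the evidence equally well, so the evidence cannot choose among them. The second stream is loosely patterned on the electrician: each setting says "wire k is crimped", and every inspection finds the inspected wire fine.

```python
import math


def bayes_update(weights, likelihoods):
    """One Bayesian update; likelihoods[k] is P(this evidence | knob setting k)."""
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def entropy(weights):
    """Spread of the weights in nats; log(n) means no preference among n settings."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def run(stream, n_settings=3):
    """Feed a stream of likelihood rows to the learner and report two summaries."""
    weights = [1 / n_settings] * n_settings
    log_fit = [0.0] * n_settings
    for likelihoods in stream:
        weights = bayes_update(weights, likelihoods)
        log_fit = [f + math.log(l) for f, l in zip(log_fit, likelihoods)]
    # Average log-probability that the best single setting assigned to the evidence.
    best_fit_per_obs = max(log_fit) / len(stream)
    return entropy(weights), best_fit_per_obs


# Case (1), hypothetical numbers: all settings predict each observation equally well.
uninformative_stream = [[0.9, 0.9, 0.9]] * 9

# Case (2), hypothetical numbers: setting k says "wire k is crimped"; each inspection
# finds the inspected wire fine, which that wire's setting considered unlikely.
fine, missed = 0.95, 0.05
inspection_stream = [[missed, fine, fine],
                     [fine, missed, fine],
                     [fine, fine, missed]] * 3

spread_1, fit_1 = run(uninformative_stream)
spread_2, fit_2 = run(inspection_stream)
print(f"case 1: spread = {spread_1:.3f}, best fit per observation = {fit_1:.3f}")
print(f"case 2: spread = {spread_2:.3f}, best fit per observation = {fit_2:.3f}")
```

In both runs the weights stay spread evenly over the three settings, so the learner has stalled. What differs is how well the best setting predicted what was seen: comfortably in the first case, poorly in the second. Treating that difference as a diagnostic is an assumption of this sketch, not an established test.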

In the case of GRB astronomy, a consensus developed in the 1980s that there was a Case (1) problem: the evidence was not informative with respect to any proposed model of GRBs. The problem was that the source location error boxes were too large, and too tardily reported. It was felt that the transient GRB phenomenon was in all likelihood associated with equally transient phenomena at other wavelengths, and that observing such transients might be the key to unlocking the mystery. But a 4-degree error box on the sky is always crowded with astronomical sources, including time-varying ones, and it was simply not possible to identify any one of them as the culprit. GRB research stalled. Bad evidence!

In the case of the DIY electrician, something was clearly not right with his understanding of the situation inside the box, because after multiple inspections it was increasingly clear that all the connections were fine, and none of the wires were getting crimped. Something else, not suggested by the model, had to be at fault. Bad model!

In the case of a trained LLM’s efforts to respond to prompts, we mostly have a bad model problem, in my opinion. Certainly, the hallucination phenomenon suggests a very brittle model that easily goes off the rails. However, depending on the objective of the training, there might also be a bad evidence problem, particularly in the case of training an AGI: as I discussed last week, the text corpus almost certainly contains no information concerning the origins of human reasoning processes.

Where’s The Aha! ?

Note one characteristic feature of the learning process that I described above: it is in essence continuous. Piece of evidence comes in, small adjustment occurs in weights. Lather, rinse, repeat.

If we are going to base an account of reason on straight up learning, as the LLM research community is attempting to do, this is a very serious (although largely unrecognized) problem, because one of the salient features of reason is that it often operates discontinuously. We have all, I am sure, experienced those moments of “Aha!” revelation, in which some issue that we have struggled with suddenly seems easily solvable. The problem has flipped and twisted in such a way that clarity replaces darkness. If there is an aspect of reason that distinguishes it from other cognitive activities, I submit that “Aha!” is that aspect.

That’s the problem with the “learning to reason” approach to AGI. Learning is an essentially continuous process. It simply cannot produce the “Aha!” discontinuity. There is no pure learning path to Artificial Aha! (AA). As a type of cognition, learning is severely limited by restrictions on evidence and model choice. Essentially, all it can do is update its weights across the fixed model’s knob settings, based on evidence drawn from a fixed collection of evidence types, in the hope that some settings are explanatory of the evidence and that the evidence is informative about the model.

It should go without saying that this does not begin to capture reasoning. Anyone reflecting on their own occasions of “Aha!” moments of sudden clarity and insight (not necessarily in the pursuit of natural science, home repair, or computer science, but in solving any puzzle in any field of human activity!) should understand that those moments do not come from a process analogizable to gradual constraining of a model through gradual assimilation of accumulating data. “Aha!” moments are essentially cognitive discontinuities, gestalt shifts that suddenly alter the process of assimilating evidence into a model, and are incompatible with the continuous learning process described above. So what are we talking about when we talk about “reason”, and in what way is it related to learning? And, how might we produce AA?

Evidentiary Reform

Suppose that we recognize that we are in Case (1): the evidence is not informative of the model. Then the move is obvious: we change our evidence stream. We cast about for a new stream of more powerful evidence that speaks more clearly to our model, using our knowledge of model features that might be sensitive to other types of evidence, as well as of what new types of evidence might be feasibly acquired. We refer to this shift as Evidentiary Reform.

Evidentiary reform is pretty much the approach taken by astrophysicists to decode the nature of GRBs. Realizing that no GRBs could be associated with a transient counterpart in other wavelengths because of the inaccuracy of GRB locations, GRB scientists developed new high-precision X-ray localization instruments, and arranged for GRB locations to be propagated in real time to ground-based optical and radio observatories. The first transient optical counterparts of GRBs (the so-called “afterglows”) were detected in 1998, revealing their extragalactic nature through their substantial absorption redshifts¹. By 2003 a core-collapse supernova in a relatively nearby galaxy had been caught in flagrante in a GRB error box (whose size was now about 0.05 degrees), associating GRBs with a certain type of supernova. Case closed. The new stream of evidence, brought into being to correct the weakness of the previous evidence, transformed the mystery into a soluble problem.

The ability to propose evidentiary reform to obtain better model constraints is certainly an example of a true reasoning process. It has the required “Aha!” discontinuous character, embedded in the realization that a new type of information is required for further progress. It is also a highly non-trivial thing to model in a computation, since a successful evidentiary reform needs to take into account not only the nature of the weakness of the previous evidence with respect to model constraint potential, but also practical considerations of how such new evidence can be obtained given real-world feasibility constraints.
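
One small piece of this can be put in code: given candidate evidence streams that someone has already thought of, we can ask how much each is expected to sharpen the weights. The sketch below uses the standard expected-information-gain calculation from Bayesian experimental design, which is my choice of tool and not something taken from the draft article. The two knob settings, the two instruments, and all the probabilities are hypothetical numbers chosen to echo the GRB story.

```python
import math


def entropy_bits(weights):
    """Uncertainty of a set of weights, in bits."""
    return -sum(w * math.log2(w) for w in weights if w > 0)


def expected_information_gain(prior, p_yes_given_setting):
    """Expected drop in uncertainty from one yes/no observation.

    prior[k] is the current weight of knob setting k, and p_yes_given_setting[k]
    is the probability that the observation comes out 'yes' under setting k.
    """
    gain = entropy_bits(prior)
    p_no_given_setting = [1 - p for p in p_yes_given_setting]
    for outcome_likelihoods in (p_yes_given_setting, p_no_given_setting):
        p_outcome = sum(w * l for w, l in zip(prior, outcome_likelihoods))
        if p_outcome == 0:
            continue  # this outcome cannot occur, so it contributes nothing
        posterior = [w * l / p_outcome for w, l in zip(prior, outcome_likelihoods)]
        gain -= p_outcome * entropy_bits(posterior)
    return gain


# Two hypothetical knob settings: sources are (galactic, extragalactic).
prior = [0.5, 0.5]

# Hypothetical old stream: "some variable source lies in the wide error box".
# Both settings predict this equally often, so seeing it teaches us nothing.
wide_error_box = [0.5, 0.5]

# Hypothetical new stream: "a precisely located afterglow shows a large redshift".
# Unlikely if the sources are galactic, likely if they are extragalactic.
precise_afterglow = [0.05, 0.9]

gain_old = expected_information_gain(prior, wide_error_box)
gain_new = expected_information_gain(prior, precise_afterglow)
print(f"wide error box: {gain_old:.3f} bits, precise afterglow: {gain_new:.3f} bits")
```

This only ranks streams that are already on the table. It says nothing about the hard parts named above: inventing a new kind of evidence in the first place, and judging whether it can feasibly be obtained.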

Model Reform

Suppose that we recognize instead that we are in Case (2): the model is insufficiently explanatory of the evidence. Then, again, the move is obvious: replace the model with a new model capable of improved predictive power, and endowed with a new set of knobs. The new model might be suggested by the specific form of prediction failures common to the old model. It would likely also satisfy certain criteria of ontological parsimony, embodying some notion of Occam’s Razor-type simplicity so as to exclude model families of weak explanatory/predictive power. We will refer to this process as Model Reform.

The DIY electrician took this approach to finally figuring out his short-circuit problem. After several iterations of taking the fixture off the box and inspecting various electrical elements and connections for defects, and making sure the wires were neatly folded in the box so that they could not become crimped, he started to think of what could produce a short-circuit only when the fixture was secured to the wall. At which point, he realized that the screw securing the fixture to its mounting strap in the electrical box was long enough to reach through the box into the hole in the wall from which the electrical cable emerged, and bury itself among the wires in the cable, potentially crimping and shorting them. And an inspection of the end of the screw showed a dark discoloration that was not present originally, presumably due to the short-circuit passing through the end of the screw. A simple solution, replacing the screw with a shorter screw, immediately produced a satisfactory installation. The problem had been that the original model did not feature any role for the mounting screw. The new model now contained a statement “The mounting screw causes a short at the electrical cable when the fixture is fully secured.” It was induced by the inability of the original model to predict the short-circuits, and supported by new evidence (the discoloration of the end of the screw) which was not interpretable within the original model.

A reasoner can produce an “Aha!” discontinuity through model reform, when a judicious replacement of the model results in improved predictions of the evidence, leading to marked improvements in the concentration of the knob-setting weights. Again, this type of reasoning is not straightforward to model in a computation, since formulating a new model requires some sense of the data misfit and a formulation of some kind of Occam’s Razor conceptual parsimony constraint.
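
Once a replacement model has been proposed, scoring it against the old one is the comparatively easy part. The sketch below does this with the marginal likelihood: the probability a model gave the evidence, averaged over its knob settings. This is a standard Bayesian model-comparison quantity, and because it averages over all the knobs it tends to penalize models that spread their bets over many settings that predict poorly, which is one common reading of Occam's Razor. Using it here is my assumption, not a claim about the draft article, and both toy models and all their numbers are hypothetical stand-ins for the electrician's two pictures of the box.

```python
def marginal_likelihood(model, evidence):
    """Probability the model assigned to the evidence, averaged over its knob settings.

    model is a list of (prior_weight, p_short) pairs, where p_short(secured) gives the
    probability of a short circuit under that knob setting.
    evidence is a list of (secured, short) observations.
    """
    total = 0.0
    for prior_weight, p_short in model:
        prob = prior_weight
        for secured, short in evidence:
            p = p_short(secured)
            prob *= p if short else 1 - p
        total += prob
    return total


# Hypothetical observations: no short with the fixture loose, a short once it is secured.
evidence = [(False, False), (True, True), (False, False), (True, True)]

# Old model (hypothetical numbers): "one of three wires is crimped". In this toy version
# no setting ties the short to whether the fixture is secured.
old_model = [(1 / 3, lambda secured: 0.5) for _wire in ("black", "white", "ground")]

# New model (hypothetical numbers): "the mounting screw reaches the cable when the
# fixture is fully secured", with one knob for whether the screw is too long.
new_model = [
    (0.5, lambda secured: 0.95 if secured else 0.05),   # screw too long
    (0.5, lambda secured: 0.05),                        # screw short enough
]

ml_old = marginal_likelihood(old_model, evidence)
ml_new = marginal_likelihood(new_model, evidence)
print(f"old model: {ml_old:.4f}, new model: {ml_new:.4f}, ratio: {ml_new / ml_old:.1f}")
```

The code can only compare the two models it is handed. The step that made the difference in the story, thinking of the mounting screw at all, happens outside it.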

Reasoning and AI

In summary, this high-level account of reason ascribes to it the ability to supervise and intervene upon a learning process, discontinuously altering either the model or the evidence stream, which would otherwise be static features of the learning process. In addition, a reasoning process must be capable of recognizing when a learning process under its supervision stalls. When a stall occurs, it must diagnose whether the failure is more likely due to a bad model or to a bad evidence stream, and it must propose an alteration of one or the other, according to criteria suggested by the failure, while respecting important constraints on possible alternatives.

In other words, in this account, reasoning transcends learning in an essential manner.

This is major trouble for current attempts to obtain AGI, because, as we have been discussing, the entire subject is based on machine learning. Transformer-based LLMs are nothing but computational models that learn to represent an approximation to the probability distribution over token sequences encountered in their training data, which they exploit to construct likely sentence completions, sentence translations, sentence classifications, and so on. They do this so well that their output can belie its origin in probabilistic mimicry (in Emily Bender’s memorable phrase, they are Stochastic Parrots). They can produce the appearance of reasoned discourse at most times. But the process by which such models are trained is the gradual, continuous assimilation of millions of text documents into a stupefyingly large model. LLMs never do “Aha!” They simply aren’t wired that way, because their evidence streams and models are fixed.

This is the point that current AGI research appears to miss altogether. The view now gaining currency among practitioners is that the “emergence” of intelligence occurs in consequence of training models with billions, or trillions of parameters, as evidenced by the fact that such models can perform certain “reasoning tasks”. But performing reasoning tasks is not at all the same thing as reasoning: that is the circular argument for AGI again. Some modern AI systems have been trained to write very creditable computer code. But the ability to write code does not make one a computer scientist: there are no AI computer scientists today, certainly none capable of proposing new conceptions and models. Similarly, some AI systems can prove mathematical theorems. This does not make them mathematicians, since there is much more to the cognitive activities of a mathematician than just proving theorems: it is far more challenging and useful to know which theorems are interesting to search for, and to create interesting new mathematical frameworks within which theorems can be searched for and proven. And, from the sublime to the ridiculous: an LLM-based AI electrician may know chapter and verse of the National Electrical Code, and be as conversant with tools, materials, and techniques as any licensed electrician. But faced with a situation not previously confronted by any training example it would not be able to reform its model or its evidence stream to suit the unexpected circumstance.

Is This Model Right?

I don’t know whether the model of reason that I argue for here is indeed correct, or in any sense valuable. It has obviously not been implemented in software and validated. As I have indicated, it would be highly non-trivial to represent the model in software. But not, I think, impossible. It is at least a specific model, and it is based on a set of mathematical ideas. One could at least begin building small toy systems that would permit some exploration of its features.

I imagine that AI practitioners would find it easy to reject this model and ignore the conclusions that it forces one to draw, because there is no output that one can judge its validity by. But please note that at least this is a model of reason. AI researchers have never deigned to supply such a model, instead relying lazily on vague notions of “emergence” and “self-organization” for which they offer no mathematical theory worthy of the name. Which is to say, they embrace the circular argument for AGI, discovering AGI in LLM output after declaring what AGI should appear as in LLM output. That is a worthless, contemptible scientific argument (Diogenes is getting the better of me again). If you want to tell me that your model “reasons”, show me your model of reason, and we can argue about whose model is better. I would love to have that conversation. It would be on a whole different intellectual plane from where AGI research is today.


    60 Comments

    1.

      Marc

      I will mention that amongst Sam Altman’s ever-changing definitions for AGI is the one that rings true to me: the point at which the revenue generated by OpenAI’s systems exceeds the cost of training and operating those systems. Based on that criterion, I guess they are about 10% of the way there.

    2.

      ruckus

      We do need to add to our current levels of learning because we know far more than we did 75-100 years ago. Far, far more. We still have a long way to go but then learning takes time, effort, often prior level concepts, and possibly the concept that not every human will be able to understand the entire library of human understanding, and even today’s full level takes a tad bit of time and learning levels to get there. A question is, does everyone need to reach 100% knowledge and is that even possible? I don’t think it is, because likely so many people alive today didn’t get what would be the basics of learning 100% of available knowledge. And, is it even necessary that very many people do learn 100% or is it possible to teach everyone? Teaching humans is not like storing info in a computer.

    3.

      WaterGirl

      There are a lot of words in the post, so I’d like to call attention to this part so nobody misses it.

      The original plan was to skip Thanksgiving week. However, I’ve been talking to WaterGirl about the technical level of these posts, and I’ve come to realize that it’s been a bit off-putting to some readers. So I think that during the turkey-day break, I’ll try to provide a high-level summary of where the series has been with an eye to keeping the nerd-babble under control.

    4.

      Goku (aka Amerikan Baka)

      Carlo, I saw this comment from Urza in the Cole thread below, and was wondering if you had an opinion on the potential dangers to society posed by this “AI” technology as well as robotics:

      Human robots can replace human workers without extra design. And that is the point of all the AI and robotics. Replace all the workers. There is never mention of what the humans will be doing when this happens. There’s a pretense of humans will find other jobs like past technological revolutions but that is absolutely not what they are going for this time. Eliminate all the people and keep all the money. It’s funny because any business that relies on a human as a customer is going to go away and they should be fighting this but none of them are thinking far enough ahead to what happens when no one has a paycheck and no one can afford to hire a human because it will cost more just to feed them than to use a robot (eventually).

      Could this AI and robotics technology lead to permanent mass unemployment?

      I apologize if you’ve already touched on this in previous posts

      ETA: From what I’ve seen of “AI” it seems like a lot of vaporware

    5.

      Marc

      Twist those knobs, and the model’s description of the evidence structure changes. There may be a half-dozen such knobs, or there may be billions, depending on the model and the evidence. The model is fixed, but we may set the knobs any way we choose.

      FWIW, I am not an AI computer scientist, I am at best an AI hacker.  One thing I’ve observed is that crafting a prompt to achieve a particular outcome, whether it be writing decently correct code or researching a specific topic, strongly resembles “programming” analog music synthesizers.  To create a particular sound, you had to try patching amplifiers and oscillators together in different sequences until you got close, then tweak the knobs.  Similar with detailed LLM prompting, you need to get discrete requests in the right sequence, then tweak the phrasing to achieve the right level of detail, or the output will mostly be worthless.

    7.

      TONYG

      @Goku (aka Amerikan Baka): I don’t know whether the techbro idiots have a plan (aside from making a lot of money), but if any of them are actually thinking beyond that, maybe the plan is to replace our current consumer-capitalist system with a kind of feudal system, with the Musks and the Bezos as kings, and everyone else as impoverished Untermenschen. Probably robot-soldiers to kill or control the lower orders. Or, maybe these assholes are like most other capitalists, and are not thinking beyond the next quarter.

    8.

      cmorenc

      The excellent sci-fi movie Ex Machina was about the problems a robot achieving AGI might cause its creators, especially if arguably the AGI had crossed over the frontier of self-awareness. Alicia Vikander is terrific in portraying Ava, the AGI-equipped humanoid robot.

    10.

      Xentik

      I think your model for reasoning is good, but I would guarantee that you could not apply it to the entire human race, i.e. some people would just not qualify as being able to reason. Similarly, saying that LLMs can do some programming and solve some proofs, is already better than the vast majority of humanity is capable of. We cannot compare the aggregate intelligence of all of humanity to a single AI.

      In my study of AI I’ve come to two conclusions:

      LLMs are impressive, but they cannot ever be an AGI in and of themselves.
      Human thought isn’t nearly as complex as we thought it was.

      The first one is simple, the LLMs can do a lot of things that seem like reasoning, and can put up a really good impression of a human, but there is no way for an LLM to dynamically learn. Without feedback there is no possibility of intelligence. There’s a fantastic example of the loss of this feedback mechanism causing a person to behave deterministically in cases of loss of short-term memory. There’s a great episode of Radiolab called “Loops” that has a segment about this.

      The analogy I use is that if LLMs are the wheel, intelligence is a 4-door sedan. You need wheels to have a functional sedan, but no matter how much you improve your wheels, you will never have a sedan pop out of it. Our AI techbro overlords are currently operating as a cargo cult, where they actually believe that they can just keep building better wheels.

      The second comes from watching the various complaints about LLMs as time has gone by. The primary issues of LLMs, flaws in logic, hallucinations, inability to implement any sort of real guardrails that aren’t bypassed simply by tricking the LLM, etc. are all human limitations. Humans are known to hallucinate all the time (and not in the LSD sense). There’s a reason that people are generally known to be terrible witnesses. Similarly, we find instances where people conflate images they’ve seen with memories of things they’ve experienced. It’s not hard to provoke the brain into doing these things.

      So what’s the difference? Human thought is a process that involves multiple independently functioning components with feedback loops. Being good at learning and reasoning isn’t primarily about absorbing knowledge, it’s about developing the ability to identify when your train of thought is wrong. When I had students, early on they would often ask me questions about why something wasn’t working. By the second year, they often would come ask me, but before I could respond, they’d interrupt and say “Oh, could it be this thing? I should go check that.” It’s that feedback process that really defines intelligence. (This built-in error-correcting/self-doubting mechanism is also the source of imposter syndrome, IMO).

      So what’s this all mean? The good news is I don’t think achieving AGI is actually all that difficult, it’s just not ever going to happen with the current path we’re pursuing. The bad news is that I would be willing to bet all I own that there is no definition of intelligence which has all three of these traits: a) objective, b) includes all humans, c) excludes all non-sapient AI. Furthermore, any AGI achieved that mimics our thought processes will be vulnerable to all the things humans are vulnerable to, including social engineering (which is basically what almost all jailbreaking techniques are, just way more complex).

      As a final thought, let me share this quick anecdote:

      Once when I was at a national park, the Park Ranger was going over the importance of using the Bear Boxes to store food. They then pointed out that they were on their 5th or 6th iteration of the boxes, because the bears eventually figure out how to open them. To that, someone asked “Why not make it physically impossible for the bears to open?” The ranger’s reply was “Unfortunately, there’s a lot of overlap between the dumbest people and the smartest bears.” The same will hold true of AI as it continues to increase in complexity.

    11. 11.

      Carlo Graziani

      @Goku (aka Amerikan Baka): I have not discussed this. But I do think that (a) Tech Titans and other C-Suite types would indeed like this very much, and (b) they are not going to get it because “AI” will not be up to the task of replacing human labor.

      Cory Doctorow has been talking up his next book in his podcast. He coined the phrase “Reverse Centaur” for it. A Centaur, in the analogy, is a human head on a computer body, while a Reverse Centaur is, well, a computer head on a human body. The idea is that as labor-saving assistants, AI systems work great, and make Centaurs run faster. As labor replacements they suck, because they screw up constantly and need humans cleaning up behind them. I don’t believe that the “screw up constantly” part is going to change, but I’ll get to that part of the story after the Turkey-Day break.

      The other reason they aren’t going to get what they want is that the entire AI sector has been setting cash on fire at a historically unprecedented rate, and nobody (except NVidia) knows a path to profitability. The Dread Bubble Word is already spreading through the financial press, and lots of investors are wondering whether their money is being well-spent on AI Tech. My bingo card says it’s going to come tumbling down next year, and all those great AI assistants, which require stupendous amounts of compute and energy to work, will largely vanish except for some bespoke, expensive applications. Which is to say, an AI Fuckup employee is actually going to be more expensive than all the humans it was intended to replace, combined.

    12. 12.

      Socolofi

      NERD ALERT! WEEDS! (the bad kind):

      Merrie Ringel Morris (Google DeepMind, Univ of Washington adj faculty) has a pretty good set of criteria that’s reasonably easy for everyone to parse: arxiv.org/pdf/2311.02462

      Short version:

      You have types of tasks, split into Narrow Focus and General Focus. You then have levels: Level 0 is No AI, followed by Emerging (equal to an Unskilled Human), Competent, Expert, Virtuoso, and Superhuman.

      Using ChatGPT or Claude for coding is an example of Emerging in a Narrow task (coding). Nearly every spell checker is an example of Expert in spell checking. She argues LLMs are Emerging in General as well; I’d argue with that, although perhaps that’s a middle category between Narrow and General.

      Not in her paper, but another thing to consider is tasks that require accuracy, and how accurate is the answer. For example, legal documents, and in particular legal citations, have a high degree of accuracy required, and to date there have been plenty of issues where said citations turned out to be completely made up (and caught by judges who do read the footnotes).

      IMHO accuracy is the big problem with LLMs and getting anywhere close to AGI at this point. The current strawman, which is really weak, is, “How many Rs are there in strawberry?” The next big problem is memory – or in particular, augmenting an LLM’s long-term, slowly changing memory (such as where do Edmonton and Washington play) with short-term facts and reasoning (e.g. Edmonton is trailing Washington 2-4 after 2 periods RIGHT NOW). Oh, and the third is power. Your brain is about 20 calories per hour. A decent LLM with lots of processing and all that is around 700,000 calories per hour (~3000 kilojoules).
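
      As a rough unit check on the power figures above: the comment does not say whether “calories” means small calories (4.184 J) or dietary kilocalories (4184 J), and the two readings differ by a factor of 1,000. The sketch below converts both, taking the 20 and 700,000 per-hour figures at face value. The variable names are illustrative only.

```python
# Unit check for the brain-vs-LLM power comparison.
# Assumption: the 20 and 700,000 "calories per hour" figures are taken as given;
# only the unit interpretation (small calorie vs kilocalorie) is varied.

SECONDS_PER_HOUR = 3600.0
BRAIN_PER_HOUR = 20.0
LLM_PER_HOUR = 700_000.0

UNIT_IN_JOULES = {
    "small calories": 4.184,   # 1 cal  = 4.184 J
    "kilocalories": 4184.0,    # 1 kcal = 4184 J
}

results = {}
for unit, joules in UNIT_IN_JOULES.items():
    brain_watts = BRAIN_PER_HOUR * joules / SECONDS_PER_HOUR
    llm_watts = LLM_PER_HOUR * joules / SECONDS_PER_HOUR
    llm_kj_per_hour = LLM_PER_HOUR * joules / 1000.0
    results[unit] = (brain_watts, llm_watts, llm_kj_per_hour)
    print(f"{unit}: brain {brain_watts:.3f} W, "
          f"LLM {llm_watts:,.0f} W ({llm_kj_per_hour:,.0f} kJ per hour)")

# The ratio between the two figures is the same under either reading.
print(f"ratio: {LLM_PER_HOUR / BRAIN_PER_HOUR:,.0f}x")
```

      The commonly quoted brain power of about 20 W corresponds to the kilocalorie reading of the first figure, while a total near 3,000 kJ per hour corresponds to the small-calorie reading of the second, so the size of the gap depends on which unit is meant.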

    13. 13.

      Matt McIrvin

      In Greg Bear’s 1997 novel “Slant”, there’s an AI- and nanotechnology-powered world of abundance, mass technological unemployment and a huge population on some sort of UBI, and a bunch of elite zillionaires have decided that most of humanity is simply excess biomass and a drag on them, and what they ought to do is huddle in a shelter somewhere and wipe out the rest of humanity (I forget how exactly).

      Years before the current LLM-fueled explosion of AI hype, during the early parts of the Great Recession, I was talking about the whole question of mass technological unemployment with some online friends and one of them seemed skeptical that it was actually ever going to happen, in the sense that most people permanently can’t find jobs at all. He pointed out that 150 years ago most of the human population even in the leading economies worked in agriculture. Most of those agriculture jobs went away, ultimately killed by automation and trade, yet somehow we made up jobs to do. Maybe they’re jobs that David Graeber would call bullshit jobs, they weren’t necessarily better jobs, but jobs they were.

      I think at some point we’re going to have renewed pressure to reduce the workweek. I suspect that a large part of the dramatic move toward outright fascist assholery by our tech elites is pushback to the increasingly effective organized-labor agitation that was starting to happen in the aftermath of the COVID recession, and starting to take hold among populations they didn’t ever expect to be susceptible, like software engineers. If AI is writing all the code, suddenly those expert coders don’t have any leverage at all.

    14. 14.

      Matt McIrvin

      @Xentik: Reminds me of when there was all this angst about how IBM’s Deep Blue could beat Garry Kasparov at chess, and what this meant for the uniqueness of human intelligence, and I remember thinking “the Atari 2600 Video Chess cartridge could beat ME at chess in 1982.”

      I recently heard of someone trying to use some general-purpose LLM as a chess engine, pitting it against Atari 2600 Video Chess, and sure enough, the 2600 whipped its ass…

    15. 15.

      RSA

      Carlo,

      Nice post! A few thoughts about further reading:

      Are you familiar with Doug Lenat’s (early) work on automated mathematical theorem proving? His thoughtfulness about the success and failure of his approach is admirable, as represented in the article, “Why AM and EURISKO appear to work” [my emphasis].

      sciencedirect.com/science/article/abs/pii/000437028490016X

      Another bit of historical science you may find interesting is research on so-called insight problem solving, in psychology. Here’s an article that gives a reasonable overview, including mention of influential models of cognition (e.g. Newell and Simon), capable to some extent of what cognitive scientists view as reasoning.

      researchgate.net/profile/Michael-Oellinger/publication/238605884_Psychological_Research_on_Insight_P…

      I’ll also mention that Bayesian updating is intractable in general, but it’s kind of interesting, isn’t it, that it so often works on real-world problems? That may say something about our universe, though the philosophical discussions about that quickly take me out of my depth.

    16. 16.

      Eyeroller

      To perhaps explain cosmological redshifting a little differently, the expansion of the universe means that the length between any two points is increasing with time.  Wavelength is a length, so it too is “stretched” by the expansion of the universe.  A longer wavelength is a redder light.  The longer the light has been traveling, the greater the stretching.  The exact redshift-distance relationship depends on the cosmological model.
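
      In symbols, writing $a(t)$ for the cosmic scale factor (notation assumed here, not used elsewhere in the thread), the stretching of the wavelength between emission and observation reads:

```latex
1 + z \;=\; \frac{\lambda_{\mathrm{obs}}}{\lambda_{\mathrm{emit}}}
      \;=\; \frac{a(t_{\mathrm{obs}})}{a(t_{\mathrm{emit}})}
```

      The cosmological model enters through the form of $a(t)$, which is what sets the relationship between redshift and distance.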

      I wrote a comment at the very end of the last entry in this series, in which I suggested that reducing reasoning to language is a category error.  We know from nonhuman animals’ reasoning that it cannot be dependent on language. Their reasoning may be inferior to ours (or in some circumstances it may not be), but it cannot depend on a human-like language.

      It’s also unclear that it’s really discontinuous. All we can say is that our conscious perception of it is often discontinuous.

    17. 17.

      Carlo Graziani

      @Marc: I have colleagues who work with large codes, who have successfully incorporated AI tools into their methodology. The key is to work on little chunks of code at a time (AI can’t do design), have a test suite, and become really expert at crafting and tuning prompts. Developing their workflow took a lot of time and effort, but it pays dividends in the long run. They’ve become Centaurs, in the Doctorow sense (see Comment #11).

      The keys to getting good work from an AI assistant appear to me to be (a) be a domain expert, so you can validate results, and (b) have some ground truth, so that there are in fact correct and incorrect responses.

      For 99% of AI users today, one or both of these conditions does not apply, so they get chat noise.

    18. 18.

      Ramona

      Carlo, I need to say this back to see if I understand the model you propose here of a rudimentary Artificial Reasoner:
      An Artificial Reasoner is one which can
      1. monitor the Bayesian statistical learning process where evidence is used to update the parameters of various candidate models
      2. Judge when the learning process has stalled
      3. Discern whether the learning process has stalled because the evidence is not informative about the model or because the model is not explanatory of the evidence.
      4. Change the evidence source or the model without performing an exhaustive search, because it (the Artificial Reasoner) has some mechanism for deducing why the current evidence cannot inform the model and what would characterize useful evidence or, if the model is inadequate, for replacing the model in a way informed by the objective of the reasoning task.
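
      Steps 1 to 3 of this summary can be sketched on a toy problem. Everything below is an assumption made for illustration, not the model proposed in the post: the learner estimates one unknown mean with a Gaussian belief, “stalled” means the information gain per batch (a KL divergence) drops below an arbitrary floor, and the two failure modes are told apart by a simple residual check. Step 4, choosing a better evidence source or model, is the hard part and is deliberately left out.

```python
import math

def update(mean, sd, chunk, sigma):
    """Conjugate Bayesian update of a Gaussian belief about an unknown mean,
    given observations whose noise level is assumed to be sigma."""
    precision = 1.0 / sd**2 + len(chunk) / sigma**2
    new_mean = (mean / sd**2 + sum(chunk) / sigma**2) / precision
    return new_mean, math.sqrt(1.0 / precision)

def info_gain(new_mean, new_sd, old_mean, old_sd):
    """KL(new || old) in nats: how far one batch moved the belief."""
    return (math.log(old_sd / new_sd)
            + (new_sd**2 + (new_mean - old_mean)**2) / (2.0 * old_sd**2)
            - 0.5)

def diagnose(data, sigma, prior_mean=0.0, prior_sd=10.0, batch=10,
             gain_floor=0.01, target_sd=0.5, misfit_limit=2.0):
    """Monitor learning batch by batch and classify why it stalled.
    All thresholds are hypothetical tuning knobs."""
    mean, sd = prior_mean, prior_sd
    for start in range(0, len(data), batch):
        chunk = data[start:start + batch]
        new_mean, new_sd = update(mean, sd, chunk, sigma)
        gain = info_gain(new_mean, new_sd, mean, sd)
        mean, sd = new_mean, new_sd
        if gain < gain_floor:  # step 2: learning has stalled
            seen = data[:start + len(chunk)]
            # Step 3: mean squared residual in units of the assumed noise.
            # Values far above 1 mean the model does not explain the data.
            misfit = sum(((x - mean) / sigma) ** 2 for x in seen) / len(seen)
            if misfit > misfit_limit:
                return "model inadequate"
            if sd > target_sd:
                return "evidence uninformative"
            return "converged"
    return "still learning"

# Deterministic toy data: a fixed pattern of offsets around a true mean of 3.
PATTERN = [-1.0, 1.0, -0.5, 0.5, 0.0, -1.5, 1.5, 0.2, -0.2, 0.0]

def make_data(scatter, n_batches=8):
    return [3.0 + scatter * p for _ in range(n_batches) for p in PATTERN]

print(diagnose(make_data(1.0), sigma=1.0))      # scatter matches the model
print(diagnose(make_data(5.0), sigma=1.0))      # scatter 5x what the model claims
print(diagnose(make_data(100.0), sigma=100.0))  # instrument too noisy to help
```

      The point of the split is only that the two stalls look different from the outside: in one the belief is sharp but the data disagree with it, in the other the data agree but the belief never sharpens.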

    19. 19.

      Eyeroller

      @Matt McIrvin: I have a coworker who is an expert Go player. There was a big kerfuffle in the Go world a while ago when an AI beat whatever the equivalent of grandmaster was, using deep learning trained on a large corpus of high-level games.

      He recently told me about another program that is even better. This program simply played millions of games against itself and memorized the best paths for different board layouts. This new program was astonishing because it made moves surprising to human players, whereas the older program didn’t do that because, of course, it had been trained on what humans do.

    20. 20.

      Goku (aka Amerikan Baka)

      @Carlo Graziani:

      Thanks, that pretty much answers my question

      @Marc:

      Maybe we’ll even get properly staffed schools and day care ;)

      Maybe we’ll have Star Trek replicators too lol

      @TONYG:

      Or, maybe these assholes are like most other capitalists, and are not thinking beyond the next quarter

      A pretty safe bet

    21. 21.

      Urza

      I don’t know that your explanations are correct.  But it does remind me why some humans are better at reasoning than others.  Lots of people don’t know how to think outside the predefined box and are highly uncomfortable with the idea of stepping outside the lines.  That doesn’t help differentiate AI from human reasoning unfortunately, but I do think most humans given enough motivation can do it while the AI can’t, yet.
      Since last year I’ve been thinking the AI just shows how many humans regurgitate what they absorbed without actually thinking.

    22. 22.

      Carlo Graziani

      @Ramona: Yes, that’s a good summary of the thesis.

      The constraints on how the reasoning process may intervene are nontrivial, though. On the evidentiary reform side, how would the artificial reasoner propose a new evidence stream that not only satisfies the requirement of shining light on obscure aspects of the model, but also respects realistic feasibility constraints? In the case of GRB astronomers, for example, it should propose to “build instruments with better position resolution” instead of “build a spacecraft to go to GRB sources and find out what’s up”.

      In the case of Model Reform, the constraint is likely more subtle and challenging: propose a new model that satisfies some version of Occam’s Razor. So “How about we include the fixture mounting screw and the power cable in the model”, check. “How about we include the effect of solar storms, and saboteur gremlins, and the power company observing you and screwing around with you” not so much.

    23. 23.

      YY_Sima Qian

      @Xentik: I read an account on X of a heavy AI user trying out the new Gemini 3 Pro. He initially forgot to turn on the web search function, so when he asked the LLM for information related to 2025, the model insisted that it was still late 2024 (which was apparently the cutoff date of the model’s training data). No matter what the user fed into the chatbot (photos, news articles, etc.), it developed ever more elaborate theories on how all of the evidence was fake, expressed ever more eloquently. When the user turned on web search, the model updated and acknowledged that it was indeed 2025. Inevitably, there were comments that said the experience is eerily reminiscent of arguing against some of the actual people on social media, only w/ people one cannot easily change their priors by turning on “web search”.

      By all accounts Gemini 3 Pro is a powerful model that includes a lot of obscure knowledge in the training data (probably > 5T parameters). However, there was no improvement in hallucination rate, & the “reasoning” felt the same as Gemini 2.5 Pro. Thus, same model, w/ improvement coming from brute forcing more data, more parameters & more compute. & Google seems to be making more progress than OpenAI, Anthropic or most certainly Meta.

    25. 25.

      Ramona

      I’m philosophically materialist too and so I see our consciousness arising as a function of biological matter.

      I see us in our electronic age trying to replicate all manner of phenomena using electronics as not very different from our forebears in awe of the mechanical age making all manner of  clever clockwork animals.

      But I suspect that the jump in complexity between electronics and clockwork is minuscule compared to the complexity between electronics and the biological machinery of even a single cell.

      Consider that in developing into an embryo, identical stem cells differentiate into skin, nerve and so on cells based on where they lie in the geometry of the blastula and how far they are from the site of division.

      We barely understand nor can we replicate these processes and we aren’t even talking about intelligence yet.

      I might be very wrong but I don’t think we can replicate artificial intelligence until we develop a finer grained understanding  of how biochemical processes unfold.

    26. 26.

      Carlo Graziani

      @RSA: Those abstracts look very interesting. I’ll look them over during the break, they will likely broaden my horizons.

      I know what you mean by calling Bayesian updating “intractable”, but nowadays, formal intractability does not mean what it used to mean. There are quite a few fast, accurate approximations (such as MCMC) that have changed the statistical outlook on “tractability”.
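
      For readers who have not met MCMC, here is a minimal sketch of the idea on a deliberately tiny problem: a random-walk Metropolis sampler for the bias of a coin after 7 heads in 10 flips, with a uniform prior. The data, step size, and function names are all assumptions chosen for illustration; the problem is picked because the exact posterior, Beta(8, 4), is known and can be used as a check. A one-parameter toy says nothing about how well such samplers behave in high dimensions.

```python
import math
import random

def log_posterior(theta, heads, tails):
    """Unnormalized log posterior for a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, steps=20000, burn_in=2000, step_size=0.2, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept it with
    probability min(1, posterior ratio), otherwise stay put."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, heads, tails)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:  # discard early samples taken before settling
            samples.append(theta)
    return samples

samples = metropolis(heads=7, tails=3)
estimate = sum(samples) / len(samples)
exact = (7 + 1) / (7 + 3 + 2)  # mean of the Beta(8, 4) posterior
print(f"MCMC estimate {estimate:.3f} vs exact {exact:.3f}")
```

      The sampler never computes the normalizing constant of the posterior, which is the piece that makes exact Bayesian updating expensive in general.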

      In any event, here I am using Bayesian updating more as a guiding framework than as a computational recipe. LLMs and other DL methods are, in a sense, quite frequentist in their internal design. What I mean by that is that while they must necessarily model data distributions, they do so in a very implicit manner, burying the representation of the distribution in computational clutter such as the weights, biases, hyperparameters, and architectural choices. In this way, they resemble frequentist statistics (functions of the input data), except that unlike principled frequentist methods, there is no hope of computing a statistical distribution of the function output from first principles.

      That’s one reason why I like to draw a veil over the architectures and treat them as Bayesian black boxes. This outlook can help indicate interesting experiments to perform on said black boxes. And perhaps, it can indicate ways in which “reasoning” processes might supervise them, without dictating their internal operations.

    27. 27.

      YY_Sima Qian

      Somewhat off topic, another good read on the Sino-US “AI” competition (from a veteran of the tech. war from the Biden Administration):

      The Bitter Lessons
      Thoughts on US-China Competition

      DEAN W. BALL

      NOV 14, 2025

      The United States and China are often said to be in a “race” with one another with respect to artificial intelligence. In a sense this is true, but the metaphor manages to miss almost all that is interesting about US-China dynamics in emerging technology. Today I’d like to offer some brief thoughts about how I see this “race” and where it might be headed.

      All metaphors are lossy approximations of reality. But “race” is an especially inapt metaphor for this context. A race is a competition with clear boundaries and a clearly defined finish line. There are no such luxuries to be found here. Beyond the rhyme, “the Space Race” made intuitive sense because the objective was clear: landing humans on the Moon.

      Stating that there is an “AI race” underway invites the obvious follow-up question: the AI race to where? And no one—not you, not me, not OpenAI, not the U.S. government, and not the Chinese government—knows where we are headed.

      The U.S. and China are more like ships on the open seas, voyaging toward some unknown, only dimly imagined destination. Perhaps we think it is India we will find, though more likely it is a new continent altogether. We do not know that we are headed in the right direction, though neither are we stabbing entirely in the dark. And we both have the intuition that it is probably beneficial to “arrive” (my metaphor is breaking down) before the other. That intuition is likely correct. It would be more accurate to describe this state of affairs as an “unbounded, multi-dimensional, technological, scientific, and economic competition.”

      The U.S. economy is increasingly a highly leveraged bet on deep learning. This has been true for a couple years now, though it is more explicit and extreme today than it was two years ago. Most of this is because of decisions made by private actors (AI companies, hyperscalers, banks and other large sources of capital, etc.), but on the margin the policy and posture of the Trump Administration has heightened this dynamic as well.

      Another way of putting this is that America is “bitter-lesson pilled.” Our strategy rests on the presumption that advanced AI is possible in the near-term and hugely consequential and that compute is the high-order bit to advancing AI (as opposed to data, scaffolding, clever architectures, and the like). This is not so much the government’s strategy (though at least in the Biden Administration it is true that the senior AI policy planners mostly believed this) as it is the strategy of the leading AI companies and hyperscalers. As such we have pivoted with an alacrity that has been lacking recently in the West.

      China, on the other hand, does not strike me as especially “AGI-pilled,” and certainly not “bitter-lesson-pilled”—at least not yet. There are undoubtedly some elements of their government and AI firms that prefer the strategy I’ve laid out above, but their thinking has not won the day. Instead China’s AI strategy is based, it seems to me, on a few pillars:

      Embodied AI—robotics, advanced sensors, drones, self-driving cars, and a Cambrian explosion of other AI-enabled hardware;
      Fast-following in AI, especially with open-source models that blunt the impact of U.S. export controls (because inference can be done by anyone in the world if the models are desirable) while eroding the profit margins of U.S. AI firms;
      Adoption of AI in the here and now—building scaffolding, data pipelines, and other tweaks to make models work in businesses, and especially factories.

      I find it intriguing that both countries seem to have converged on the strategies that best suit their respective strengths. Advanced AI is, at its core, software-as-a-service delivered through high-end semiconductors, cloud computing platforms, charismatic user interfaces, and enabled by clever financial and legal engineering. Every one of those things is America’s civilizational bread and butter. Embodied AI is, at its core, enabled by mass manufacturing excellence, thick trade networks, and other characteristics that fundamentally tilt in China’s advantage.

      The U.S. and China may well end up racing toward the same thing—“AGI,” “advanced AI,” whatever you prefer to call it. That would require China to become “AGI-pilled,” or at least sufficiently threatened by frontier AI that they realize its strategic significance in a way that they currently do not appear to. If that happens, the world will be a much more dangerous place than it is today. It is therefore probably unhelpful for prominent Americans to say things like “our plan is to build AGI to gain a decisive military and economic advantage over the rest of the world and use that advantage to create a new world order permanently led by the U.S.” Understandably, this tends to scare people, and it is also, by the way, a plan riddled with contestable presumptions (all due respect to Dario and Leopold).

      The sad reality is that the current strategies of China and the U.S. are complementary. There was a time when it was possible to believe we could each pursue our strengths, enrich our respective economies, and grow together. Alas, such harmony now appears impossible. We are locked into a structural conflict, and tempting as it may be to look away, we must accept this bitter lesson, too.

      I find it frustrating that these former Biden Administration officials speak [more] candidly now, but advocated for the same policies, w/ the same underlying assumptions, as the Trump 47 Administration, albeit slightly less extreme & w/ more self-perceived guardrails (that have proved brittle). They were just as “AGI-pilled”, just as “AI race-pilled”, just as “tech. war-pilled”, & few of them own up to their choices & policy advocacy. This is not just related to “AI” & tech. war, BTW, but other policy areas such as the economy, Ukraine & Gaza.

      I also had to chuckle at the last paragraph, since Ball was among the architects of the dramatic escalation under Biden of the tech. war started by Trump 45, which significantly contributed to making Sino-US collaboration on “AI”, technology & increasingly science, more & more difficult.

    28. 28.

      Geminid

      @Matt McIrvin: I have a chess playing friend who follows this stuff, and I asked him about chess engines so I could fake a knowledgeable comment for this thread.

      My friend said IBM’s Deep Blue lost a six-game match to Garry Kasparov in 1996, and then won a six-game rematch the following year.

      Chess engines have far outstripped human players since then. The current leader is “Stockfish” with a rating of 3600. Magnus Carlsen, the highest rated human player, topped out at 2882 and the current human leader’s rating is 2834.
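
      To put that rating gap in perspective, the standard Elo formula turns a rating difference into an expected score per game (1 for a win, 0.5 for a draw). The sketch below just plugs in the numbers quoted above. One loud caveat: engine ratings and human ratings come from separate rating pools, so treating them as a single scale is an assumption made purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A against player B
    (wins count 1, draws 0.5, losses 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings as quoted in the comment; comparing them directly is an assumption.
ENGINE = 3600
PEAK_HUMAN = 2882
CLUB_PLAYER = 1550

human_vs_engine = expected_score(PEAK_HUMAN, ENGINE)
club_vs_peak_human = expected_score(CLUB_PLAYER, PEAK_HUMAN)

print(f"peak human vs engine: {human_vs_engine:.4f} points per game")
print(f"club player vs peak human: {club_vs_peak_human:.6f} points per game")
```

      On that reading, a 718-point gap leaves the stronger side expected to take more than 98% of the available points.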

      My friend has a rating of around 1550. He’ll compete in an Atlanta tournament this weekend, in the C class I think. He had high hopes of winning some prize money in his last tournament. But his first game, some 11 year-old kid in a dinosaur sweatshirt beat him! My friend ended up 2 wins, 3 losses.

    29. 29.

      Urza

      I see I was quoted from another thread, appreciate that.  To be clear, I can’t tell where the modern chaos will lead, but it is pretty easy to figure out what the people at the top seem to be thinking by their public thoughts and deeds plus the leaked bits.  There’s absolutely no thoughts being given on what to do with all the workers or what the new jobs will be.  Which means they just want to take everything they can get away with, have their robot army for defense, and surveillance of EVERYTHING which will not include them.  Wealth will become the only way to privacy if we keep on this path, and wealth will only happen if you agree with the power structure.

      Personally I’m thinking we should ditch most public surveillance but have video and audio on the masters of the universe 24/7/365, because who can harm the most people: the average person on the street or the trillionaire?

    30. 30.

      altofront

      Thank you so much for writing this series.  I feel like I’m finally understanding something about how LLMs actually work (and I’m gratified that it matches my deep skepticism about them).

      Your description of the “aha!” moment reminded me strongly of Aristotle’s description of how our minds intuit universals from sequences of particulars (from the Posterior Analytics): “So out of sense-perception comes to be what we call memory, and out of frequently repeated memories of the same thing develops experience; for a number of memories constitute a single experience. From experience again–i.e. from the universal now stabilized in its entirety within the soul, the one beside the many which is a single identity within them all–originate the skill of the craftsman and the knowledge of the man of science, skill in the sphere of coming to be and science in the sphere of being. We conclude that these states of knowledge are neither innate in a determinate form, nor developed from other higher states of knowledge, but from sense-perception. It is like a rout in battle stopped by first one man making a stand and then another, until the original formation has been restored. The soul is so constituted as to be capable of this process.”

    31. 31.

      YY_Sima Qian

      @Urza: I think a lot of human labor can be replaced by “AI” & robotics, just as automation has been replacing human labor for centuries. That is simply evidence of how mind numbingly repetitive a lot of jobs are, & not evidence of “AI” & robots becoming intelligent.

    34. 34.

      RSA

      @Ramona:  I might be very wrong but I don’t think we can replicate artificial intelligence until we develop a finer grained understanding  of how biochemical processes unfold.

      This is a respectable position–some even hold that we can’t replicate intelligence without a better understanding of quantum physics (e.g. Roger Penrose).

      For contrast, there’s Allen Newell, who argued for what he called the knowledge level: some kinds of intelligence (far from all, it turns out) plausibly ground out in manipulation of symbolic structures. A lot of reasoning in logic, mathematics, law, and other specialized domains seems to work out this way.

    36. 36.

      RSA

      @Carlo Graziani: Thanks for your explanation; that makes sense.

      I’ve always liked insight reasoning problems. You can’t ask them of LLMs, because of data leakage–the models have been trained on them. But here are a couple of examples used in psychology experiments, from the first half of the 1900s:

      You are in a room with a table pushed up against the wall. On the table there is a candle, a box of thumbtacks, and a book of matches. The challenge is to affix the lit candle to the wall so that it will not drip wax onto the table below.

      Two cords are hung from the ceiling, of such length that they reach the floor. One hangs near a wall, the other from the center of the room. Your problem is to tie the ends of these two cords together. If you hold either cord in your hand, you can not reach the other. You can use or do anything you wish. A number of objects which might help in the solution of the problem are present in the room: furniture, a pole, other cords, small objects. Devise as many solutions as you can.

    37. 37.

      Mr. Bemused Senior

      @Ramona: I’m philosophically materialist too and so I see our consciousness arising as a function of biological matter.

      I agree. We humans are composed of the same atoms as everything else, so in principle a sentient electronic being is possible.

      @Ramona: But I suspect that the jump in complexity between electronics and clockwork is minuscule compared to the complexity between electronics and the biological machinery of even a single cell.

      Thank you for making this point. My background is software. I happen to have been exposed to bioscience in some detail, having had a gig with a gene sequencing machine manufacturer.

      It’s true we understand some details about neurons. There are electrical signals as well as chemical neurotransmitters. But how neurons actually function and how information is stored is far from clear. For one thing, each individual cell is alive and interacts with its environment in ways that probably affect learning. Also, at the molecular level, things are incredibly complex and our understanding of the processes involved barely scratches the surface.

      At the macro level, the human brain is divided into two hemispheres and they operate in totally different domains. The left hemisphere is a sequential processor. It contemplates the past and the future. Language is centered there. The right hemisphere is a parallel processor and is concerned with now. The right hemisphere operates without language as we understand it. A very good introduction to this distinction is Jill Bolte Taylor’s My Stroke of Insight. Her TED talk is great.

    38. 38.

      glc

      Just a general remark …

      LLMs are supposed to be a shortcut – throw away the world and keep the language.  This is Shannon, 1948, but it takes a lot of power and compression to implement. One can do quite a lot that way but there are obvious drawbacks and limitations. Having no notion of truth is ultimately a problem, and a very visible one. On the other hand, language is entirely sufficient for verifying formal reasoning and that angle is well established now, and useful.

      Another approach keeps the world. This idea is older than computers – notably associated with Tarski a century ago. The formal systems become useful when they are interpreted. And one doesn’t need that much of a world to give this approach some heft.   Knowing something about what the words mean is helpful.

    39. 39.

      Carlo Graziani

      @Urza: As I wrote in Comment #11, while I agree with you that corporate management would love to replace labor with AI, I do not believe that they are going to get their wish.

      I think that  @Matt McIrvin made a very perceptive comment. When new technologies augment human productivity, then, historically, lost labor opportunities are replaced by other opportunities that compensate the new laborers with some part of the wealth newly-created by increased productivity. The process is disruptive, especially if not cushioned by government-backed social welfare, but in the end, the following generation lives with a share of the new wealth.

      I am not sure that this story applies as well to AI as it did to, say, the internal combustion engine, because I am not clear about what the lasting productivity benefits of AI will be. At the moment, essentially all AI use cases in the wild are subsidized by venture capital to a ridiculous extent, and it is not at all clear whether there exists any mass market for AI if those subsidies are withdrawn by disappointed investors. This moment may wind up looking in retrospect like another tulip-mania, instead of the technological revolution many see in progress.

    40. 40.

      Marc

      @glc: Another approach keeps the world. This idea is older than computers – notably associated with Tarski a century ago. The formal systems become useful when they are interpreted. And one doesn’t need that much of a world to give this approach some heft. Knowing something about what the words mean is helpful.

      I’ve worked on and off with people working on strongly-typed languages (like Ada) and formal system verification. My one conclusion has been that mere humans have a tough time working with formal languages. LLMs will eventually come into play as a means of transforming human-friendly language into verifiable specifications.

    41. 41.

      Urza

      @YY_Sima Qian: I don’t disagree with you, but wouldn’t repetitive labor also damage the models if they started learning from it? Also, most humans are just not capable of doing more. And working in tech, I’m not speaking of minimum-wage labor. Lots of people where I am refuse to go outside the lines; it’s literally how I carved out a position: finding what’s safe to do by ignoring the rules set in 2010 that no longer really apply the same way. I have a lot of days where I say “do this” and they say “no, it’s not safe,” even though I do it practically daily, and even with increased scrutiny on manual actions no one has cared, because it’s needed to get the job done.

    42. 42.

      Matt McIrvin

      @Carlo Graziani: I’ll credit Carlos Yu (a brilliant guy generally, an autodidact in many fields who somehow ended up not being a total crank) with giving me that insight to the extent that I had it.

    43. 43.

      Marc

      @Carlo Graziani:  At the moment, essentially all AI use cases in the wild are subsidized by venture capital to a ridiculous extent, and it is not at all clear whether there exists any mass market for AI if those subsidies are withdrawn by disappointed investors.

      I ignore the big company products except for open-source variations. I assume, like many times in the past, the real action is happening at places like HuggingFace and Civitai (not necessarily safe for work). These companies are small, barely if at all profitable, but they host a lot of companies/groups getting close to viable use cases (like porn, strangely enough).

    45. 45.

      Carlo Graziani

      @Marc: The trouble is that model code is cheap. Data is (slightly less) cheap. Trained models are unbelievably expensive, because the training requires whole-of-datacenter efforts to digest and re-digest vast data corpuses hundreds of thousands of times, to steer billions of LLM model parameters to reasonable values.

      Huggingface is a great resource. But without the tech industry driving pre-training, it would wither. And I have a feeling that this is where the world is going.

    46. 46.

      Marc

      @Carlo Graziani: I think the big tech industry is working very hard to create the impression that they are irreplaceable.  I don’t think they are. People are training their own models using mixtures of public domain and synthetic data.  There are a lot of niche applications that will not otherwise be addressed.  In my opinion, given that the hardware needed to pre-train (not just fine-tune) a 30B+ parameter model in a matter of weeks is down to the $2000 to $3000 range, there will be a lot of activity in this area.

    47. 47.

      Marc

      I look at the AI industry this way. Almost everyone has heard of IBM, but only because they got lucky and pivoted at the right moment; now they are a mere shadow of their former importance. Can anyone under 60 name any of the other large (10K+ employee) computer companies that existed when Bill Gates and the two Steves shipped their first products? Nvidia, Meta, OpenAI, and Google will be gone some day, maybe sooner than expected.

    48. 48.

      Carlo Graziani

      @Marc: That is the price for a single A100 node. The SOTA models that underpin paying subscriptions to AI services are trained in datacenters housing thousands of such nodes, in jobs running for weeks or months.

      The kind of transformer model that you can pre-train on a small cluster of such nodes is not scientifically uninteresting, but hardly of any commercial value.

      I have some private hopes for alternatives to transformers that don’t have such stupid scaling laws. But so long as we are stuck with such scaling, we are at the mercy of hyperscalers. There is no real opportunity for an AI cottage industry on our current technological path, in my opinion.

    49. 49.

      Marc

      @Carlo Graziani: I’m talking about 128 GB AMD Ryzen AI Max+ boxes like this one, or pay $1000 more and get the Nvidia DGX Spark; both are about the size of your hand. AMD has also dropped 32 GB VRAM dual GPU cards for $1300. Not quite A100s, but that doesn’t matter much. I also have a ridiculous 2″x3″ SoC with 32 GB memory and an NPU that is essentially covered by a heat sink and fan and can run PyTorch-based inferencing at ~5 TOPS. I’m betting on the big AI companies going down with their data centers.

    50. 50.

      Marc

      @Carlo Graziani: The kind of transformer model that you can pre-train on a small cluster of such nodes is not scientifically uninteresting, but hardly of any commercial value.

      Commercial value is a funny thing.  You can try for a few potential trillion dollar scores, or make a lot of smaller bets on interesting solutions for day to day problems.  I think the latter are almost always more future proof. I’ve always worked on those smaller problems.

    51. 51.

      YY_Sima Qian

      The US & the PRC are still joined at the hip in this “AI Race”, unwelcome by both governments (gift link to NYT article below):

      In the A.I. Race, Chinese Talent Still Drives American Research
      Although some Silicon Valley executives paint China as the enemy, Chinese brains continue to play a major role in U.S. research.

      By Cade Metz and Eli Tan

      Reporting from San Francisco

      Nov. 19, 2025

      Surprised that the NYT did not reference Marco Polo’s Global AI Talent Tracker.

    54. 54.

      Ramona

      @Carlo Graziani: Thanks! I see your point about how the constraints on the method of intervention determine whether our Artificial Reasoner is indeed a reasoner and not a “throw shit against the wall and let’s see what sticks” analog to the hallucinating LLMs much celebrated by the tech titans.

    55. 55.

      WTFGhost

      Let me try to add a bit of information folks might (or might not) find interesting. Bayes informs a lot of probability and statistics, and the simplest example is the one folks learn early on, “Bayes’ Rule.”

      It’s pretty simple: it says, basically, the probability of X, given factor Y, is the probability of X intersected with Y (so, both happened) divided by the probability of Y.

      Take two fair coin flips. The probability that both coins are heads (that’s X), given that at least one coin is heads (that’s Y), is one in three.

      The probability of Y is 3/4ths, right? There’s HH, HT, TH, TT; 3 of the 4 have one or more Hs.

      The probability of both-heads, and, at least one is heads, is 1/4th. The only pair that matches is HH, and that only occurs one time in four.

      So (1/4)/(3/4) = 1/3. Of course, we could solve this simple one by inspection – there are three possibilities with at least one head, and only one of them is both heads. Bayes’ glory is that he proved that intuitive idea works, even when things are a lot more complicated than just two possibilities. The idea is, Bayes investigates what you know about one piece of data, given another piece of data. “What are the odds you’ll get Covid-19, assuming (“given that”) you’ve been vaccinated in the past 6 months?” is an obvious example of Bayes’ Rule.
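      For anyone who wants to check the two-coin arithmetic by brute force, here is a minimal Python sketch. It assumes fair coins, so all four outcomes are equally likely; the function and variable names are made up for illustration and do not come from any library.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(outcomes, event_x, event_y):
    """Return P(X | Y) = P(X and Y) / P(Y), assuming equally likely outcomes."""
    total = len(outcomes)
    p_y = Fraction(sum(1 for o in outcomes if event_y(o)), total)
    p_x_and_y = Fraction(
        sum(1 for o in outcomes if event_x(o) and event_y(o)), total
    )
    if p_y == 0:
        raise ValueError("cannot condition on an event with probability zero")
    return p_x_and_y / p_y


def both_heads(outcome):
    # X: both coins came up heads
    return outcome == ("H", "H")


def at_least_one_head(outcome):
    # Y: one or more of the coins came up heads
    return "H" in outcome


# The four outcomes of two flips: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

result = conditional_probability(outcomes, both_heads, at_least_one_head)
print(result)  # prints 1/3
```

      Using exact fractions rather than floats keeps the answer readable as 1/3 instead of 0.333..., which matches the by-inspection count.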

      But part of Bayes’ whole philosophy was *never* assume you know the odds you’ll get Covid-19, given vaccination within the past 6 months. You can make inferences from a dataset, i.e., one moment in time. But there’s nothing that says new data will follow the same pattern, so as new people get sick, you’re learning more about what the probability actually is of getting sick within 6 months of vaccination.

      More simply, Bayes’ philosophy is, you’re constantly checking your model. You don’t flip a coin a thousand times to see if it’s a “fair” coin, 50/50. You flip a coin a thousand times to find a good guess of the ratio of heads to tails, and the more flips you record, the better a sense you have of that coin’s probability.

      So, Bayes says more data = better model = closer to “reality,” which might be a moving target (like Covid-19 infection rates).
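      The “keep checking your model as flips come in” idea can be sketched in a few lines of Python. This is only one standard way to do it: it assumes a Beta prior on the coin’s heads probability (starting from a uniform Beta(1, 1), which is my assumption, not something from the comment), and the batches of flips are invented purely for illustration.

```python
import math
from fractions import Fraction


def update_beta(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior plus observed flips
    gives a Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + tails


def posterior_mean(alpha, beta):
    """Current best guess for the probability of heads."""
    return Fraction(alpha, alpha + beta)


def posterior_std(alpha, beta):
    """Spread of the Beta posterior; it shrinks as flips accumulate."""
    n = alpha + beta
    return math.sqrt(alpha * beta / (n * n * (n + 1)))


# Uniform prior (an assumption): every heads probability equally plausible.
alpha, beta = 1, 1

# Made-up batches of (heads, tails), for illustration only.
batches = [(7, 3), (60, 40), (550, 450)]

history = []
for heads, tails in batches:
    alpha, beta = update_beta(alpha, beta, heads, tails)
    history.append((alpha, beta, posterior_std(alpha, beta)))
    print(
        f"flips so far: {alpha + beta - 2:5d}  "
        f"estimate: {float(posterior_mean(alpha, beta)):.3f}  "
        f"uncertainty: {posterior_std(alpha, beta):.3f}"
    )
```

      Each batch nudges the estimate and narrows the uncertainty, which is the “more data = better model” point in miniature. If the underlying rate is a moving target, a sketch like this would need some way to discount old data, which is not shown here.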

    56. 56.

      Sister Machine Gun of Quiet Harmony

      @Carlo Graziani: I’m not sure those AI assistants will vanish. Once a model is trained, it’s not that expensive to run. It’s the model training that eats so many resources, as does all that data storage. You can run an open-source LLM entirely on your laptop and get some decent results for certain tasks.

    60. 60.

      the_mjl

      @Marc: This is a crucial point: A human brain runs on 12-25 Watts (the ‘dim light bulb’ jokes write themselves), and weighs less than 4 pounds. And (IMO) “language” is one of the least interesting things about it. We already have the counterexample we need.

      One big (cultural?) difference between the last AI go-round and this one: Last time, it tried to be a hardware revolution. We actually wanted to BUILD NEURONS on silicon, because of the obvious architectural advantages (even if we didn’t understand them). Size. Weight. Power. Stuff you can actually field.

      This time around has been CS dominated, so the general attitude has often been “let’s throw thousands of GPUs at the problem of SIMULATING neuron clusters!”.

      I mean, sure… throw thousands of GPUs and billions of Watts at it, you’re bound to get SOME interesting results. But it won’t scale, and the thermo-economics of it can never work. But GPUs are what they have. So that’s what they use.

      But, as I’ve said before, “how many GPUs can you fit inside a dragonfly’s head?”

      BTW I do expect the attitude to turn back to the hardware side (in fact I see signs of it now– people are citing the papers from the 1990s again, at least according to ResearchGate etc). I also expect that to happen in… China.

      Because that’s where the cites are coming from. ¯\_(ツ)_/¯

