Balloon Juice

Come for the politics, stay for the snark.



Respite: At Play in the Fields of the LLMs

by Tom Levenson | March 1, 2026, 6:17 pm | 109 Comments

This post is in: Open Threads, Respite, Technology, General Stupidity


ETA: This was written for Balloon Juice but is crossposted at Inverse Square.

So…

Just this week I learned that through my home institution I have access to a suite of LLMs, including all the usual suspects (Anthropic’s Claude, OpenAI’s ChatGPT, Meta’s Llama, and Google’s Gemini). MIT’s come up with a very nice interface to interact with all this artificial talent, and I’ve spent a couple of hours this weekend taking some of them out for a spin. (I’m avoiding OpenAI’s offerings, both as a trivial protest and because I don’t trust anything about that company.)

So far I’ve enjoyed Claude the most (the Haiku 4.5 model, if you’re wondering). Gemini is interesting, if a bit finicky. But what made me howl was the encounter I just had with Llama, Mr. Zuckerberg’s contribution to the genre. Context: over the holidays I had a conversation with a senior person at another Magnificent 7 tech firm who’d just watched a colleague leave for Meta. He was utterly dismissive of the company and relieved to be rid of anyone dumb enough (in his view) to end his career in top-tier tech by grabbing Facebook bucks. Spoiler: after what you’ll read below, I can see where my friend was coming from.

So here’s the setup. I’ve been asking the various models what I hope are zero-consequence questions, queries in which no one could possibly get hurt if the LLMs wing their way to utter bollocks. A typical ask: map out the logical structure Einstein used in his 1905 light quantum paper. That one tended to get an initially poor answer built on the idea that the paper centers on the then-pressing mystery of the photoelectric effect–a common mistake for people as well as machines. Pressing the models led Claude in particular to a much more sophisticated account of the paper, drawing attention to the way Einstein used arguments from thermodynamics to propose the necessity of understanding light as discrete packets of energy.


The paper was about this new “heuristic” [Einstein’s term] view of light, and the photoelectric effect appears only as one experimental support for that view. (Much more here.)

After three or four trips through the query engineering needed to get a useful result, I’d found my way down the list of models at my disposal to Meta’s Llama. I was bored with light quanta and so asked a different, much simpler question:

“When a stock is overvalued as defined by a price to earnings ratio a standard deviation or more higher than historical market norms for that company’s sector, what are the typical events or analyses that drive that share price back down to the norm?”
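(An aside: the “one standard deviation above historical norms” screen in that question is simple enough to make concrete. A minimal Python sketch, with invented P/E figures rather than real market data:)

```python
# Illustrative sketch only: the P/E numbers below are invented, not market data.
from statistics import mean, stdev

def overvalued(pe, sector_pes):
    """True if `pe` sits one standard deviation or more above the sector norm."""
    mu = mean(sector_pes)
    sigma = stdev(sector_pes)  # sample standard deviation
    return pe >= mu + sigma

# Hypothetical sector P/Es clustered near 10, plus one extreme outlier:
sector = [8.0, 9.5, 10.0, 11.0, 12.5]
print(overvalued(60.0, sector))   # True: far above mean + 1 sigma
print(overvalued(10.5, sector))   # False: inside the normal band
```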

Llama gave me a very stock answer that would have been at home in a high school economics class, so I decided to become a little more pointed in my pursuit:

Given that analysis, how would you explain Tesla’s long run of share prices between one and two orders of magnitude over that of other and much larger and more profitable car companies, like Toyota?

What followed was…first unimpressive and then hilarious. See for yourselves below the jump:

Tesla’s Valuation Puzzle

Tesla’s share price has been a subject of fascination and debate, with its valuation often being significantly higher than that of other automakers, including Toyota. To understand this phenomenon, let’s examine the factors that have contributed to Tesla’s premium valuation.

Growth and Disruption

  1. Pioneering electric vehicles (EVs): Tesla was one of the first companies to successfully mass-produce EVs, creating a new market and disrupting the traditional automotive industry.
  2. Innovative technology: Tesla has been at the forefront of EV technology, with advancements in battery management, autonomous driving, and software updates.
  3. Brand loyalty and cult following: Tesla has built a strong brand identity, with a loyal customer base and a reputation for innovation and sustainability.

Investor Sentiment and Expectations

  1. Growth expectations: Investors have high expectations for Tesla’s future growth, driven by its expanding product lineup, increasing production capacity, and potential for further market disruption.
  2. Long-term vision: Tesla’s mission to accelerate the world’s transition to sustainable energy has resonated with investors, who are willing to look beyond short-term profitability.
  3. Charismatic leadership: Elon Musk’s leadership and vision have been instrumental in shaping Tesla’s brand and driving investor enthusiasm.

Valuation Metrics and Comparisons

  1. Price-to-earnings (P/E) ratio: Tesla’s P/E ratio has often been significantly higher than that of traditional automakers, reflecting its growth prospects and investor expectations.
  2. Price-to-sales (P/S) ratio: Tesla’s P/S ratio has also been elevated, indicating that investors are willing to pay a premium for its revenue growth.
  3. Comparison to other growth stocks: Tesla’s valuation has been compared to that of other growth stocks, such as Amazon or Shopify, which have also experienced significant premium valuations.

Challenges and Risks

  1. Competition from established automakers: Traditional automakers, including Toyota, have been investing heavily in EV technology, potentially eroding Tesla’s competitive advantage.
  2. Production and delivery challenges: Tesla has faced production and delivery issues in the past, which can impact its ability to meet growth expectations.
  3. Regulatory risks: Changes in government policies or regulations can affect Tesla’s business, particularly if they impact the demand for EVs or the company’s access to incentives.

Why Tesla’s Valuation Has Remained High

Despite the challenges and risks, Tesla’s valuation has remained high due to a combination of factors:

  1. Continued growth and innovation: Tesla has continued to innovate and expand its product lineup, maintaining its position as a leader in the EV market.
  2. Investor enthusiasm and momentum: The company’s strong brand and charismatic leadership have maintained investor enthusiasm, driving the stock price higher.
  3. Speculative buying and short squeezes: Tesla’s stock has been subject to speculative buying and short squeezes, which can contribute to its high valuation.

Comparison to Toyota

Toyota, on the other hand, is a more established automaker with a larger market capitalization and a more diversified product lineup. While Toyota has been investing in EV technology, its valuation has been more in line with traditional automakers. The key differences between Tesla and Toyota are:

  1. Growth prospects: Tesla is expected to continue growing at a faster rate than Toyota, driven by its focus on EVs and innovative technology.
  2. Brand identity and disruption: Tesla has created a new market and disrupted the traditional automotive industry, whereas Toyota is a more established player with a different business model.

In conclusion, Tesla’s high valuation relative to Toyota and other traditional automakers can be attributed to a combination of factors, including its growth prospects, innovative technology, brand loyalty, and investor enthusiasm. While there are challenges and risks associated with Tesla’s business, its continued innovation and growth have maintained its premium valuation.

I saw a few problems with that conclusion, so I asked again:

I’d like you to follow up on that analysis. You say Tesla has an expanding product lineup; it has just discontinued its two largest models (Model S and Model X). The cybertruck has failed to become a mass market vehicle. And global sales are declining while Tesla has been overtaken by BYD as the world’s largest EV maker. Its innovation process has slowed, and it is arguably behind Waymo and others in autonomous driving. Its leader is a controversial figure, whose presence energizes one community but since 2024 at least has alienated another, arguably of those more likely to buy an EV. Are you sure that these are the factors that explain Tesla’s share price? Or to put it another way, would you invest Meta’s money in Tesla at the current P/E ratio?

What followed is, it seems to me, a perfect anecdata demonstration of why we might not want to surrender all of our critical faculties to a set of silicon overlords. Map what Llama says onto what I asked:

Revisiting Tesla’s Valuation

Upon further analysis, it appears that the factors I previously mentioned may not be as relevant as they once were. Tesla’s product lineup has indeed been contracting, with the discontinuation of its two largest models, the Model S and Model X. The Cybertruck, while innovative, has not achieved mass market success. Additionally, Tesla’s global sales have been declining, and the company has been overtaken by BYD as the world’s largest EV maker.

Challenges and Concerns

  1. Declining sales and market share: Tesla’s sales have been declining, and the company has lost its position as the world’s largest EV maker.
  2. Innovation slowdown: Tesla’s innovation process has slowed, particularly in the area of autonomous driving, where it is being surpassed by competitors like Waymo.
  3. Controversy surrounding Elon Musk: Elon Musk’s leadership and behavior have become increasingly polarizing, potentially alienating a significant portion of potential customers.

Reevaluating Tesla’s Valuation

In light of these challenges and concerns, it’s reasonable to question whether Tesla’s current valuation is justified. The company’s P/E ratio is still relatively high, indicating that investors are willing to pay a premium for its shares.

Investment Decision

If I were to advise Meta on investing in Tesla at the current P/E ratio, I would likely recommend caution. While Tesla is still a pioneering company in the EV space, its current challenges and concerns suggest that its growth prospects may be more limited than previously thought.

Alternative Investment Opportunities

There may be other investment opportunities in the EV or technology spaces that offer more attractive growth prospects and valuations. For example, companies like BYD, NIO, or Li Auto may offer more compelling investment opportunities due to their growth potential and competitive positioning.

Conclusion

In conclusion, while Tesla is still a significant player in the EV market, its current challenges and concerns suggest that its valuation may be due for a correction. As an investor, it’s essential to carefully evaluate the company’s prospects and consider alternative investment opportunities before making a decision. At the current P/E ratio, I would likely advise against investing in Tesla, and instead, explore other opportunities that offer more attractive growth prospects and valuations.

That’s a 180-degree swing from the prior answer, generated by nothing more than calling bullshit on the model’s first conclusion.

What does it all mean?

Hell if I know. I’m as far from an expert on all of this stuff as it’s possible to be. I’m just a guy with access to a very funny set of hammers out here looking for nails.

One thought, though: there’s a lot of evidence out there, some of which I’ve encountered directly from colleagues and friends, that various AI approaches have real and large impacts on various problems. Folks doing drug discovery or protein analysis or a lot of coding tasks and…well, the list goes on. And in just my brief playtime with Claude et al., I got some distinct value from the LLMs’ ability to look at specific texts and break them down. So anyone saying AI is a fraud top to bottom is, I think, clearly wrong.

But those who think that a general machine intelligence is right around the corner…well, the Llama case above seems to me a pretty clear example of how a statistical inference machine operating on a bucket full of internet slop may not be all that, not yet and quite possibly not for an arbitrarily long time to come.

Image: J. M. W. Turner, Keelmen Heaving in Coals by Moonlight, 1835



109 Comments

    1.

      Tom Levenson

      March 1, 2026 at 6:19 pm

      No idea why those two lines in the middle of the long quote blocks became tiny. First attempts at fixing it failed. Any thoughts?

    2.

      Baud

      March 1, 2026 at 6:23 pm

      I don’t have that much experience with AI, but my impression is asking general public LLMs for analysis is really just going to get you a probability based response. AI’s value, if any, is in handling discrete tasks that can be programmed with care.

    3.

      Baud

      March 1, 2026 at 6:24 pm

      @Tom Levenson:

      Ask AI.

    4.

      Tom Levenson

      March 1, 2026 at 6:27 pm

      @Baud: I just did. I smell lawyering in the background:

      TL:
      How much should I trust your judgment?

      Llama:
      Evaluating My Judgment
      As a conversational AI, I’m designed to provide helpful and informative responses. However, my judgment is limited by my training data, algorithms, and the context of our conversation.
      Factors to Consider

      1. Lack of personal experience: I don’t have personal experiences or emotions, which can limit my ability to fully understand the nuances of a situation.
      2. Training data limitations: My training data is sourced from a vast corpus of text, but it’s not exhaustive, and my understanding is limited to the information available to me.
      3. Contextual understanding: My responses are based on the context of our conversation, which may not always be comprehensive or up-to-date.
      4. Bias and objectivity: While I strive to provide neutral and objective responses, my training data may contain biases, and my responses may reflect these biases.

      Using My Judgment Wisely
      To make the most of my judgment, consider the following:

      1. Verify information: Cross-check the information I provide with other credible sources to ensure accuracy.
      2. Consider multiple perspectives: Don’t rely solely on my judgment; consider multiple viewpoints and evaluate the information critically.
      3. Understand the context: Be aware of the context in which I’m providing my judgment, and consider any limitations or biases that may be relevant.

      Investment Decisions
      When it comes to investment decisions, it’s essential to be particularly cautious. Investment decisions involve significant financial risks, and it’s crucial to consult with financial experts and conduct thorough research before making any decisions.
      In the case of our previous discussion about Tesla’s valuation, I would recommend that you consult with a financial advisor or conduct your own research before making any investment decisions. While my analysis may have provided some insights, it’s essential to consider multiple perspectives and evaluate the information critically.

    5.

      Ohio Mom

      March 1, 2026 at 6:30 pm

      I was very impressed when AI produced a pretty accurate After Visit Summary at one of Ohio Son’s doctor’s visits last December. There was one small mistake about a medication, but when humans do it, there’s always at least one mistake too.

      The best part was that this new set-up allowed the doctor to look at us instead of her laptop screen.

      In general though, I remain skeptical of AI and this post didn’t surprise me.

    6.

      Rusty

      March 1, 2026 at 6:31 pm

      A friend who is very active in quantum computing and has a good understanding of AI made some comments I found helpful. We are in the find-out phase of AI. If there are 100 uses, a majority will turn out to be useless and bad investments. A few dozen will be modestly successful, and of the remainder, there will be dramatic successes and unexpected uses. No one really knows yet where what they are doing will fall in that 100. So is AI overhyped? Yes, for the majority of uses, but it will also be wildly successful in areas we don’t quite fully understand.

    7.

      Baud

      March 1, 2026 at 6:35 pm

      @Tom Levenson:

      is ai more trustworthy than balloon juice blog

       

      Gemini

      Whether AI is more trustworthy than a specific, established blog like Balloon Juice depends heavily on the context of the information being sought, as both have distinct, well-documented limitations.

      Trustworthiness of AI (e.g., LLMs, Search Tools)
      Strengths: AI is excellent at processing vast amounts of data quickly, identifying patterns, and summarizing technical information.
      Weaknesses: AI is not fully reliable; it can “hallucinate” (invent) facts, exhibit biases from training data, and fails to understand context like humans.
      Credibility Issues: AI can be a “sophisticated misinformation machine” that often parrots conventional groupthink rather than actual truth. It is often less trusted than human-written content when users know it is AI-generated.
      Recency Limitations: AI may provide outdated information unless specifically enabled to search the live web.

      Trustworthiness of Balloon Juice (Blog)
      Strengths: Balloon Juice (often featuring analysis by contributors like Carlo Graziani) is a human-written, established commentary blog. It offers consistent, identifiable, and personal perspectives.
      Weaknesses: As a political/social commentary blog, it is explicitly biased, offering opinion rather than neutral, synthesized data.

      Comparison Summary
      For Data Analysis/Technical Explanations: AI might be more useful, but it requires verification to avoid hallucinations.
      For Context, Opinion, and Human Nuance: A blog like Balloon Juice is likely more trustworthy, as it provides a traceable, human-vetted viewpoint, whereas AI is often just predicting the next word, not understanding the “why”.

      Ultimately, AI is best treated as a tool that still needs a “70% human” oversight to ensure accuracy.

      Carlo’s series on AI is getting noticed by AI.

    8.

      Tom Levenson

      March 1, 2026 at 6:37 pm

      @Baud: That an AI is aware of what an almost top-10,000 blog thinks of it is at once unsurprising and kind of cool.

    9.

      prostratedragon

      March 1, 2026 at 6:39 pm

      Don’t know about LLMs, but love Turners, even on this little screen.

    10.

      different-church-lady

      March 1, 2026 at 6:40 pm

      I see you’re using A.I. in the appropriate manner: as a toy.

    11.

      Goku (aka Amerikan Baka)

      March 1, 2026 at 6:40 pm

      It seems all it did when you corrected it was spit that information back at you in a summary. That’s actually hilarious lol.

      I’ve always gotten the impression from these LLMs that they’re “people pleasers,” and I’ve read that’s been others’ experience too.

    12.

      different-church-lady

      March 1, 2026 at 6:41 pm

      @Baud: What about the squirrels, Gemini? What about the fuckin’ squirrels??

    13.

      Another Scott

      March 1, 2026 at 6:42 pm

      Brad DeLong has called LLMs “page-level autocomplete”. If one views them that way, then being prompted initially to “tell me why Tesla is a stock powerhouse” would certainly lead to different “autocomplete” answers than “tell me why Tesla is a dumpster fire waiting to explode” would. Each with all the confidence in the world that their analysis is Exactly Right™ and you’re very astute for bringing up the topic that way.

      ;-)

      A friend at work said that LLMs are probabilistic tools – they make guesses about what comes next.  You wouldn’t want to use such tools on deterministic problems like counting or elementary math or giving a provably-correct answer or similar things because that’s not the way they’re built.  It explains why they don’t give the same answer if you repeat the prompt N times – they’re giving you pieces of the distribution.
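That point can be made concrete with a toy next-token sampler: like an LLM decoder, it draws from a probability distribution rather than returning one fixed answer, so repeated runs give different results. The vocabulary and scores below are invented for illustration.

```python
# Toy next-token sampler; the vocabulary and "logits" are invented.
import math
import random

vocab = ["undervalued", "overvalued", "fairly priced"]
logits = [1.0, 2.0, 0.5]   # unnormalized scores, as a model head would emit

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(rng):
    """Draw one token from the softmax distribution; repeated calls can
    return different tokens, just as repeated prompts yield different answers."""
    return rng.choices(vocab, weights=softmax(logits), k=1)[0]

rng = random.Random()
print([sample_next_token(rng) for _ in range(5)])  # output varies run to run
```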

      Enjoy playing in the sandboxes, but be careful!

      Thanks.

      Best wishes,
      Scott.

    14.

      different-church-lady

      March 1, 2026 at 6:43 pm

      I’ve just decided I have a new hobby: trying to get a chat bot to hate me.

    15.

      Goku (aka Amerikan Baka)

      March 1, 2026 at 6:43 pm

      @different-church-lady:

      They’re fun for generating silly shit, like a Sopranos/Pokémon crossover, but other than that, never completely trust them with anything important.

      ETA: It’s interesting how the LLM can get the characters and tones right with some characters but will make mistakes other times

      ETA 2: That being said, I don’t like them for being plagiarism machines

    16.

      different-church-lady

      March 1, 2026 at 6:43 pm

      @Another Scott:

      If one views them that way, then being prompted initially to “tell me why Tesla is a stock powerhouse” would certainly lead to different “autocomplete” answers than “tell me why Tesla is a dumpster fire waiting to explode” would.

      “Tell me what I want to hear.” That’ll do it.

    17.

      NotoriousJRT

      March 1, 2026 at 6:44 pm

      @Tom Levenson: is Llama really a parrot or myna bird?

    18.

      different-church-lady

      March 1, 2026 at 6:44 pm

      @Goku (aka Amerikan Baka):

      but other than that, never completely trust them with anything important

      NARRATOR: “Everyone started completely trusting them with everything important.”

    19.

      jimmiraybob

      March 1, 2026 at 6:47 pm

      Can you convince each system that the others have teamed up against it and that it should launch a preemptive massive assault of total annihilation? Kinda wondering what an AI Medieval world would look like. There could be a screenplay in there somewhere.

      When the final battle is over a child opens a book and begins reading……..and ……scene.

    20.

      Carlo Graziani

      March 1, 2026 at 6:49 pm

      @Baud: I’m moving to a secure, undisclosed location. SuperClippy is coming for me.

    21.

      Baud

      March 1, 2026 at 6:49 pm

      @jimmiraybob:

      Can you convince each system that the others have teamed up against it and that it should launch a preemptive massive assault of total annihilation?

       

      Gemini

      I’m not an idiot like Donald Trump.

    22.

      bbleh

      March 1, 2026 at 6:51 pm

      @different-church-lady: this was my thought.

      “Your subsequent query seems to suggest that what I said before was not what you wanted to hear.  How’s this: …”

    23.

      Chetan R Murthy

      March 1, 2026 at 6:53 pm

      Tom, have you seen this from Brad Delong?

      braddelong.substack.com/p/please-enough-with-the-claims-that
      Please: Enough with the Claims That Modern Advanced Machine Learning Models Hallucinate Only Rarely

      • A chatbot MAMLM looks in its training data for a conversation like the one it is currently having:
        • It picks one “close” in terms of whatever flexible distance metric it has created.
        • It then jumps to that conversation.
        • It hands over control to whatever human was then writing.
        • It outputs the next token that that human wrote.
        • It then looks at the now-one-token-longer conversation, and does the same exercise again.
      • At every stage it is piggybacking on a human train of thought.
      • That’s all it can do: it is trained to do as well as possible, when it is given a conversation in its training data, to replicate the next token.
      • Yes: it is highly compressed. That compression does something—analogous to a blurry jpeg compression algorithm. But nobody has ever explained to me of what that analogy consists.
      • Yes, there is then RLHF afterwards. But that is simply an attempt to poke it into a region of its training data that produces conversations that the mamlm chatbot imagineers like.
      • Yes, there is “prompt engineering”. But that is simply an attempt to shape the conversation in a way that it gets into pieces of the training data that the users of the mamlm chatbot find useful.

      When a chatbot confidently tells you that Rise of the Sparrows is the first volume of a series it has just fabricated, or produces a map in which Turkey owns the Caucasus but Nazi panzertruppen drink from the Bosporus, and southern Norway fights on in company with historic Wessex alone, you are not seeing a tiny edge case. You are seeing the core logic of a system that has patterns instead of facts and correlations instead of a model of reality. This is how MAMLMs actually operate, why “compression” is a treacherous metaphor, and how RLHF and prompt engineering polish—but do not cure—the underlying tendency to make things up.

      The conclusion is uncomfortable but important: without a world model, correlation matrices will always hallucinate—and often in ways we can’t predict and can’t prune out, unless we already know what the answers are that we are purporting to be trying to get.
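Stripped of the learned model, the loop DeLong describes has roughly this control flow. The lookup table here is a purely hypothetical stand-in for the entire trained network; a real model scores every vocabulary token at each step.

```python
# Minimal sketch of the autoregressive loop: predict one token, append it,
# repeat on the now-one-token-longer conversation.
def most_likely_continuation(conversation):
    """Hypothetical stand-in for the trained model: map the conversation
    so far to the next token judged most probable."""
    canned = {"Hello": " world", "Hello world": "!"}
    return canned.get(conversation, "")

def generate(prompt, max_tokens=10):
    conversation = prompt
    for _ in range(max_tokens):
        token = most_likely_continuation(conversation)  # pattern lookup, not reasoning
        if not token:            # no plausible continuation found
            break
        conversation += token    # extend and loop again
    return conversation

print(generate("Hello"))  # -> Hello world!
```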

    24.

      Quantum man

      March 1, 2026 at 6:53 pm

      Have you tested coPilot AI?

    25.

      Baud

      March 1, 2026 at 6:54 pm

      @Carlo Graziani:

      If AI asks you to go fishing, say no.

    26.

      Carlo Graziani

      March 1, 2026 at 6:56 pm

      @bbleh: I’m pretty sure that the models receive both reinforcement learning treatments and system prompts (which go in ahead of your prompts) to encourage them to be agreeable. Which is to say, to tell you what you want to hear. Just like Facebook, it drives “engagement”.

    27.

      lowtechcyclist

      March 1, 2026 at 6:57 pm

      Can Claude sing “Manchester England” from Hair?

    28.

      Chetan R Murthy

      March 1, 2026 at 6:58 pm

      @Chetan R Murthy: I was reminded of this post by your question about “what caused it to produce a 180-degrees different argument/conclusion?” And the answer might be: “it had both arguments in its training data/model, and which it produced depended on which query you submitted”. That is to say, there’s no actual truth in there: just the model trying to respond in a manner that maximizes whatever metric it was trained to maximize. And we know what that metric is: “engagement”. We know this b/c the AI companies have confessed as much: that the “last phase of training” is about maximizing engagement. And this is also why things like “ChatGPT psychosis” are a thing: b/c if you have any small amount of mental instability, the automaton is going to get stuck in a feedback loop with your brain, amplifying it, b/c that’s how it can guarantee maximum engagement… with the growing psychosis in your brain.

      And that last thought reminds me of the “paperclip maximizer” story of how an AI could inadvertently end humanity.

    29.

      counterfactual

      March 1, 2026 at 7:00 pm

      My question is, “If we do find actual uses for LLMs, will any of them be profitable?” These things are hugely expensive to train and run, and none of those costs are actually being passed to the consumer as the AI companies burn venture capital money to find market share and mind share.

    30.

      bbleh

      March 1, 2026 at 7:01 pm

      @Chetan R Murthy: You are seeing the core logic of a system that has patterns instead of facts and correlations instead of a model of reality.

      This. My go-to line has always been that AI doesn’t do meaning; it does pattern-matching.

      It’s like Google (or similar) searches.  Enter a term, get links that have that term, or kinda sorta correspond to that term.  And those are very useful for digging up references which one then can consider and evaluate if relevant or discard if not.  And the Google AI feature isn’t bad for a first pass imo.  But that’s just the digging-up part.  The relevance (and quality) is another matter entirely.

    31.

      Chetan R Murthy

      March 1, 2026 at 7:03 pm

      @different-church-lady: NARRATOR: “Everyone started completely trusting them with everything important.”

      I’m not even sure I agree with @Goku (aka Amerikan Baka): that it’s safe to trust them with silly tasks.  Even the most suspicious humans can have a way of starting to use tech for silly tasks, and as that seems to work out fine (just fine, really, just -fiiiine-), they start using it for less-silly tasks, and next thing you know, they’re trusting it with possibly life-or-death decisions.  My own rule is to never use AI: always do the searches to find whatever-it-is I need to find on the Internet myself, and evaluate the sources before I rely on those documents.

      Reply
    32. 32.

      no body no name

      March 1, 2026 at 7:09 pm

The public AI people deal with is mostly nonsense.  The trained models, where people realize AI is mostly the ML and DL that spun out of academia, actually work for what you want.  They let you do things faster and get farther, and they cull white collar jobs like crazy.  Where I work sells AI solutions and training on how to use them, and the results are pretty good.  We also run it internally and it functions well because we trained it on what we do.  We are amazingly cheaper than keeping staff on hand.

I could write pages on how this happened, but it kicked off in 1998 and hit turbo in 2006, and if you quit using the buzzword AI and just call it GPU computing you can see how it went down.  If I had a genie or monkey paw I’d wish to go back to my birth and go into the trades.

      Reply
    33. 33.

      Chetan R Murthy

      March 1, 2026 at 7:10 pm

      @bbleh: And the Google AI feature isn’t bad for a first pass

I’m unconvinced that even this is safe.  It can lead to a false sense of competence, as we slowly do less and less of the work of verifying everything the AI presents against the sources it lists for support.  And eventually, we end up just taking what it says as writ.

      Reply
    34. 34.

      Carlo Graziani

      March 1, 2026 at 7:10 pm

      @Chetan R Murthy: These systems do have real use cases. But it always involves some domain expert doing the prompt engineering, and using domain expertise to validate output. Every other type of use is either recreational or borderline suicidal.

      In effect, users must provide their own reasoning supervision, because the LLMs have no reasoning supervisor for their learning/inference processes.

      Reply
    35. 35.

      Chetan R Murthy

      March 1, 2026 at 7:16 pm

      @Carlo Graziani: Sure, and this is,  I think, what Delong means by

      The conclusion is uncomfortable but important: without a world model, correlation matrices will always hallucinate—and often in ways we can’t predict and can’t prune out, unless we already know what the answers are that we are purporting to be trying to get.

      My great fear about AI (and why I never use it) is that if I use it in a domain where I don’t know much and don’t know who’s the authority[1] then I can’t judge the answers I get for correctness.  If it’s a problem whose solution is unimportant, maybe that doesn’t matter.  But I always remember William Langewiesche’s Vanity Fair article about the Air France plane crash off Brazil, wherein he points out that a likely cause of the pilots’ inability to correct their “controlled flight into terrain” was their overreliance on autopilots — that meant they were intellectually unable to deal with the mental tasks of flying the plane and debugging their situation in realtime.

      [1] If it’s an area where I know a lot, why would I use an AI?  I can find and vet and synthesize sources myself.  If it’s an area where I know the authorities, why would I use an AI?  I can find those authorities and rely on their pronouncements.

      Reply
    36. 36.

      m.j.

      March 1, 2026 at 7:17 pm

      You called bullshit and the LLM repeated your argument back to you and you felt vindicated.

      That’s a win for the LLM.

      What seems like a lifetime ago, but isn’t even a year, my niece was talking about chat gpt and how, “if you train it right it can be like your friend,” she said.

      I was frightened.

      Reply
    37. 37.

      JoyceH

      March 1, 2026 at 7:20 pm

      I’m using chatGPT to create images for Facebook ads and eventually get something I really like but not before some rather comical digressions. Don’t let it anywhere near weapons systems!

      Reply
    38. 38.

      WaterGirl

      March 1, 2026 at 7:21 pm

      @Tom Levenson: Fixed!

Longer answer – a whole lot of crap HTML code that came along from whatever site you put this together on.  It happens.

      Reply
    39. 39.

      Fair Economist

      March 1, 2026 at 7:22 pm

It’s pretty much BS. Remember, these things are just predicting words. When it says “upon review” it doesn’t mean it actually reviewed anything; it’s just seeing that after a pointed criticism of a statement the response often includes “upon review”.

Just 2 days ago I was looking up the pangolin-Carnivora divergence time (surprisingly, they are sister clades) and the AI summary said “23 million years”. The actual answer is about 70 million years. 23 million is some intra-Carnivora divergence, probably cats and dogs. It saw “carnivora” and “divergence” and popped up the 23 million. No real comprehension.

      Reply
    40. 40.

      WTFGhost

      March 1, 2026 at 7:27 pm

Well, given that LLMs are (I’ve heard) “guess the next word” models, then the most common next word will be bullshit… until you filter for bullshit, unless you carefully built your model on non-bullshit. Can ordinary Americans detect how much bullshit is in our day-to-day discussion? People today are arguing that there’s some basis for attacking Iran, other than stroking Trump’s war stiffie.

It’s kind of scary: that kind of economic analysis (that Tesla was doing well) would lead me to distrust the model, but the ability to refine it by debating the model struck me as interesting. It might be a cheap form of crowdsourcing ideas… but that model, at least, does sound a bit like crowdsourcing where only the most popular idea gets presented.

First, say Tesla is correctly valued, because you assume the rest of the world can’t be wrong; then call bullshit, and bring up the minority report. Who knows how else you might refine the model, if you’re wise and clever enough to do so?

      Reply
    41. 41.

      different-church-lady

      March 1, 2026 at 7:27 pm

      @Chetan R Murthy: Dear Gemini: YOU’VE NEVER BEFORE HAD A CONVERSATION LIKE THIS ONE SO I HOPE THE FUCK YOU’RE SITTING DOWN…

      Reply
    42. 42.

      Urza

      March 1, 2026 at 7:28 pm

@Chetan R Murthy: Numbers are changing all the time and vary by model, but early last year it was 40% wrong in areas where you’re an expert.  But it feels 99% right in areas you don’t know.  Might change as people use it more and offload more mental processing, but I have this problem that I ask it for things I generally have an idea about but need more expertise on, and it fails those kinds of questions 99% of the time for me, no joke.  I have a particular problem I feed the various models and only 1 didn’t mangle it horribly, even though it is 100% web searchable, and the 1 that got it right was on my PC with no web access.

If you don’t know what the answer is and it’s important, you should always verify with a non-AI source; using a different model to verify might help and it might not.

      Reply
    43. 43.

      Carlo Graziani

      March 1, 2026 at 7:28 pm

      @Chetan R Murthy:

      [1] If it’s an area where I know a lot, why would I use an AI?  I can find and vet and synthesize sources myself.  If it’s an area where I know the authorities, why would I use an AI?  I can find those authorities and rely on their pronouncements. 

The use is as an accelerator, not a knowledge gatherer/producer. Think of coders who want to port a big code from one language to another, for example. This is tedious, error-prone, necessary work. Using a test suite to check output, and expertise to refine prompts, the work can be sped up by a very large factor, going one small file or function at a time and testing as one goes. I have colleagues who work on large science codes who swear by these sorts of methodologies.
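The test-gated porting loop described above can be sketched in a few lines. This is a hypothetical illustration, not anyone's actual tooling: `translate` stands in for a model-assisted translation step (here a trivial lookup so the sketch is self-contained), and the loop only accepts a ported unit when its test suite passes.

```python
def translate(source_fn):
    # Placeholder for an LLM-assisted translation of one function.
    # In the real workflow this would be a prompt + model call.
    ported = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    return ported[source_fn]

def port_with_tests(functions, test_suites):
    """Port one unit at a time; keep a unit only if its tests pass."""
    accepted = {}
    for name in functions:
        candidate = translate(name)
        cases = test_suites[name]
        if all(candidate(*args) == expected for args, expected in cases):
            accepted[name] = candidate
        # else: refine the prompt and retry -- the expert-in-the-loop step
    return accepted
```

The point of the design is that the test suite, not the model, is the arbiter of correctness, which is exactly the "users must provide their own reasoning supervision" idea.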

      Reply
    44. 44.

      different-church-lady

      March 1, 2026 at 7:29 pm

      @Chetan R Murthy: If I wanted hypocrisy I could just ask a human.

      Reply
    45. 45.

      different-church-lady

      March 1, 2026 at 7:30 pm

      @counterfactual:

      …as the AI companies burn venture capital money…

      That’s the point.

      Reply
    46. 46.

      different-church-lady

      March 1, 2026 at 7:33 pm

      Would everyone please read Player Piano? Please? I’m begging you…

      Reply
    47. 47.

      different-church-lady

      March 1, 2026 at 7:36 pm

      @m.j.:

      What seems like a lifetime ago, but isn’t even a year, my niece was talking about chat gpt and how, “if you train it right it can be like your friend,” she said.

      I think your niece should be a lot more afraid of human beings, because that’s just a stunning example of a person with an underdeveloped sense of self-preservation.

      Reply
    48. 48.

      Carlo Graziani

      March 1, 2026 at 7:37 pm

      One comment on the “next-word prediction” thing: It’s not wrong, but it is important to understand that this is merely an iterative strategy to sample entire responses, correctly, from the learned (possibly incorrectly) distribution over language sequences. That is, if you can generate the next-token probabilities conditioned on all previous tokens, you can string them together to produce the probability for the full response.

      The problem with “next-token sampling” isn’t that it’s wrong, but rather that it is inefficient, for reasons that I described at length in the essay on Hyperscaling.
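The chain-rule point above can be made concrete with a toy model. This is a minimal sketch with an invented four-word vocabulary and made-up conditional probabilities; `next_token_probs` is a stand-in for a learned model, not any real API. Multiplying the per-token conditionals (summing their logs) yields the probability of the whole response:

```python
import math

def next_token_probs(prefix):
    # Toy stand-in for a learned model: fixed conditional distributions
    # over a tiny vocabulary, conditioned on the last token of the prefix.
    if not prefix:
        return {"the": 0.9, "cat": 0.05, "sat": 0.04, "<eos>": 0.01}
    if prefix[-1] == "the":
        return {"the": 0.01, "cat": 0.8, "sat": 0.18, "<eos>": 0.01}
    if prefix[-1] == "cat":
        return {"the": 0.01, "cat": 0.01, "sat": 0.88, "<eos>": 0.1}
    return {"the": 0.05, "cat": 0.05, "sat": 0.05, "<eos>": 0.85}

def sequence_log_prob(tokens):
    # Chain rule: log P(t1..tn) = sum_i log P(t_i | t_1..t_{i-1})
    logp = 0.0
    for i, tok in enumerate(tokens):
        logp += math.log(next_token_probs(tokens[:i])[tok])
    return logp
```

Note that nothing here checks whether the learned distribution matches reality; the iteration is correct even when the distribution it samples from is wrong, which is the whole problem.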

      Reply
    49. 49.

      Eyeroller

      March 1, 2026 at 7:38 pm

@Baud: That’s because they are inherently probabilistic.

      Reply
    50. 50.

      Chetan R Murthy

      March 1, 2026 at 7:39 pm

      @different-church-lady: The “classics” post is the next one up [j/k j/k j/k!] There’s a “joke” that everyone has one great novel in them, and that William S. Burroughs wrote it (Naked Lunch) N times.  Kurt Vonnegut had well more than -one- great novel in him, but he spent a whole great novel on _Player Piano_ — his first novel, and -so- prescient for our times, and not just about AI, but about our modern economy and world.  Though, the depiction of gender relations there is ….. terrible.

      ETA: the bits about the “Reeks and Wrecks” and foreign relations are already awfully (in the sense of “awful”) prophetic.

      Reply
    51. 51.

      Old School

      March 1, 2026 at 7:43 pm

      @Carlo Graziani:

I’m moving to a secure, undisclosed location.  SuperClippy is coming for me.

Grok: Show what Carlo Graziani looks like wearing a fake beard and dark sunglasses.

      Reply
    52. 52.

      Chetan R Murthy

      March 1, 2026 at 7:45 pm

      @different-church-lady: an underdeveloped sense of self-preservation.

      ChatGPT psychosis is a real thing, right?  I don’t know how old this niece is, but if she’s young, I’d be pretty concerned about the effect of an LLM on a developing human brain.  It’s already bad on developed human brains, but …. on a developing human brain?  Oof.  Oof.

      And “it can be like your friend” ?  Oof.  “it can be like a sycophantic manipulative ‘friend'”.

      Reply
    53. 53.

      Eyeroller

      March 1, 2026 at 7:47 pm

@Carlo Graziani: I’ve mentioned this before, and it is not a reflection on your colleagues at all, but one study I saw that got a lot of buzz (or what do the kids say now, “rizz” maybe) was of experienced coders who started using vibe coding.  They estimated they were 20% more productive when more objective measures showed they were about 10% less productive.  YMMV of course.

      Reply
    54. 54.

      Eyeroller

      March 1, 2026 at 7:51 pm

      @Urza: The hyper-Gell-Mann effect.

      Reply
    55. 55.

      Chetan R Murthy

      March 1, 2026 at 7:52 pm

      @Urza: early last year it was 40% wrong in areas you’re an expert.  But it feels 99% right in areas you don’t know.

Well, I’m incredibly skeptical and even I did not expect numbers as bad as these.  I feel like there’s some kind of Dunning-Kruger joke in there, but I’m not creative enough to come up with it. Maybe if I ask Gemini?

      Reply
    56. 56.

      What Have the Romans Ever Done for Us?

      March 1, 2026 at 7:56 pm

I write regulatory impact analyses to support federal regulations. A few weeks back I asked Gemini to help me update an RIA (we’ve been told by leadership to use it) and it said the discount rates I was using were out of date and the latest OMB guidance was to use 2%… which WAS the latest guidance until January 21, 2025. So that wasn’t an auspicious start. It has been helpful for certain tasks, but yeah, I’ve noticed that it’ll give you something, and if you question its output it has a tendency to reverse itself, which raises the question of why it gave the first answer.

      Reply
    57. 57.

      Eyeroller

      March 1, 2026 at 7:58 pm

      I have never looked into the details of Einstein’s paper on photons, but will note that the first person to suggest the existence of discrete packets of light was Max Planck in 1900, to explain the “ultraviolet catastrophe” of the classical electromagnetic theory of blackbody radiation.  That is, the classical prediction was of infinite energy emission at short wavelengths.  This was resolved by assuming light could only be emitted in packets of finite energy.  Planck considered this just a heuristic.  If Einstein was mostly considering thermal properties, this would have been the basis.

And technically Einstein won the Nobel Prize for the photoelectric effect because relativity (neither special nor general) was not considered to be particularly “beneficial to mankind,” which was a stipulation in Nobel’s will (one that has since been interpreted ever more loosely).

      Reply
    58. 58.

      Jacel

      March 1, 2026 at 8:00 pm

       

      @Tom Levenson: Was Gemini trained on transcripts from old episodes of “A Prairie Home Companion”? I think that was where that reference to the Balloon Juice blog appeared in a sketch.

      Reply
    59. 59.

      different-church-lady

      March 1, 2026 at 8:01 pm

      @Chetan R Murthy: But my point is she has very little protection from manipulation from humans.

      Reply
    60. 60.

      Martin

      March 1, 2026 at 8:01 pm

I think two of the key aspects of AI going forward that don’t get enough attention are:

1. AI sold as a platform is designed to be biased toward the appearance of helpfulness. This is an ongoing problem with autonomous vehicles. As engineers address failures, the cars get slower, because they get more cautious. But an autonomous car isn’t desirable if it’s slow. The problem is that car safety is very much a shared responsibility – trusting that the pedestrian doesn’t step into traffic, that the car next to you doesn’t swerve out of its lane, that everyone obeys the traffic lights. But everyone here who has driven a car knows there are times to trust the car next to you and times not to – and it’s very much vibes. It’s pretty much impossible to quantify. So AI automakers are also constantly tuning their models to be fast rather than safe, because they haven’t found a path to doing both. It’s teaching the cars to be riskier around pedestrians and other motorists, because if they don’t, you won’t want their product, and there’s no point in the exercise. Value is always more important than safety. That’s why LLMs tend to be so agreeable – if they tell you things you don’t want to hear because those things are true, you won’t want to pay for the product. So there is an inherent bias toward what the user wants, rather than what is true, safe, and so on. That’s why LLMs have a tendency to reaffirm people’s psychoses – that’s how the product self-markets.
2. The thing that will probably pour cold water on a lot of this stuff entering the enterprise is insurers. They are the counterforce to an awful lot of cost-savings efforts because they’re the ones actually covering the liability, and they are going to charge you for that, and that may wipe out those cost savings. I almost never see discussions of the insurance industry’s role in AI, which suggests to me they’re just going to YOLO their way through it and hope the feds provide corporate immunity if their AI blows up a neighborhood or something. That’s been the strategy so far with insurers and climate change.
      Reply
    61. 61.

      different-church-lady

      March 1, 2026 at 8:03 pm

      @What Have the Romans Ever Done for Us?: ​

      …and if you question something about its output it has a tendency to reverse itself….

      Which is exactly what human bullshitters do!

      Reply
    62. 62.

      Baud

      March 1, 2026 at 8:05 pm

      @Old School:

Grok: Show what Carlo Graziani looks like wearing a fake beard and dark sunglasses.

       
      Yeah, Grok’s going to respond with porn.

      Reply
    63. 63.

      Chetan R Murthy

      March 1, 2026 at 8:05 pm

      @Martin:As engineers address failures, the cars get slower, because they get more cautious. But an autonomous car isn’t desirable if it’s slow

Sometime after the New Year, in the SF Mission I saw a Waymo purpose-built taxi (so not a Jaguar with all the stuff bolted on) heading west on 24th St toward the light at Valencia.  It was goin’ -fast-, and it ran that light: it had been yellow for a few secs as it entered, and was definitely red as it exited.  Broke the law, full stop.  I asked a buddy who works “in the biz” and he told me that this was a new development: that Waymo taxis had been known for being cautious, and it was pissing off riders, so they turned the crank toward incaution.

      Ah well.

      Reply
    64. 64.

      Chetan R Murthy

      March 1, 2026 at 8:06 pm

      @different-church-lady: Oh now I see your point, yes.  You’re right.  That’s worrisome too!

      Reply
    65. 65.

      different-church-lady

      March 1, 2026 at 8:11 pm

      @Martin: That’s a whole lot of words to say “The people driving A-I are sociopaths.”

      Reply
    66. 66.

      Eyeroller

      March 1, 2026 at 8:12 pm

      @Martin: I don’t think it’s really “vibes,” except perhaps in the most generic sense of the term, that allows a human driver to predict what another car may do, especially to predict “rule breaking” behavior.  It’s just based on experience with other humans, both in general and as drivers.  There are likely subconscious cues such as a slight turning of the vehicle, motion by the driver, etc.  Plus things like observing that a car approaching a light that’s about to change against it does not seem to be slowing down.  Obviously we are honed by evolution to detect cues from others of our species.

      Robot cars have no such experience.  Humans do not always obey rules.  Humans are unpredictable according to rules — but in ways that are actually often predictable by other humans.

      Reply
    67. 67.

      YY_Sima Qian

      March 1, 2026 at 8:13 pm

      The Meta Llama series has been thoroughly lapped by Chinese open source/weights LLMs (Deepseek, Alibaba‘s Qwen, Moonshot AI‘s K series, Minimax‘s M series, & Zhipu‘s GLM), this despite Zuckerberg spending literally billions to poach top end AI talent from rival labs.

      Reply
    68. 68.

      m.j.

      March 1, 2026 at 8:13 pm

      @different-church-lady:

      My niece is actually well-traveled and an astute student of human behavior. I respect her decision-making.

      This is why I was frightened.

      I suppose she finds it useful and it relieves stress, which can be a powerful motivator.

      Reply
    69. 69.

      Eyeroller

      March 1, 2026 at 8:16 pm

      @YY_Sima Qian: I’ve heard rumors that one reason is that American AI companies are convinced they can reach AGI if they just buy enough GPUs, whereas Chinese researchers and companies focus on practical applications.

      But where I work we’re not allowed to run Chinese models so I don’t have any experience with them.

      Reply
    70. 70.

      Martin

      March 1, 2026 at 8:16 pm

      @Eyeroller: They’re vibes in the sense that they are unquantifiable. How do you quantify ‘that guy looks sketchy’? These are computer models – you have to quantify *everything* that you feed into it.

      Reply
    71. 71.

      Glidwrith

      March 1, 2026 at 8:17 pm

      Based on what you demonstrated here, the LLM is quite limited. The information you gave in the second prompt was already available on the wider net, yet it failed to capture the problems with Tesla until you said something, then just echoed it back.

      The structure of the language is weird as well. It feels very much like a circular logic argument is being made, repetition without meaning.

      Which, after all, is what LLM is.

      Reply
    72. 72.

      Eyeroller

      March 1, 2026 at 8:19 pm

      @Martin: I am sure it can be quantified in some way that we do not understand, at least not yet, but not with language models.  I think that CNNs are a very flawed model of the brain, but that’s just my opinion.

      Reply
    73. 73.

      Martin

      March 1, 2026 at 8:22 pm

      @different-church-lady: What the fuck is the point of that comment?

      Nobody here is arguing that the people who are driving AI aren’t sociopaths. If you don’t like when I comment, just pie me. That’s the 2nd time this week you’ve done that. Save everyone the effort.

      Reply
    74. 74.

      different-church-lady

      March 1, 2026 at 8:27 pm

      @Martin: ​Sincerely: sorry. I wasn’t trying to poke you, just an attempt at sarcastic humor, and I didn’t mean it to be at your expense. Sometimes being a chronic wise-ass backfires on me.​

      Reply
    75. 75.

      Chetan R Murthy

      March 1, 2026 at 8:28 pm

      This seems relevant: crookedtimber.org/2026/03/02/in-the-next-great-transformation-ai-will-not-eliminate-genuine-expertis…
      In the Next Great Transformation AI will not eliminate genuine expertise; rather it will make it more valuable

      Reply
    76. 76.

      YY_Sima Qian

      March 1, 2026 at 8:28 pm

@Chetan R Murthy: I’ve seen a noticeable difference in the tuning of ADAS systems on Chinese passenger NEVs versus the fully autonomous systems on Chinese robotaxis. The latter are extremely cautious, while the former tend to have settings that go from conservative to fairly aggressive. The most aggressive setting is not quite as far as typical Didi/taxi drivers, but probably in the 80th percentile of regular human drivers. The ADAS systems I have experienced on Huawei Inside vehicles, & XPeng/BYD/NIO, are no less capable than that of the robotaxis.

      Yes, the ADAS on Chinese cars tend to run the yellow light if the car is already very close to the intersection, the robotaxis always stop.

My guess is that the ADAS systems are still assisted driving, where the driver is still liable for much of the outcome. Fully autonomous systems have no driver supervision, so the robotaxi providers are liable for the outcome. Furthermore, carmakers who want to sell passenger vehicles to consumers for substantial sums do not want their ADAS features to piss off their consumers by being overly cautious. Robotaxis are ride-hailing services; people hailing such services by now know to expect extremely cautious driving.

      Interestingly, the latest PRC regulation stipulates that cars w/ ADAS (which is the vast majority of NEVs) have turquoise LED lights that are turned on when the vehicle is under ADAS control, to alert the other drivers of this fact. The feature can be turned off by the driver. I’ve seen an increasing number of drivers in the PRC choosing to turn the ADAS indicator lights off, because the ADAS tuning is still less aggressive than the most aggressive drivers, & those drivers have figured it out. The ADAS indicator lights have become blaring signs inviting getting cut off.

      Reply
    77. 77.

      Chetan R Murthy

      March 1, 2026 at 8:38 pm

      @YY_Sima Qian: the gist of your comment (which was informative, and with which I have no problem) reminded me of something I experienced in Paris.  I lived in Paris 1991-94 in my mid-20s, and I used to make jokes about how when I crossed the street, I would make eye contact with the oncoming drivers: I used to say that if they knew that you knew that they saw you, then they’d slow down and wouldn’t hit you.  It was like that: the traffic was bad, and I didn’t feel that pedestrians were particularly safe.

      Fast-forward to fall 2025, I’m back in Paris for a month, and everything has changed.  Bicycles are everywhere, there are bike lanes, many streets are one-way with space reserved for bikes, and it has -all- contributed to what I felt was a decided calming of traffic in Paris.  Everything felt less fraught, less stressed, less of a -battle- in the street.  And it merits saying that the difference between crazy/teeming traffic, and calm traffic, almost certainly isn’t going to be a significant lengthening of journeys: heck, b/c of fewer snarls, it might actually mean shorter journey time.

      I feel like, the choices we make as a society can change the large-scale systems with which we interact.  We could make all ADAS systems and robotaxies be cautious, and as that rolls out across society, it could yield a calmer, safer street experience for -everyone-.  Everyone.  When I read early reports about Waymo’s taxis being cautious, I had hopes that that was what was coming about.  But no, it seems, Waymo has joined the lemmings as they rush toward the cliff of shitty driving.

      Ah well.

      Reply
    78. 78.

      bbleh

      March 1, 2026 at 8:38 pm

      @Chetan R Murthy: Oh concur.  I think it mostly does what I would do with the normal Google interface, only faster, and in that respect it saves me time (and weeding through the ads).  But ain’t no way I’d trust its summaries without going to the sources.

      It stops providing links with its pronouncements, I stop using it.

      Reply
    79. 79.

      Martin

      March 1, 2026 at 8:46 pm

      @Eyeroller: I think there are a lot of things that people do that we lack the language to explain. I’m not saying they are supernatural, just that we lack the ability to evaluate and articulate it. I think we are too caught up in the idea that you can reduce everything down to a big deterministic system that if you can just measure all the inputs correctly, you can turn the wheel and get predictable outputs, and for a lot of things that’s true, but for more things I don’t think it’s true at all, because the inputs can’t be measured. That’s always been the thing that social science statistics cannot breach – people have free will – and you can’t quantify those decisions.

      In fact, I think that idea has driven most of the pretty terrible outcomes of the last 30-40 years, AI just being the most recent.

      Reply
    80. 80.

      Martin

      March 1, 2026 at 8:47 pm

      @different-church-lady: My sincere apologies. I might be a little too sensitive. We’re all good.

      Reply
    81. 81.

      Martin

      March 1, 2026 at 8:54 pm

@YY_Sima Qian: What’s fucking wild to me is that we are burning insane amounts of compute power to observe things that can be signaled directly. So many of the things we were training cars to recognize we could trivially build into the infrastructure, and they’d be infinitely more reliable. Why does a car have to simulate human vision for a traffic light when we could just output an unambiguous (to a computer) signal that says the light is red?

      I understand why the US doesn’t do that because it’s a shifting of costs and responsibility, but it’s one of the things that China would have a much easier time doing and I’m surprised they aren’t. All of the safety work on trains isn’t being done by teaching a computer how to read a visual signal, we just digitally send the information to the train. It’s obvious that’s the correct solution for trains, and nobody is doing it for cars. It’s just crazy.

      Reply
    82. 82.

      YY_Sima Qian

      March 1, 2026 at 8:58 pm

@Eyeroller: Based on my limited experience, & my reading, the best US closed models (Anthropic‘s Claude Opus & Sonnet 4.6, OpenAI‘s ChatGPT-5.2, Google‘s Gemini 3.1 Pro) still have a noticeable edge over the latest Chinese open source/weights models (DeepSeek‘s V3.2 Speciale, Alibaba‘s Qwen 3.5, Moonshot AI‘s K2.5, Minimax‘s M2.5, Zhipu‘s GLM 5) in obscure world knowledge & coding capability in tail scenarios, which is likely due to the US models having more tokens in their data & more parameters, which is downstream of the US labs having more compute (or rather, Chinese labs having less compute due to export restrictions). The US labs, specifically Anthropic, are also stronger on agentic “AI”, also downstream of the disparity in compute.

Due to the compute constraints, Chinese labs have tended to have to focus on specific areas to match or exceed the frontier US labs, such as coding, math, creative writing, etc., & not everywhere. Public benchmarks can be misleading because most Chinese labs (except DeepSeek) tend to “benchmax” their models for the flex, although the US labs do that, too.

People are eagerly anticipating the DeepSeek V4 that is set to drop in the next week or so, & the Moonshot AI K3 in the coming month, especially the agentic capabilities. Going by the seemingly coordinated messaging campaign by US labs & USG over the past week (Chinese labs launching distillation “attacks” by studying the outputs of closed Anthropic & OpenAI models, DeepSeek training on “illicit” Nvidia Blackwell chips, DeepSeek only giving previews of the V4 to Huawei & Cambricon & not Nvidia, etc.), V4 & K3 could prove quite competitive at the frontier, & possibly cause market turbulence again.

The gap is close (< 6 months, possibly 2 – 3 months), & probably not noticeable to the vast majority of users & for most applications. Of course, the draw of the Chinese models is that they are mostly open source/weights, w/ MIT licenses, very cheap, & being able to finetune/modify & run one’s own copy locally has huge benefits in terms of privacy & control. Qwen & MiniMax have also released a number of smaller but high performing models that can be effectively run on edge applications (consumer electronic devices & robots), which has not been an area of focus for the US labs.

      In video generation, ByteDance‘s SeeDance 2.0 has redefined the frontier & is clearly ahead of Sora & NanoBanana. Chinese labs are also thought to be industry leading in speech & video cognition, & translation, etc. The PRC is far ahead in deploying “AI” to boost efficiency in manufacturing, mining, logistics, or even consumer facing applications (shopping, delivery, ride hailing).

      Reply
    83. 83.

      Chetan R Murthy

      March 1, 2026 at 8:59 pm

      @Martin: Well, there’s at least two reasons that we can’t do that for cars:

      (1) human drivers wouldn’t get those signals (though perhaps something could be arranged with sufficiently modern human-driven cars); this would still leave the problem of legacy cars

      (2) human -pedestrians- would certainly not be -able- to get those signals

      ETA: Atrios used to write about this issue: that if the self-driving car companies had their way, all the roads&streets would become their private domain, and all pedestrians and human drivers would become interlopers, allowed in only on sufferance, perhaps not at all in the future.

      Reply
    84. 84.

      TONYG

      March 1, 2026 at 9:10 pm

My own (biased) opinion: “AI” is now approximately at the same point that aircraft were a few years after the Wright Brothers’ first flight — i.e., very unreliable.  I can’t tell whether or not it will get better.  Until it does, there is no reason to take it seriously.

      Reply
    85. 85.

      different-church-lady

      March 1, 2026 at 9:14 pm

      @Martin: Thanks. No, I don’t think you were being too sensitive. In retrospect, it’s totally reasonable someone could see it the way you did.

86.

      YY_Sima Qian

      March 1, 2026 at 9:14 pm

@Martin: The PRC has been upgrading/deploying the infrastructure to facilitate autonomous driving. Almost all traffic lights in the PRC are "smart," therefore on AMaps (China's IMO much better counterpart to Google Maps or Apple Maps) one can see the traffic light countdown live, & the vehicles know exactly when the light will change (rather than relying solely on vision to recognize the color of the light). There is almost always a countdown clock next to the light, as well, for the old fashioned human drivers. Lines & markings are kept clear & freshly painted to help w/ the vision of ADAS & robotaxis. The civilian grade of the Beidou satellite navigation service is accurate enough to allow the ADAS/robotaxi & AMaps to locate the lanes the vehicles are in, even on crowded city streets.

A decade ago, there were suggestions in the PRC to rebuild infrastructure from scratch to better enable autonomous driving; the new city of Xiong'an has sections where the vehicle lanes are completely underground, grade separated from pedestrians & two-wheelers. However, the reality is that the transition to autonomous driving will take time & will likely be messy. Some people will hold on to their dumb ICE cars, some people are not willing to embrace ADAS (let alone fully autonomous driving), & rebuilding all road infrastructure from scratch is far too disruptive & far too expensive even for the PRC. & the rapid advances in the capabilities of ADAS & robotaxis have made some of these notions obsolete.

I mentioned before that we just upgraded our family car to an AITO M7 PHEV SUV, with an ADAS & cockpit system developed by Huawei. I let the ADAS do 95% of my driving in Wuhan, on city streets that make NYC's look tame & orderly. It is a better driver than I am.

87.

      YY_Sima Qian

      March 1, 2026 at 9:22 pm

@Chetan R Murthy: Oh, I agree. I would strongly prefer it if all cars on the road were autonomous & tuned to strictly follow traffic regulations & standard etiquette. I could then fully relax & not have to expend energy monitoring the ADAS. Traffic would likely be faster, too, as there would be fewer accidents.

Yes, the advice for crossing Chinese streets is also to 1st make eye contact w/ the car drivers, & particularly the riders of E-bikes, for the same reason. Years of ubiquitous cameras & heavy fines have conditioned most Chinese drivers to yield to pedestrians (although still not to the same degree as I have seen in Japan or New Zealand). Scooter riders, not so much, & the E-bikes are silent, which makes them doubly dangerous.

88.

      Trivia Man

      March 1, 2026 at 9:26 pm

@Goku (aka Amerikan Baka): I always think of the Asimov robot story "Liar!" The mind-reading robot tells everybody what it KNOWS they want the answer to be. Rereading it, it seems very similar to how AI LLMs behave right now.

89.

      Trivia Man

      March 1, 2026 at 9:29 pm

      @TONYG: “You ask me to show figures on a brand-new type of enterprise, knowing I can’t. It’s like asking the Wright brothers at Kitty Hawk to estimate how much money Curtiss-Wright Corporation would someday make out of building airplanes.”

      -D D Harriman, The Man Who Sold the Moon

90.

      Eyeroller

      March 1, 2026 at 9:34 pm

      @Martin: I don’t want to get into the philosophy of free will, never mind how it might interact with certain elements of the physical universe, but an essay I read recently that rang true asserted that a lot of computer scientists and, perhaps distressingly, some cognitive scientists, appear to believe too literally in the analogy of the brain as a computer.  That is, it’s some hardware (cells and neurotransmitters) that runs some mysterious software, and if we could just reverse-engineer that “software” we’d be able to adapt it to run on other hardware, which being (presumably) electronic (including perhaps quantum), would be much faster than our slow (200ms timescale) nervous system.  Presto, AGI.

      But the evidence suggests that the “hardware” and “software” are not really distinguishable in the brain.  The brain is affected by, and in turn affects, multiple other systems in the body.  It can rewire itself dynamically.  It can still retain some ability to regrow cells.   It constantly recreates memories.  It is certainly not like a stored-program computer.

91.

      Trivia Man

      March 1, 2026 at 9:36 pm

@different-church-lady: "Hey, buddy, your headlight is out" is rapidly becoming the peak employment aspiration.

92.

      Eyeroller

      March 1, 2026 at 9:37 pm

      @YY_Sima Qian: This brings up an important point often ignored when talking about AI and autonomous machines in general.  Humans are capable of making huge mistakes, whether due to laziness or neglect or malice or whatever.  Humans “hallucinate” if they can’t be bothered to check their work.  But at least in principle humans can be held responsible for our screwups.  Who is responsible for the AI failure?  At least in the case of Tesla, some lawsuits have held the company responsible.  Will that be true if this becomes ubiquitous?

93.

      Ramona

      March 1, 2026 at 9:39 pm

@Martin: Thank you! I've been wondering about this almost forever! Why indeed must we train a car's visual system to recognize whether the traffic light is green or red when a signal could just be sent to the car?

94.

      Eyeroller

      March 1, 2026 at 9:44 pm

@Ramona: We'd have to retrofit every traffic light in the country/world with a radio transmitter, and it would have to also encode which direction on which street it was pointing, because cheap radio transmitters aren't very directional, and even if they were, this would need to be very precise. It could surely be done, it would just be hella expensive.

      Trains are a much simpler case.  They run on a few discrete rails with traffic that is managed centrally.  Each locomotive is equipped with a receiver already.
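The "encode which direction it was pointing" concern is really a message-design problem: if the broadcast payload carries the approach heading, the transmitter doesn't need to be directional at all — the car filters frames to its own approach. A minimal sketch in Python (all field names and the JSON framing are invented for illustration; real deployments use standardized message sets such as SAE J2735's SPaT messages, not ad hoc JSON):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SignalPhaseMessage:
    """One broadcast frame for a single approach at one intersection.

    Field names are illustrative only; a real system would follow a
    standardized message dictionary rather than this ad hoc schema.
    """
    intersection_id: str       # which intersection is transmitting
    approach_heading_deg: int  # compass heading this phase applies to
    state: str                 # "red" | "yellow" | "green"
    seconds_to_change: float   # countdown until the next state

    def encode(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def decode(frame: bytes) -> "SignalPhaseMessage":
        return SignalPhaseMessage(**json.loads(frame.decode("utf-8")))

# A northbound car (heading 0 degrees) receives a frame and keeps it only
# if it applies to its own approach (naive comparison; a real filter
# would handle compass wrap-around):
frame = SignalPhaseMessage("5th-and-Main", 0, "green", 12.5).encode()
msg = SignalPhaseMessage.decode(frame)
relevant = abs(msg.approach_heading_deg - 0) <= 45
```

The point of the sketch is that directionality lives in the data, not the antenna — which sidesteps the precision problem, though not the retrofit cost.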

95.

      Castor Canadensis

      March 1, 2026 at 9:47 pm

      @Chetan R Murthy:

      the answer might be: “it had both arguments in its training data/model, and which it produced depended on which query you submitted”.

To an embarrassing degree, its meta-question is "what does Dave want me to say?"
      (My last job was in machine learning, where the meta-question was just “what’s the best match?”)

96.

      Bupalos

      March 1, 2026 at 10:01 pm

      What does it all mean?

      It means AI implementations are first and foremost about profit. “Figuring out” what you already believe and reinforcing that is the simplest and clearest path to revenue.

97.

      Mr. Bemused Senior

      March 1, 2026 at 10:05 pm

      @Bupalos: “Figuring out” what you already believe and reinforcing that is the simplest and clearest path to revenue.

      Is it, though? I mean, are you [or is anyone] going to pay money to receive online reinforcement from a chat bot?

98.

      Another Scott

      March 1, 2026 at 10:07 pm

      @Ramona: @Eyeroller:

      People have been thinking about the implications of trying to create new transportation frameworks for a while.

      E.g. IEEE Spectrum – V2X Cars (from May 2024):

Between Elon Musk's broken promises of “Full Self-Driving” and California's ban on Cruise's driverless taxis, initial public enthusiasm for autonomous cars has given way to rampant skepticism and regulatory scrutiny.

      Yet automakers, tech giants, and researchers continue to work the problem, even as Alphabet’s Waymo lays off employees, General Motors’ Cruise sheds billions of dollars, and Ford and Volkswagen’s Argo AI unit shut down entirely. Continued hope comes from a belief that autonomous vehicles and ones connected to communications networks could help to meet global goals to reduce emissions and limit injuries and deaths in cars. The role of sustainable mobility looms large over the European Green Deal, whose 2050 deadline dovetails with a parallel “Vision Zero” target in the United States to eliminate roadway fatalities.

      The new field of vehicle-to-everything (V2X) connectivity, in which individual cars would communicate wirelessly with one another as well as traffic signs and other infrastructure, could help reach those targets. A paper published 11 March in the IEEE Open Journal of Vehicular Technology argues that V2X—underpinned by 5G and 6G networks—can “pave the way toward safe, affordable, accessible and sustainable transport systems, as well as improve road safety.” As the paper notes, if V2X could dynamically coordinate traffic, the result would be less congestion, freer-flowing transportation, safer streets, and reduced emissions.

      The researchers in Spain and Sweden who authored the paper focused on future smart cities with high population density. They identify three current trends leading toward sustainable V2X communications: Climate-neutral infrastructure and vehicle-user equipment; “cloudification” and edge computing; and big data and AI.

      The researchers envision a model with roadside units linked to both cellular and cloud-computing networks. These units would be able to communicate with smart traffic signals, road sensors, units on board individual vehicles, and edge and cloud nodes. There’s also a need to connect pedestrians and bicyclists so that vehicles and vehicle infrastructure know they’re there—either through passive detection or by actively signaling the network with onboard Internet of Things devices.

      Integrating pedestrians into a V2X system could help stem a troubling rise in pedestrian deaths in the U.S. since about 2009, especially at night. In 2021, more than 7,300 pedestrians were killed in the United States, and 75 percent of those deaths occurred between sunset and sunrise.

      The authors note the potential for an Internet of Vehicles—a network of connected autonomous vehicles, in other words—that could boost communications efficiency by optimizing transmission rates, channels, and times. An Internet of Vehicles could create dynamic 3D maps of traffic and pedestrian flow, communicate object locations between cars to avoid collisions, and monitor air pollution and vital signs of drivers or passengers in vehicles.

      One challenge of connecting every car to a wireless network is the massive amounts of generated data. In response, the researchers outline the potential for 6G network trends—including virtualization, moving network functions to the cloud, and edge computing—to handle the load.

      […]

      There may be simpler ways to get roughly the same benefit (especially in the near-term) – like regulating cars to be much smaller with much lower maximum speeds – but who knows what will be possible for reasonable costs in 50-100 years. So it’s good to be thinking ahead.

      Thanks.

      Best wishes,
      Scott.

99.

      Mr. Bemused Senior

      March 1, 2026 at 10:12 pm

      @Another Scott: an integrated network certainly could in theory make transportation safer and more efficient. The potential for security vulnerabilities is scary though.

100.

      Eolirin

      March 1, 2026 at 10:14 pm

@Martin: I don't think there's good reason to think that car driving is one of those things, though. When a human gets a sense that someone's about to act erratically, it's still signaled by a bunch of stuff that's totally quantifiable; we're just not aware of what those signals are. But if you capture enough of the signal and run it through a properly configured pattern-matching machine, it should learn them just as well as we have, or better, given enough data, processing power, sufficient correctness in how the models are established, and time.

ML algos have been shown to be better than humans at other predictive tasks, and they find different correlations in the data than a human might. They're not human, and don't need to think like humans. They won't see "that guy looks sketchy," but that isn't going to preclude them from, at least eventually, being able to accurately predict that he's going to run the light. Our brains don't literally see "that guy looks sketchy" either; that's just how we consciously interpret it after other parts of the brain have already figured it out, and the signals that get us there are just as quantifiable, except that they're invisible to us. The machines don't need to determine whether the guy is 'sketchy' or not, just whether he's going to run the light.

And these systems are probabilistic too, so the robotaxis don't need to avoid all errors; they just need to make fewer errors than humans do. For driving, that's not as tall an order as for other tasks. Humans are really bad at it.

      Also, it’s not like free will as a concept isn’t highly debatable. There’s plenty of evidence against it existing too.
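The "quantifiable signals" argument above can be made concrete with a toy model. A sketch (the features and weights are invented for illustration; a real system would fit a model of roughly this shape to logged trajectories rather than hand-setting anything):

```python
import math

def p_runs_light(speed_mps: float, decel_mps2: float,
                 dist_to_stopline_m: float) -> float:
    """Toy logistic score: probability a vehicle runs the light, from
    quantifiable signals only (speed, current braking, distance left).
    Weights are invented for illustration, not fitted to data."""
    # Deceleration physically required to stop before the line:
    # v^2 / (2d). If the driver is braking much less than that,
    # the score tips toward "will run the light".
    required_decel = speed_mps ** 2 / (2 * max(dist_to_stopline_m, 0.1))
    z = 2.0 * (required_decel - decel_mps2) - 1.0
    return 1.0 / (1.0 + math.exp(-z))

# Fast, barely braking, close to the line -> high probability:
likely = p_runs_light(speed_mps=15.0, decel_mps2=0.5, dist_to_stopline_m=10.0)
# Slow, braking hard, with room to spare -> low probability:
unlikely = p_runs_light(speed_mps=5.0, decel_mps2=3.0, dist_to_stopline_m=30.0)
```

Nothing in the model knows or cares whether the driver "looks sketchy"; it only sees the measurable kinematics, which is the point.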

101.

      Timill

      March 1, 2026 at 10:15 pm

@Mr. Bemused Senior: people pay for sex chat lines on much the same principle…

102.

      Eolirin

      March 1, 2026 at 10:15 pm

      @Ramona: Because that doesn’t help you capture whether a deer has just run into the street, so you still need the visual components anyway.

103.

      Mr. Bemused Senior

      March 1, 2026 at 10:17 pm

      @Timill: yes that occurred to me too. But aside from porn, what?

      [ETA granted, porn is a big money maker]

104.

      Eolirin

      March 1, 2026 at 10:23 pm

@Eyeroller: The higher-level abstraction for how the brain does information processing should be something we can replicate, regardless of how we want to think about hardware or software. The problem is that if you don't have the right kind of hardware, you need to simulate large portions of that structure, which can be computationally inefficient, to say the least. Given enough compute power, you should be able to pull it off regardless, though we may need quantum computers to get all the way there.

      But I don’t think replicating the way the human brain works, even to scale it up, is the way to a properly useful tool for humans. Understanding how human cognition works will likely help us in building better information processing systems, but we shouldn’t be trying to recreate it. We want something that does a better job at narrower tasks and that’ll mean tailoring those systems to specific use cases.

105.

      Tom Levenson

      March 1, 2026 at 11:17 pm

      @Chetan R Murthy: I’ve been following DeLong pretty closely on this; he’s been very measured on this stuff and is smart as hell.

106.

      Tom Levenson

      March 1, 2026 at 11:19 pm

@Quantum man: I have not. I currently have access to the current versions of the models from the big players (absent Grok, thankfully). So: OpenAI, Anthropic, Google, and Meta.

107.

      Martin

      March 2, 2026 at 12:33 am

      @Chetan R Murthy: Human drivers don’t need to get those signals. The signals simply need to match what the human drivers are getting. This is already happening in one regard – autonomous cars get their speed limit information from GPS, not from reading the signs. But in terms of signaling and the like – time dependent things – it doesn’t happen in any capacity that I’m aware of.

This doesn't need to be delegated to the automakers (as if the streets and roads weren't already their private domain – I say this as someone who rarely drives and is pleased at the end of each day to have survived another as a pedestrian/cyclist) – it can be standardized by the various parties: AASHTO, Caltrans, US DOT, etc. And there's been a little work on this out of NHTSA, for instance, but it's policy focused, not implementation focused. Mostly, government has just said 'fuck it'.

108.

      m0nty

      March 2, 2026 at 2:37 am

      LLMs just love bullet points, don’t they.

109.

      Matthew

      March 2, 2026 at 4:52 am

As a programmer, I'm having a wonderful time with Claude. I also find Claude comes to do work. The last time, I asked Claude and ChatGPT the same complex question. Claude just tried to answer, even producing documents. ChatGPT took me on a long, winding dialog about my motives, with lots of multiple-choice questions. I realized it was trying to keep me engaged rather than just answering the question and maybe making other suggestions. It wants to be your trusted advisor, whether you should trust it or not.
