Balloon Juice

Come for the politics, stay for the snark.



Guest Posts


BretH, Taking a Leap of Faith in the Early 1980s

by WaterGirl | March 21, 2026, 2:00 pm | 43 Comments

This post is in: Adventures of BretH, Authors In Our Midst, Guest Posts

I’m not quite sure whether this is an Authors in Our Midst post or a guest post from BretH, who was inspired to write it after his earlier story for us a few weeks ago about his time in New Orleans.

Either way, it’s a long story, so we’re breaking it up into 4 pieces, like a serial publication in the olden days.

From the first photo, I could swear that I knew BretH back in the day – so I have to ask, did we all know someone who looks just like BretH?

Becoming a Motorcycle Messenger in the Early 1980s

by BretH

There are times when events, circumstances and technology come together to create opportunities that peak, then fade, and afterwards could never again be possible. This is a story set in one of those times, in the heyday of messenger services in Washington DC, long before cell phones, before e-mail, and even before the fax machine. For a few magic decades the growth of business and government in the nation’s capital coincided with improvements in radios and motorcycles to create a unique workplace for thousands of delivery riders, young and old. I was fortunate enough to have been a part of it.

It was 1981, and having recently returned from a time in New Orleans (link to previous post), I was living in Takoma Park, just outside of Washington, DC, in the basement of the house I grew up in, and nearing the end of my second stint in college. I had left the South because life in New Orleans, while absolutely memorable, was quite chaotic, and I had soured on the city, its racism and antebellum past, and the dirt and heat, palmetto bugs and the dank smell that seemed to sit over the entire area for days on end.

I had decided to give college another go, this time at what I considered a soulless monstrosity: the University of Maryland. Unfortunately, once there I faced the same problem that had made me leave Antioch—I couldn’t really settle on what I wanted to do with my life and was feeling the pressure of trying to figure that out in college. My best times were actually the five-mile bicycle rides to and from school and hanging out in the quad with friends; the classes I largely forgot.

After two semesters there, where I doggedly pursued an English degree with rapidly diminishing expectations it would ever amount to anything, I was ready for another break and started checking the help wanted ads in the Washington Post (a real paper in those days). I couldn’t help noticing the several columns of ads for messengers—car, motorcycle and bicycle. The pay seemed great (later I would realize that those were the absolute top amounts that could possibly be earned by a great messenger who had a lucky week) and the ads offered flexible hours and no experience needed. I had been seriously riding and fixing bicycles since I was about 14, so that seemed like a job I could do well at—but for reasons that I can’t explain today outside of a sense of adventure and self-confidence gained by living in New Orleans, I decided I would become not a bicycle but a motorcycle messenger.

There were really only two things in the way of realizing this goal: I did not actually have a motorcycle, and aside from riding mini-bikes at friends’ houses I had no experience riding one. But I had learned to drive on stick-shift cars and, as I said, was a very experienced bicycle rider, so I resolved not to let that get in my way.


Back in the classified ads I saw many motorcycles for sale but really only one I could afford at that point. So in a borrowed pickup truck I drove out to a rural area and found waiting for me a cute little motorbike with a sissy bar and raised handlebars and a large steel crash bar up in front of the engine to protect it (and me) in case the bike went down. It was rather like someone grafted chopper parts onto a petite Honda. Riding that little thing out the long dirt driveway to the road and back was not as hard as I had feared, the motorcycle seemed to run OK and did not appear to be missing any parts, plus the crash bar seemed like a really good thing, so we exchanged cash and I drove away with it in the back. A couple weeks later I had my license, a helmet, and had gained some basic experience riding it and was ready for the next step.

Interestingly enough, my best friend also wanted to become a motorcycle messenger, as I think he was graduating from the same University, having not taken a break like I had. Now Pete was in my eyes an experienced rider and actually had a Triumph Bonneville 750, a rumbling beast of a bike that left little oil droplets wherever it was parked. Together, we decided to try for one company called “Speed Service” that explicitly wanted motorcycle riders, because they offered a salaried position instead of paying by commissions from the jobs and had Workman’s Comp, which we figured might be very necessary at some point. We called and were told to show up the next Monday morning at 8am for a tryout.

Full of confidence and trepidation we headed into town that Monday and encountered what could have been a good or bad omen. Waiting at the front of traffic at one stoplight we were startled to see a small motorcyclist weave in between the lanes to join us at the front. We were doubly surprised when we found out it was a woman and she was a messenger herself. We declared we were just starting that day, and found out her husband was also a Speed Service rider. She wished us luck, which I realize now was more like “good luck, you’re really going to need it” as she had eyed what we were riding: Pete on his Triumph and me on my weird Easy Rider Honda.

We showed up at the appointed time to the little building on 14th Street just above Logan Circle (then notable for the ladies of the night that prowled the area) and were given a short overview of the job and what was expected of us during our training period. Which in a nutshell was: follow your trainer, listen to them like they were the Voice of God, do everything they do in traffic and try your best to keep up, because they are not going to wait for you. So we set off, Pete with one trainer and me with Nick, one of their senior riders, who had a most unusual motorcycle (at least to my novice eyes): a 500cc single-cylinder Yamaha, slim as an arrow and outrageously fast off the line—really a perfect messenger motorcycle, as it turned out.

I believe we had gotten no more than ten blocks away from base when Nick put on a burst of speed and I found myself behind a slow car, terrified I would lose him. So in desperation I passed the car on the left, going into the empty oncoming lane to do so—when the car abruptly turned left into me, heading for a parking lot. I bumped the side of the car, went down and skidded to the curb, shaken but somehow unhurt. The crash bar had indeed done its job, and although it was bent, the bike itself was also mostly undamaged except for a rear turn signal that was now hanging by its wires. The driver, who had not signaled his turn and was late to work, saw that his car was basically unscratched, accepted my apology and left. Nick saw what had happened and, after briefly checking to make sure I was in one piece, basically said “you’re on your own now, kid” and continued on his way.

Utterly distraught and feeling like I had blown my one chance, I had no idea what to do except to limp back to the office in shame. I pulled into the back room, which doubled as a work area, where I was met by the owner of Speed Service, Big John himself, and Bruce, an ex-rider, dispatcher and general handyman, neither of whom was in the least surprised to see me. I confessed what had happened, expecting to be fired on the spot, and they asked me the fateful question: would I be willing to give it another try? And there began my real education in how to be a motorcycle messenger.

“First off, we gotta do something about your bike.” We got my motorcycle up on the work platform, where the two of them looked it over and tut-tutted and reached for the socket wrenches. Off came the sissy bar. Off came the raised handlebars, and Bruce rummaged about in bins until he found a replacement pair, barely raised and cut down a little to be narrower than stock. New bars on, they turned their attention to the crash bar, and it was unceremoniously removed. I was told that they had seen me initially pull up and knew the bar would be a problem: to do the job the bike had to be as narrow as possible, because it was absolutely essential to “split lanes” as we had seen the female rider do earlier. If I went down from now on I might expect to scrape the tank or the sides of the engine, but apparently that was not considered anything I should be concerned about.

A little about my particular motorcycle. It turns out that I had unwittingly gotten a pretty good bike, well suited for the job. It was a Honda 400cc four-cylinder Super Sport, which these days is quite the collector’s item, with die-hard fans who truly appreciate this early little sport bike. The engine buzzed like a sewing machine, happiest running at high revs, and the six-speed gearbox, once you knew how to work it, made the bike a snappy little thing. The exhaust was unique, a glorious four-into-one that was really a work of art. The Super Sport was pretty much bulletproof as well, the only weak spot being a tendency to eventually leak oil from the cylinder head gasket, as the air-cooling was lacking for the middle two cylinders.

My newly stripped-down motorcycle made ready, I was told to come back the next day and Nick would take me out again. That I did, and whether he had been told to be more mindful of me, or whether the initial trial by fire and my reluctance to quit signaled a seriousness that he respected, Nick, apart from being a top-flight courier, proved to be a patient and detail-oriented teacher. And I hadn’t realized just how much I would need to learn.

 


Guest Post – Tony Jay: Note From Brexitania: Early Winter of 2026 Edition

by WaterGirl | February 4, 2026, 2:30 pm | 152 Comments

This post is in: Foreign Affairs, Guest Posts, Open Threads, United Kingdom

It’s our lucky day – Tony Jay has a little something to say about what’s going on in the UK.

Looking for a short pictorial summary?


A LETTER FROM BREXITANIA

Our State Is One of Complicity

by Tony Jay

 Once upon a time, in what feels like the long, long ago, I had a job that wasn’t so much a ‘job’ as a place to be with a laptop and a chair. For various reasons that would bore the tits off a gravid sow I was paid just to grace the location with my presence five days a week, which explains the series of very long, very self-indulgent rants I was able to produce for your good selves detailing my personal perspective on the accelerating awfulness of Brexit Britain and the political scum-crust responsible for imposing this thunderclap of fuckery upon us. Those days are gone now, thankfully, and I have a different job that keeps me busier than a fluffer at a Bonnie Blue book signing, so finding the time to spew satisfying amounts of profanity about the UK’s condition is harder than landing a knob joke at a Bobbitt family reunion. But it’s my day off, it’s a whole new year, and the world around us is fresh and supple with endless possibilities, so before The Stench of Pennsylvania Avenue fucks things up for everyone (too late!) I’m going to see how much whine I can decant in the space of two hours. Attendance isn’t compulsory, but like tipping, it’s expected of the better class of person.

Let’s start easy, with a little light racism!

 Silken Folds Of Purest White

Slashed With Red, Cruel Delight

Flags Half Masted, Ugly Sight

Such Brazen Challenge, From The Right  

The Frog says the water is just fine

For pretty much my entire life, the British Right’s war on all the good parts of the last three centuries has been a background presence. Not always that obvious, not always that effective in advancing its aims, but always there, like a stalker’s backlit silhouette, or the smell of lubricant in a creepy teacher’s car. It’s been a sibilant whisper in the nation’s ear, offering Othering and finger-pointing faux outrage doused liberally (but not, of course, Liberally) with the cheap cologne and swaggering ignorance of Laddish ‘common-sense’, delivered daily to café breakfast tables and free-paper bus commutes all across the land. The dial has been noticeably turned up since the self-inflicted disaster of Brexit, as ’70s-style open racism has been escorted back into mainstream discourse to fill the urgent need for a tried and tested method of redirecting the growing panic of a General Public increasingly unsure of its footing amidst the rising tides of Corporate Supremacism away from the sensation of smooth white hands rooting around in their pockets, and towards some of the more traditional and acceptable targets for Rightist fulmination.


Mostly, it’s taken the form of a relentless media campaign across every cultural platform, pouring steaming hot race-baiting, sneering misogyny and non-Cis fearmongering across the top of our national dish, a jumbled up hot-pot of familiar tinned staples and fresh new flavour combinations drawn from around the world, in which all the roughly-peeled vegetables sunk below our increasingly affluent but rapidly shrinking Upper Class have been finely julienne-sliced into siloed slivers a la crabs l‘bucket, and the meatier ingredients are a mix of chunky, slow-cooked ‘Working Families’ thickened up with a minced-up ragu of ‘Workshy Benefit Scroungers’. Very occasionally, when the temperature hit boiling, there’s been the odd opportunistic foray into open violence, but mostly, by design, it’s been an insidious drip-drip-drip of poison into the cultural jus, a toxic aftertaste most don’t even register as out of place because it’s what they were brought up on. It’s ebbed and flowed in intensity and focus with the beat of societal flux, sometimes a full-on, foam-flecked howling for the blood of aliens and deviants, but most of the time couching its raw-milk racism in the seaside postcard populism of Page 3’s Jessie, 23, from Devon, who wants us to know that she’d be ever so excited if all you patriots joined her and her enormous boobs in supporting England’s (and it’s always England’s) Boys in whatever footy tournament they’re about to get knocked out of by the Germans Portuguese bloody Italians.

Dangerous and ridiculous at the same time, but yeah, it’s always been there.

Nowadays, following the stunning success of its longshot Brexit offensive (tagline – Europeans took our country!) which exploited long-standing Tory divisions and the desolate end product of four decades of Thatcherism to drag us thrashing and squealing out of the EU – breaking, in the process, all of our economies, including the political one – the British Right’s campaign has pivoted to blaming all of this small, isolated country’s rapidly accumulating post-Brexit ills, real and imagined, on the made-up threat of Small Boats and Uncontrolled Illegal Immigration (tagline – brown people took our country!). They’ve managed to successfully repurpose whole swathes of social-media from forums for make-up tips, amateur porn and football banter into an unregulated firehose of racist misinformation, plus make-up tips, amateur porn and football banter, with the algorithms weighted towards convincing swathes of the left-behind and under-educated mass of white England that oh, no, no, no, despite the unarguable gravity of the situation they find themselves in, it really would be best for everyone if they didn’t choose to nut-up and take a stand against the rapacious greed of a socio-economic paradigm that has reduced their communities to immiserated hellholes of grinding poverty and cultural deprivation in order to pad the already swollen bank accounts of jet-setting billionaires, but rather looked… Over There! Towards the endless stream of AI generated clips of browns doing bad things and accompanying infinity of AI generated slop about Political Correctness Gone Mad.

Do that for long enough (call it the Winston Smith Treatment) and it becomes obvious that the problems most people face in their daily lives; problems like all of the local shops on their High Streets being boarded up, like it being increasingly hard for them to find work that pays a decent wage, like everything on the shelves being both low quality and costing three times more than they did a few years ago, and like the basic social amenities that their parents and grandparents could rely on to make lower-class lives in Britain’s towns and cities liveable all being stripped away by wave after wave of local government cost-cutting are all, in fact, nothing at all to do with rich people hoarding all of the money, but are solely the fault of Lefty do-gooders and their racist plot to impose Venezuelan style Sharia Law on Heritage Britons by flooding their communities with brown skinned, BMW-driving rape-gangs funded with cash stolen from British taxes by a conspiracy of Woke Charities and The BBC!

Obvs. Yeah? Totally. Totally obvs. Equally obvs is that you’d have to be a titanic fucking moron with a brain the consistency of cheap Brie to believe any of it. Or, maybe, you’d just have to be one of the huge majority of people who “Don’t really follow politics” outside of knowing “They’re all as bad as each other”. The kind of people who get all of their geopolitical ‘facts’ second and third hand from news bulletins playing in the background while they browse Tik Tok, and from all the outraged comments they devour on their ‘neighbourhood’ Facebook groups, the ones run out of Moldovia by a guy who – almost certainly – keeps on losing child custody hearings because of his three convictions for assault and malicious harassment, and because he refers to himself in court as ‘Grimhelm, Trueblood Chieftan of the New Rohirrim’, topless, while sporting questionable tattoos he says he got in Croatia while drunk.

Yes, they know things are fucked up and shit, and they know that someone is to blame, and they’re even dimly aware that they can’t draw any drafts of icy cold truth from the shallow waters of Tabloidland, but the Internet? The Internet wouldn’t lie to them, would it? The people telling them how everything is the fault of Migrants and that only Mass Deportations will Save Our Kids from paedophiles might be bitcoin millionaires with a sideline in smuggling luxury cars out of Bulgaria, but they look and talk just like them, not like those Oxbridge educated wankers on TV with their £1000 haircuts and shiny suits. So, all of the directionless anger and barely contained fear that things for people like them are only going to get worse, quite reasonable anger and justifiable fear that could, if properly addressed by progressive elements with a plan for raising everyone’s boats, seriously threaten the entrenched privilege of the status quo, get harnessed instead into the service of people far, far richer and way more out of touch than even the slimiest politician.

Or, yeah, to be less charitable, it could be that the 30% to 40% of the country that keeps on voting for the worst option available are just a bunch of cowardly fucktards who know full well that their world is imploding, want someone to blame, but because taking a swing at the rich and well-connected types busy partying in safety behind their security gates is a genuinely scary prospect, they prefer to down a few lagers, snort something chemical, and gather in mobs to scream threats into the faces of brown mothers just trying to get their crying kids to school. After all, not judging yourself and just saying the worst thing that comes to mind is what being 21st century British is all about, innit?

Anyway, who the fuck knows. I’m not a sociologist, I can barely spell it. What I do know is that in 2024 this throbbing geyser of vitamin-free bile gave us a Summer of Violence; one that might have started with hordes of tattooed thugs in Union Jack masks jiggling their pasty white moobs to the tune of Rule Britannia outside of the many run-down former hotels where most asylum seekers have been forcibly ‘concentrated’ by successive Tory and Hard Labour Governments – at great expense to us and great profit to the wealthy donors who own these properties, natch – but which culminated in half-arsed but still terrifying Race Riots that left the poorest parts of some cities – mine included – trashed, and many of our non-white communities traumatised. It was such an ugly ruckus that it roused even the somnolent Sir Keir Starmer from his comfortable spot curled up and purring at the foot of the Ministry of Mediocrity’s towering lobby statue, the one depicting a herculean Tony Blair holding up the severed heads of Socialism, Satire and Meritocracy while declaiming Pecunia Causa Id Feci to an assembly of overweight felines. With his AI-generated speech in hand, our Prime Minion shuffled uncomfortably in front of the nearest camera, set his facial slopes to Mildly-Furious, and droned on in the dullest way possible about how his Government wouldn’t stand for this kind of thing, and would inflict gloomy vengeance and faded anger upon any personage of low account found to have disturbed the stale and greywashed peace of Austerityland 2.0.

Middle-England was pretty repulsed by the footage too, and the opportunity was certainly there for any Government willing to take the fight to the people directly responsible for mainstreaming this hate and use that as a place to begin unifying the country around the simple principle of Not Being Dickheads, but we don’t have one of those Governments, we have Starmer and his clutch of perpetually u-turning wannabes. All the inciters (for which read Nigel Farage, Stephen Yaxley-Lennon, the vast majority of our Media organs and Elon Musk) got away with it. A few morons caught up in the thrill of smashing up their own streets to preserve ‘England For The English’ getting banged up for six months was an infinitesimally small price to pay for advancing the Overton Window a few miles to the right. Those droolers could always be repackaged later on as political prisoners and martyrs to the cause if it was deemed necessary, but the important thing was that notice had been served to anyone watching of how much anger there was out there, and how easily and quickly the Far Right could muster up a Moron Militia in anyone’s backyard.

But still, while that kind of in-your-face mob violence might be thrilling in small doses, and the intimidation factor is not to be sneezed at, too much of it gets in the way of the Far Right’s preferred – and currently very successful – tactic of pushing at the already half-open door of newnewlabourinc’s Blue Labour influenced policies on Immigration and other Kultur War issues. The aim is always to chivvy Sir Keir’s rootless centre-Right regime ever Eastward Ho in pursuit of that mythologised voter who is ‘culturally conservative but not actually comfortable with open racism’ his Blue Labour advisors so very, very much want to appeal to. The more they can push Starmer and Co into reacting to Far Right outrages with old-school triangulation and focus-grouped statements about “recognising people’s genuine concerns”, the more the Far Right benefits from the clear implication that newnewlabourinc agrees with the Right on the importance of the issues, but just aren’t up to the job of dealing with them quickly or adequately. The last thing Nigel Farage and his clutch of froggy fascists want is to put newnewlabourinc’s reactionary careerists into a position where social disorder on the streets paints them into a corner where they have to crack down or else risk revolt on their own back benches.

With this in mind, what appears to have happened as 2024 bled into 2025 was that the cash-rich spinfluencers of the modern Far Right undertook a retooling of their hateful messaging and decided they’d be far better off cooling it with promoting anti-foreigner pogroms and launching, instead, a new summer campaign, one that co-opted the look, ethos and tribal yobbishness of 2000s-era football hooliganism – a period looked back on fondly by some of today’s aging Gen-Xers – and rebranded it as an authentic grassroots expression of jovial national pride and Bulldog Bringlish unity in the face of, a) THE MIGRATION CRISIS™, but also b) whatever social or personal problems the people they’d successfully radicalised online might be hung up on that could be triggered to get them up on their feet and screaming obscenities at passing non-whites.

First, they quietly organised a campaign of urban flag-tagging. It makes sense as a tactic. Why risk triggering Police interest by orchestrating gatherings of ill-disciplined morons with criminal records (something like 60% of those arrested outside of asylum hostels chanting about ‘protecting our women and kids’ had form for domestic assault at the last count, 60 fecking percent) when instead, you can just pop an order for eight thousand St George flags and Union Jacks into DHGate and send a selection of the same morons trundling around Britain’s boulevards laying claim to everyone’s eyeline? A few cherry-pickers operating in the middle of the night handled the majority of the work, strapping Union flags and St George crosses to lampposts, leaving whole High Streets looking like there was a Royal Visit in the offing, while for propaganda purposes, and to give local press outlets a story they could run with, the smaller-scale displays of guerilla gobshittery in harder to reach areas were franchised out to local oddballs with time on their hands and nothing between their ears.

Picture the scene: Wayne (late 20s, cobweb tattoo across half his neck, owns five identical tracksuits, has four kids by three different mothers, so pale and bony he could find work as an anatomical skeleton) pushing a wheelbarrow piled with red, white and blue tat, from which others fan out with tins of red and white paint to transform convenient roundabouts into Bigot Bullseyes, while Twitchy Mark (late 50s, occasional shelf-stacker, spent some time in prison in the 90s, knows an awful lot about the divisional heraldry of the Waffen-SS) shambles along with a rickety ladder over his shoulder (but not on camera, because while he might be absolutely furious about Those People leeching on Our Taxes he’s still trying to claim disability benefits for a bad back), stops under each flickering streetlamp and braces the ladder so that Wayne can shimmy up and down it, draping their run-down estate with Bigotry Bunting. There are an awful lot of lampposts in England, every one of which has a midpoint just begging to be home for a primary-coloured warning that ‘we’ don’t like ‘your kind’ around here. And there are a lot of Twitchy Marks and Waynes too. Educating yourself about how the world really works and not beating up your girlfriend is hard work, but flag-lynching? That’s proper easy.

So that happened, and while people were still nervously deciding whether it was safe to pull down the bigot banners without risking a punch to the face (in many areas, it wasn’t, but the Media wasn’t interested in those insignificant details) and a parade of newnewlabourinc spokescreeps, including the Supine Minister himself, were stutteringly affirming how absolutely splendid this outbreak of not-at-all racist patriotism was (“I love the flag! I’ve got ten at home! I have one over the bed so that I can pleasure my wife patriotically! My OnlyFans sign in account is Hot4theflag2025! Please, love me!”) we got to ‘enjoy’ the Far-Right’s crowning moment of not-awesome, a genuine Fascist march in central London. About a hundred thousand gobshites clogging the streets, bald heads gleaming and beer-bellies bulging, spitting at the Police and pissing all over the sacred monuments to British History their social media profiles would have you believe they feel must be protected from Antifa vandals and the opinions of foreign-loving intellectuals.

Truly it was a sight to see. Like Jan 6th being run out of the back room of a Folkestone BnB by the Mirror Universe cast of Dad’s Army. Tacky and naff and absolutely oozing with fist-clenched rage. The hands down ‘funniest’ bit was when Temu-Hitler wannabe Stephen Yaxley-Lennon (Luton-born, Spain-based, Irish passport holding, US/Russian funded chickenshit neo-Nazi) who the Media resolutely insist on calling by his ‘one of the lads’ stage-name of ‘Tommy Robinson’ (because his own name is clearly a bit too middle-class public schoolboy for the Jason Stathamesque image he wants to project) stood on a stage in central London alongside the grotesque live-streamed head of Elon Musk (after said head had just called for civil insurrection and the violent removal of the British Government – oh, that Elon, what is he like? LOL) and joined the mob in chanting “We are all Charlie Kirk”, which was lovely to hear.

  • Popular only amongst a tiny minority of the country.
  • Inflated by Media hagiography beyond any possibility of long-term influence.
  • Doomed to be shot to death by a fellow-traveller over doctrinal differences.

Make it so, fash-trash. I’m so down with that I’m 2D and horizontal. It’s just a pity the whole sorry slurry of them couldn’t have lived that dreamiest of dreams right there and then. Jonestowning their own filth out of the national gene-pool live on camera and leaving the future considerably brighter for those of us left to mourn the fact they couldn’t have done it sooner.

Sadly, it wasn’t to be, and London’s taxpayers were left to pick up the tab for hosing down and disinfecting their streets in the aftermath. A portion of that bill could probably have been made up if the Metropolitan Police had arrested some of the scumbags while they ran riot and issued on-the-spot fines, but that’s not how The Met rolls. Its armoured-up officers are always too busy playing billy big-bollocks while arresting hundreds of peaceful anti-genocide protestors to risk their delicate foo-foos confronting actual anti-social thugs, especially when said thugs have milky white skin and share The Met’s animus towards melanin-hoarders. The only black faces I saw in the melee were either trying to get away from the mob ASAP or else were capering tokens in their Union Jack suits, pushed to the front for the Media’s attention before being bussed up to Jockland to welcome The Pustule outside Balmoral. Funnily enough I don’t remember a single journalist pointing out this very clear message about the kind of people who actually wanted Donald fucking Trump to come to the UK. Maybe I just missed it.

Anyway, the saddest thing about all this was the Government’s response. Der Starmerpartei couldn’t decry the mass flag-shagging, not after this New Changed Labour Party™ had adopted Flobalob’s ‘twin flags framing a pile of shit at a podium’ stylings wholesale to dictate that every Government Minister and spokescreep now has to wedge themselves between at least one pair of enormous Union Jacks whenever they’re saying something ugly and indefensible on TV, leaving them looking like they’ve sprouted oversized patriotic wings, or they’re cosplaying the episode of Mr Benn where he visited the costume shop and picked the ‘Japanese Bannerman Let Loose On ‘60s Carnaby St’ outfit. They certainly couldn’t push back after their response to the flag-tagging craze of a week before had been another predictably sweet surrender to Rightist tokenism. Hard Labour is the face this Government shows to Lefties and minorities and anyone else who doesn’t think ‘cutting handouts to spongers and slashing red tape for business’ is the sexiest new idea since the starched ruff. When it’s Far-Right thugs spreading fear and hatred with Freikorpsitarian abandon they can’t roll on their backs quickly enough, tongues lolling out and big Disney eyes just begging the morons for that electoral scratch on the tummy they’re never going to receive.

Once again, the Britain that actually voted for Starmer’s Opposition as the least-worst option at the last Election waited for Starmer’s Government to do something, anything, that vaguely looked like meeting the moment, and once again, they were left as disappointed and discouraged as Katie Holmes on her wedding night.

At the end of the day, Starmer’s Government couldn’t respond properly to the aggression of the Right, because Starmer’s Government didn’t want to. When normal people looked at that shitshow in London, they saw a disgusting mob of violent idiots equating British and English with Whites Only thuggery, but when people like Morgan ‘Twat’ McSweeney’s backroom coupists and Baron Maurice Glasman’s Blue Labour big-thinkers, the ones who give the careerist Labour Right their intellectual heft (LOL, thewhatnow?), watch the very same footage, all they can see is an energised and easily gulled electorate they’re desperate to appeal to because, whatever else they are, they’re not Leftwing. So, instead of a call to arms and a stirring refutation of neo-Mosleyism, we got the sorry sight of Starmer and Co nervously applauding the racists for, as they put it, ‘proving that this Government has made the UK a bastion of free speech’, even while the orchestrated chants of ‘Keir Starmer is a Wanker!’ trended on everyone’s stream.

Sigh. The one thing everyone can agree on. Keir Starmer really is a wanker.

Speaking of Der Starmerpartei (sadly, I must, they’re the frigging Government) it will surprise no one to hear that they’ve been failing harder than ever since I last vented. You must all have better ways to spend the next few weeks, so why don’t you go do that and come back when normal life just isn’t glazing your cherry? I’ll still be here. 

Incidents & Accidents, Hints & Allegations

When Sir Keir Starmer first elevated Peter Mandelson to the post of UK Ambassador to the United States of America, reactions were, to put it mildly, pretty varied. To some of our more cynical Press lifeforms and to newnewlabourinc insiders (many of whom had come through the ranks of ‘Progress’, Mandelson’s REMF training school in the dark arts of factional fuckery and political wetwork, and as a result considered him something of a mentor and a morality free exemplar of the canny operator) it was a stroke of genius, a cunning throw of the dice that might, if the ever-oleaginous Mandy could charm and flatter the Oval Ogre into sparing Great Britain from the worst of his tariff tantrums, pay off big time. More or less everyone else in the country who ventured an opinion called it an astonishingly in your face example of the Labour Right’s hard-wired obsession with keeping all of the spoils of victory in-house, and of their crippling weakness for displays of oh-so clever-clever One Cool Trickism, the kind of smart arsed sleight-of-hand that might sound tremendously savvy when cooked up around an Islington dinner-table over a couple of bottles of wine and a line or two of coke, but deflates like a first-time flasher’s penis on exposure to the real world.

Funny thing though, for both schools of thought, their reasoning was exactly the same.

You see, Mandelson, the Blairite backstabber long ago nicknamed ‘The Prince of Darkness’ by people who actually sort of liked him (what does that tell you?) had, at this point, already made himself a couple of very successful careers out of wrapping his talents around the genitals of VIPs and whispering flattering bon mots of advice into their ears that, somehow or other, always ended up benefitting Peter Mandelson and various friends/clients of Peter Mandelson. Serially disgraced and forced to resign as a Labour Minister not once, but twice, he’d naturally moved into the world of lobbying and put his skills to work helping very rich people understand, circumvent, and exploit the global regulatory landscape by, basically, telling them who to bribe, with what, and for only a cursory extra fee syncing up everyone’s diaries to make the deal happen just when the best caviar was in season at the nearest Michelin starred restaurant. He was also, like Trump, a part of the whole Epstein circuit. He and the Prince of Perv Island were proper good chums, close enough for Mandelson to arrange a meeting between Epstein and then Prime Minister Blair in 2002 to discuss, well, that would be a secret, wouldn’t it, dear hearts, and also for Mandelson to lodge at one of Epstein’s penthouses in 2009 while its owner was in jail for (checks notes) soliciting underage prostitution. ‘Petie’ and his ‘best pal’ appear to have been very, very close friends. Soulmates, if you will.

Now, bear in mind, the decision to make Mandelson US Ambassador was taken after Epstein’s death, after the former Prince Andrew’s comfortable sinecure as Royal-For-Sale had crashed and burnt on impact with the truth about his own Epstein connections, and long, long after the Epstein name had become a pop-culture byword for sleazy paedophilic influence peddling. It was made when the people in the room discussing the appointment knew full well that he and Epstein had been close, knew that Mandelson had been a frequent guest at Epstein properties, and knew that he’d sent Epstein supportive messages during his trial and after his conviction. To think that making a person like that Ambassador to the US was a good idea, you’d either have to think that the Epstein affair was done and dusted and would never be allowed to rear its head again (sort of possible in 2025, I’ll allow) or you’d have to think that the dirty secrets beneath the Mandelson/Epstein/Trump connection would actually prove an advantage. A sort of sotto voce acknowledgement that the UK Government wasn’t interested in what His Presidential Majesty may or may not have got up to while palling around with paedophile rapists, because, hey, just look who they’d sent to represent them at his Court. Nudge Nudge, Wink Wink, say no more.

Either way, this decision paid off just like a huge proportion of the other decisions made under the aegis of Starmer’s feral Chief of Staff, Morgan ‘Twat’ McSweeney, by which I mean in a colossal and very public fuck-up that smeared pickled chicken ovaries all over his Boss’ immobile face and, to the delight of many and the despair of very few, left Peter Mandelson once again disgraced and forced to resign. Turns out, McSweeney’s vetting process when pushing Mandelson for the job had boiled down to ‘Peter wants, Peter gets’ and he’d lobbied Starmer to push through the appointment before the official vetting had even finished, while the career civil servants whose job it was to warn Ministers when they’re making easily avoidable faceplants had still been expressing their strong reservations about putting someone like Peter Mandelson into a post that left the Government exposed if there were any further revelations. Revelations just like the kind Bloomberg would leak into public view later in 2025, including incriminating details about the depth and length and intensity of Mandelson’s continued relationship with Epstein that he’d either lied about, Starmer and McSweeney hadn’t asked about, or they’d asked about but didn’t care enough about to let it influence their pre-made decision.

Once it blew up, Mandelson, being about three DNA chains removed from a Facehugger, tried to style it all out, responding to Press questions about Epstein with his usual shrewish venom and soliciting a “full confidence” statement from Starmer in Parliament, but it was all for nothing. Petie was obviously done and resigned the next day. Clean-up on Aisle 3, there’s Mandelscandal all over the floor again and we need a crew with stout brushes and buckets of soapy vinegar down there stat to sing yet another chorus of “We had no idea there were any skeletons hiding in the cupboards of Castle Everything-Here-Is-Made-Exclusively-Of-Bones”.

It got worse, of course. It always does where Peter fucking Mandelson is concerned. After a few months keeping his oily head down and waiting for the dust to settle, ‘Mandy’ obviously thought he could remind Trump and Co of what a charming and very useful fellow he was by informing the British Media that Europe’s response to The Orange One’s interest in annexing Greenland was ‘histrionic’. Quite why the British media thought it necessary to share Peter Mandelson’s foreign policy opinions with us is a mystery (it’s not – Mandelson is an important figure on the Labour Right with a presence in all the rooms where Der Starmerpartei makes its decisions, when he speaks, you hear the opinions of the Donor Class) but as they were already interviewing him as part of his reputational reset tour (basically he’s so sorry, Jeffrey who? Let’s put the past behind us, eh?) the savvy understanding was that Mandy would obviously wrap the rotten egg of his return to influence in sweet, chocolaty contrarianism, all the better to signal to the upper echelons of the British Government that he was a Trump-friendly asset they could not afford the bother of freezing out.

It probably would have worked, too, had the latest tranche of mostly redacted Epstein material not included e-mails and photographs that not only underlined, with big, thick sharpie strokes, how close Mandelson had been to Epstein for years, but also provided evidence that he had been a mole for Epstein (and the interests that used Epstein as a source of profitable information) within the UK Government for decades, leaking highly confidential secrets about the UK Government’s economic plans and advising foreign financial interests on how to threaten and coerce the UK Government while he was a UK Government Minister!

Yeah, this time, the stake has been truly hammered home. There’s no way even Mandy can slither back from this disgrace. But for me (and I’m not alone in this) the real scandal is that Peter fucking Mandelson was ever in a position to be this scummy. There’s a line in one of the e-mails from 2012 that absolutely sums up who Mandelson – and by extension the entire Labour Right/Labour Together cabal – are at a very basic level. Mandelson (after his second humiliating sacking from a Labour Cabinet) was visiting an economic forum in China and described his presence there to Epstein as “trying to make an honest living”, to which Epstein replied “trying something new?” and Mandelson came back with “Indeed”. That’s the Mandelson/McSweeney brand of ‘Changed UK Labour’ to a fucking tee. Corrupt, smarmy, self-satisfied panderers to corporate profit and authoritarian power who lie for a living and have nothing but contempt for anyone who can’t put money in their pocket or skate away from responsibility by abusing their influence. Mandelson, through his mentorship of McSweeney and behind the scenes networking for Labour Together, created the modern Starmerite iteration of Labour. His tireless fuckery helped give us the narrow Tory win of 2017, Flobalob’s ascension to power, Brexit, the mishandling of Covid, Liz Truss’ near bankrupting of the country and the soul-destroying betrayal of 2024’s promise of Change that could well lead to a Maga-like fascist Government coming to power here in the UK.

All that, and he was rewarded for it. Honoured. Elevated. Rendered infuriatingly untouchable and smirkingly victorious until cast-iron, irrefutable evidence of the decades-long friendship the leadership of newnewlabourinc knew he had with a world-renowned PAEDOPHILE RAPIST AND SLAVER was revealed for all to see. That’s the kind of person Peter Mandelson always was. That’s the kind of person Changed UK Labour looks up to and takes its cues from. The only crime they give a fuck about is being Leftwing, and their entire modus operandi remains punishing that crime to the exclusion of all else.

Hissing noise.

All par for the course where this Government is concerned, I’m afraid. Hubris, arrogance, and a dash of delusional certainty that despite all of the intense and relentless hostility shown them by the UK Press since they assumed office, surely this time their bedrock commitment to neutering progressive politics for the benefit of the status-quo would finally pay off in friendly coverage. It’s a delusion their comms team (such as it is, people keep on resigning or getting sacked) clings to because they pretty much have to. The founding myth of the Labour Right’s claim to absolute power is that they are the grown-ups in a Party of idealistic fantasists, the only ones who understand how things really work and have the gravitas and credibility required to handle the Business and Media worlds on an equal footing of mutual respect. For a lot of them, acknowledging that they were never going to be allowed to sit at the Kewl Kidz table and that their mission to drag Labour to the Right only ever made them useful idiots in the eyes of the British ruling elites would break their minds. They have to keep on hoping, otherwise they’d have to admit to themselves that their Grand Plan to replace the Conservatives as the UK’s natural centre-right Party of Government was always just a suicide note written in crayon.

Speaking of which…

But Angie, Angie. Ain’t it time we said goodbye?

Angela Rayner, Deputy Leader of the Labour Party, Deputy Prime Minister and Housing Secretary, was forced to step down from all of her roles last September following a long-running media-led scandal centring on her not being entirely 100% accurate when explaining what advice she’d received when she paid stamp duty on a second home. Strictly speaking she probably ‘did it’, and it’s exactly the kind of thing Cabinet members would have been expected to stand down for 10 years ago, but in these post-Flobalobian times it wouldn’t have registered as an unchewed radish-burp on the Corruptionometer if it weren’t for the fact that Rayner, a working-class single-mother widely seen as being on the Left of the Changed UK Labour Party, had been targeted for elimination since the very start of the Starmer Era by McSweeney’s faction of thin-necked absolutists. For comparison, Nigel fucking Farage himself was exposed in a far bigger scandal at the exact same time, with questions being asked over where, exactly, the £885,000 his officially bankrupt partner Laure Ferrari had used to buy Farage’s constituency home in Clacton-on-Sea had come from (foreign donors), and if it was a transparent fraud to avoid Farage himself having to pay stamp duty on the purchase (yes, obviously). This scandal was in the news for about thirty-two seconds before the media quickly and efficiently lost interest and forgot all about it. Rayner’s case enjoyed very different treatment. With McSweeney’s slimy rope-cutters on her case, in addition to having the entirety of our right-wing Media scrabbling through her bins and knicker drawers looking for something – anything – they could nail her with, it was always just a matter of time before something was hyped up to force her out.

Now, of course, I won’t deny that I’ve had my own issues with Rayner, but a lot of those issues stem from the position she found herself in as the de facto sole remaining Lefty at the top of the Changed (for the worse) Labour Party™, and I’m temperamentally unable to ignore relevant facts just because they leave unsightly smears on the Great Glass Table of Comfortable Certainty. On the one hand, she’s often been the face and voice of the Party leadership when what they wanted said was indefensible bullshit. On the other hand, it’s pretty obvious that the reason they put her up there to say those things was because they wanted a credible Lefty buffer between them and the results of their decisions, and on the third hand, the likely reason she’d have agreed to make those statements in the first place is because she was trying to play the game by their rules – you do this favour for us, Ange, and we’ll earmark funding for the policies you actually want to push as Housing Secretary. You wouldn’t want to put those nice anti-inequality policies in jeopardy by refusing to be a team player, now, would you? Hmmm?

Thing is, Rayner’s position within the Government wasn’t really based on her being Deputy Labour Party Leader or Deputy PM, the first of which was purely a position she was voted into by Party members at the same time as Starmer lied his way into the Leader’s chair, and the second a position she gained because that was recent tradition and – at the time – Starmer’s leadership campaign was still pretending to be about restoring Party unity. Her position was based much more on her popularity within the Party and the popularity of the policy profile she’d steadily carved out in the years since Starmer’s controllers had tried – and failed – to make her carry the can for losing the Hartlepool by-election back in 2021.

Brief digression – what had actually happened back then was the newly ascendant McSweeney faction, facing its first external challenge where it couldn’t just finagle the rules to kneecap its opponents, had fucked that whole campaign up all by themselves – imposing a crap candidate over the heads of local Party memberships, making former Hartlepool MP Peter ‘see above’ Mandelson the face of the campaign, campaigning as ‘Not Corbyn’s Labour’ with a candidate who’d supported Remain in a seat Corbyn’s Labour had won twice, despite it voting 70% Leave, and this at the height of Flobalob Johnson’s ‘Make Brexit Happen’ popularity. When they inevitably lost that traditionally Labour seat to the Conservatives they tried to dump all the blame for this conga-line of ineptitude on Rayner, but it blew up in their faces, since she was in a post they belatedly realised they couldn’t just sack her from, and she’d been smart enough to keep the receipts for all the times she told them during the campaign exactly how they were fucking it up. Starmer apparently had such a wobble at the time he almost resigned as leader (I doubt it, but that’s the story they put out there later) and then he had to lead a humiliating climb-down that saw Rayner emerge from the mess with an enhanced reputation and a wide-ranging shadow policy profile.

From that moment on, the McSweeney Tendency were out to get her by any means necessary. Not just because her position as Deputy PM made her the de facto front runner in any post-Starmer leadership contest (we’ll come to that) but because for the first time since Corbyn’s election as leader and the close-run thing of General Election 2017, they’d been humiliated by a despised Lefty Trot, and if there’s one thing those behind-the-oven mould spores are good at, it’s maintaining grudges against The Left for years and years until they can act on them, to hell with the consequences. They were behind the deselection of her partner, Sam Tarry, as a Labour MP, exploiting their iron control over the Party’s bureaucracy and its vote-counting software – all overseen by then Election Supremo McSweeney’s office and current Home Secretary Shabana Mahmood – to ensure he ‘lost’ his reselection vote (they did that a lot during the years of The Purge), and I’d bet my collection of erotic Joe Pesci figurines that the Media were given off-the-record briefings by ‘Party insiders’ clueing them in to exactly where she was vulnerable. After all, Rayner had discussed her financial affairs with these ‘party insiders’ when it first came up, to clear the air and seek advice. They knew exactly where the media should look.

But anyway, with Rayner out, there had to be an election for a replacement. The McSweeney Tendency would have preferred to just appoint someone loyal and obedient and less charismatic than their foundering figurehead to the post (a faded black and white photograph of a severed fish head on a pike, perhaps, or a thinly buttered strip of curling linoleum from the floor of a crackhouse toilet) but that annoying democracy thing got in their way once again. They had to content themselves with fucking over the election process itself, insisting that candidates had to gather a much larger than usual number of MPs’ signatures within a ridiculously truncated period while the House wasn’t even in session, which gave candidates who already had connections within the Party machinery and the media an insurmountable advantage. In the end the only candidates to make the cut were Starmerite Education Minister Bridget Phillipson and former Leader of the House (sacked by Starmer for daring to have reservations about welfare cuts) Lucy Powell.

The whole sorry fiasco just sums up, yet again, everything wrong with Hard Labour. Powell, a solid old-school Blairite in all things who, nevertheless, seemed to understand that the Party’s right-facing drive for Tory votes is leaving it bereft of centre-left support, was painted by ‘party insiders’ as a crazy, divisive Marxist who was nothing more than a mouthpiece for their current Bête Noire, Manchester Mayor (and likely future Labour leader if he ever solves the conundrum of how to get selected as a candidate for a Parliamentary constituency while McSweeney’s appointees still control the selection process) Andy Burnham. The insider smearing was intense, super-sexist, and so factionally extreme it even annoyed Phillipson, who was left to wear the albatross of ‘Starmer’s Choice’ around her neck in a race where everybody but the people running the Party knew the membership were out to send Dear Leader a message.

End result: Powell 54%, Phillipson 46%. Starmer fail.

Cheers, Morgan. You prick. How the hell do you still have a job?

So Many Faces, Most Looking Right

We also had Party Conference month here in The Bottomlands, and you know what that means! Four weeks of vox-pops, earnest pieces to camera while not much was happening, and swooping camera shots as the BBC, that ever so carefully cultivated field of conservative-friendly legumes (by which I mean the editorial execs and the on-screen ‘talent’, not the dedicated employees who do the actual technical work of making TV very, very well indeed) tried to collect enough Entertainment News-style footage to edit and clip together a traditionalist narrative their Tory-appointed commissars could approve of. Right Wing = Sensible & Statesmanlike, Left Wing = Chaotic & Concerning, Centrist = Boring & Bumbling. Now with the added fizz of the Frog Fascist Far Right, which apparently = Interesting & Inevitable.

Nigel Farage’s ‘Reform Party’ isn’t an actual political Party, of course, though you’d never know that from the Media coverage. The ‘Reform Party’ is, just like its ‘Brexit Party’ predecessor, a limited company, with Froggy fuckface as one of its directors (since he had to officially divest himself of full ownership earlier this year) meaning it has entirely opaque funding and no real voice for the non-existent membership. If any vaguely left-leaning Party ever tried to operate in this way they’d be hounded day and night by armies of journalists and pundits affecting shock-horror outrage at the gross insult to basic democratic standards this all represents, but since it’s the Fascists, and since the UK Media is run by the people who work for the people who hold shares in the companies invested in by the people who fund Frogage and the whole Alliance of X-rated International Scumbags, it’s just another one of the things we’re sometimes tersely informed by plastic-wrapped pundits that “No one but Proggy scolds cares about”.

The Reform conference itself was exactly what you’d expect from a bunch of slab-foreheaded wallies too extreme for the modern Tory Party. Flag-shagging, an obsession with Securing The Border, unfunded bait-and-switch lures for the votes of ‘honest, working families’, sneering at, blaming and threatening various cartoonish Enemies, and just a general sense that the people on stage were having a fine old time out-doing each other with performative lunacy because everyone involved knew the whole thing was just a great, big theatrical stunt no one was supposed to take too seriously. How could they? If they ever did get into Government all of their policy decisions would be made far, far away from the hoi polloi, by much more important people. People with money. People, ultimately, called Vlad. In the end, all Reform PLC served up was a heaped plate of ‘Deport All Foreigns’, ‘Preserve Our Kultur’, and similar moronic three-word phrases, and their conference was little more than a cheap market stall for the kind of bitter and measly racism that used to be the preserve of the National Front and British National Party, back when the UK still had a veneer of civilisation about it, but since the disaster of 2016 has, sadly, but not accidentally, been rendered so mainstream it’s now tonally indistinguishable from a lot of what you’d hear at the Labour and Conservative conferences.

Next up were Ed Davey’s Homely Heroes of House Hufflepuff AKA The Liberal Democrats, the Party with 14 times as many seats as our Froggy Fascists, but 0.0001% of the coverage. Nobody really knows what their policies are (probably the political equivalent of a nice village hall Bake Sale in support of Edna Pickerslee’s asthmatic nephew Harvey’s charity walk from Mablethorpe to Hythe raising money to buy radio ads warning against logging in the Amazon – all nice and well-intentioned, but it’s not going to get the job done). The UK Media generally stopped paying attention to the Lib-Dems when Jo Swinson’s rabidly anti-Left version took a bullet to the tote bag in 2019, and the Liberal Democrat membership did themselves no favours by rejecting the ambitious Left-leaning option of Layla Moran as leader in favour of Davey’s Centrist Dad shtick, with its cringey stunt-appearances and tortuous defences of the Lib-Dem Party’s 2010-15 Coalition with Cameron’s Conservatives, the period that utterly shattered their hard earned reputation as being the Nice Party.  I’m tempted to put their rightward lurch down to the influx of ex-Tories that followed Brexit, but I don’t care enough to check. Davey is – trying – to wave the flag for last-century consensus politics, but that boat sank a long time ago, and all hands were lost.

Der Starmerpartei themselves imposed their Annual Conference on my own fair city of Liverpool. A mismatch of people and place similar in scale to that time Auguste Escoffier tried to serve the full twelve courses of his revolutionary Flambée Soufflé to battalions of Les Poilus at the Battle of Verdun, only this time with fewer radicalised socialists and many more corporate hospitality booths. I was actually in the Town centre doing some charity work during that weekend (ooooh, virtue signal be bright today) and it was like someone had released a cattle car full of teenage Mad Men aficionados at Lime Street Station, hundreds of them zipping around the drizzled streets like bespectacled electrons in their Party-issued uniform of short mack and pointy shoes, oozing contempt from their clogged pores for the very concept of ‘Liverpool’ while dragging tiny wheeled suitcases through puddles and searching forlornly for the other 90% of their facial hair. Unfair, I know. A lot of them were probably terrifically decent people trying to keep the progressive Labour tradition alive while yoked to a leadership machine that hates their guts, but it’s just so hard to tell them from the ambitious Olivers and Tabithas who scored their delegate badges by spending their University holidays pruning the local constituency membership rolls of anyone with a social media account who had ever voiced less than 110% support for the leadership’s ever shifting policy offerings, so, yeah, I’m very okay with being mean.

The Not Tory sectors of the Media are used by now to the iron-fisted style of newnewlabourinc’s brief moments of uncomfortable intercourse with its despised membership, all stage managed to within a hair’s breadth of asphyxiation to avoid any more embarrassing outbreaks of democracy on the floor, with VIPs prowling around backstage cocking and uncocking their soundbite shotguns in front of rows of floor to ceiling mirrors and gimlet-eyed ‘advisors’ barking orders in between gulps of no-fat virgin blood. They waxed all kinds of lyrical over Starmer’s signature speech, with its very, very belated bleating about Nigel Farage being, maybe, a bit racist adjacent, but not actually racist, because of course that would be slander, while Reformites themselves were very fine people led astray by a fast-talking huckster and the raging intensity of their deeply held beliefs that were, despite all appearances, definitely not racist, honest. A step change in newnewlabourinc’s much reviled policy of symbolic synchronisation with Reform’s agenda, perhaps? Could our Sir Keir finally be learning the lesson of the last year and be making it easier for Polly Toynbee and Rafael Behr to justify their FTF Guardian sinecures? Could this be the start of Labour reunification and a return to popularity?

Fat chance.

Before the echoes of applause had faded it was the turn of the new Home Secretary, Shabana Mahmood, a woman who, as mentioned, made her bones heading up the election manipulation unit tasked by Hard Labour with making sure as many of ‘the right candidates’ as possible made it through the reselection wars of 2020/21, to stand at her podium and give a headline speech chucking the same old divisive contempt at immigrants, proposing second-class status and Forced Labour as ways for them to ‘prove their dedication to becoming British’, and generally underlining two things for the hard of thinking. One, while Changed UK Labour™ would grimly endure a degree of applause from the soft-left for punching Farage in his soft gut, their number one priority was always to showcase their square-jawed rejection of namby-pamby, Pinko ‘progressive’ solutions by giving as many of Farage’s actual policies as possible a vigorous hand-job. And two, Shabana Mahmood had clearly displaced Health Secretary Wes Streeting as the definitive new standard-bearer for Blue Labour-brand authoritarianism, and was most definitely going to be a contender in the leadership race that was already revving up to replace Starmer.

They are so inept. So determinedly unpopular. So completely unwilling to admit a mistake that they’re reaching more and more often into the toolbag of authoritarian solutions they used to crush opposition in the Party and applying them to the country at large. Frankly, the only people to come out of that farce of a Conference with a modicum of self-respect were the people who risked being added to the Must Expel List to go see Andy Burnham making speeches about a more hopeful future for Labour at fringe events, and the Left-wing journalists Der Starmerpartei had booted from the Arena for doing basic journalist things. One of them was the FTF Guardian’s own token Leftie Owen Jones, who committed the cardinal sin of asking the parachuted-in MP for North Durham and Greater Israel Luke Akehurst (imagine a mottled puce balloon on which a crying child has been asked to draw the angry face of their nastiest teacher) to explain the Government’s continued military support for Netanyahu’s war crimes and his own apologia for genocide. Spoilers, he couldn’t, but he did get ever so angry, and so Owen had to go.

The Green Party Conference, by contrast, went quite well. Lots of energy, lots of applause, and a lot of sensible things said about the need to address the UK’s gross inequality and spiralling corporate greed. In new leader Zack Polanski (a beardless Zohran Mamdani) they’ve got a charismatic, media-friendly figurehead who is admirably nimble-footed in dealing with our turgid media’s predictably hostile reaction to anyone not blaming everything on THE MIGRATION CRISIS™!!! The speeches were good, zeroing in on the fact that what most people are actually angry about isn’t immigration, but inequality and the rising cost of living, and that the prime drivers of these problems are the same old same old – corporate welfare, right wing economics and Brexit. If it weren’t for the possibility of an actual Party of the actual Left maybe emerging out of Your Party, I’d already have joined up. As it stands, if there’s no YP candidate standing in my constituency in May (when the local elections are) then they’ve got my vote, and I am far from alone. Millions of centre-left voters sent flying out of the Labour tent by Der Starmerpartei’s Purity Police will likely end up finding a home in the Greens, eventually. I might even be among them.

Then the Tory Conference. Jesus Christ, how the mighty have fallen. In Kemi Badenoch they evidently hoped they were getting a plausibly Hard Right leader who could slow, and hopefully reverse, the exodus of voters to Reform by championing similar policies under the Conservative banner, an attack dog of crisply-vowelled extremism that could win back the Tory voters who had flocked to Reform’s No-Shame food-truck back in 2024 while sporting the right skin pigmentation to deflect accusations of being a gross racist. Unfortunately for them, what they actually got in Badenoch was a dim-witted gaffe machine who routinely launches ill thought-out tirades against the usual targets (Wokeness, Lefty Lawyers, Eurocrats, Immigrants, etc) that either fall flat because other people with larger megaphones have already said it, or go splat against the wall because they’re so telegraphed a banana plant could move fast enough to dodge them. It’s just embarrassing. I can’t even muster a good hate for her, she’s just so awful and she really doesn’t seem to know it.

At the dispatch box she’s a belligerent fantasist who lacks the basic political skills to inflict any kind of damage on Starmer. Say what you like about Flobalob Johnson (and I very much did) but he would have absolutely relished the weekly opportunity Prime Minister’s Questions allows for the Leader of the Opposition to tear Sir Plastic Poultryfucker into quivering strips of humiliated jerky. A toddler could do it, but Badenoch doesn’t have that level of professionalism. The only thing keeping her in place leading the Tories (into extinction) are the rules governing when Tory leadership challenges can take place and the fact that not many Tory MPs want her job. All that drama aside, the Tory Conference was basically just the Reform Conference, except everyone was older and had all the optimistic gaiety of a speed-dater who can see Larry Summers eyeing them from the next table. Hovering around 16% to 18% in the polls. Broken, ignored, absolutely devoid of relevance or new ideas. No one cares, let’s move on.

Your Party, the fledgling centre-left party for all the people Labour Together told to fuck off out of ‘their’ Changed UK Labour Party™ also held its inaugural Conference in Liverpool, at the same venue in fact, so a great deal of time had to be spent in advance steaming the place for lice and wiping Morgan ‘Twat’ McSweeney’s faecal scrawlings off the walls (I won’t repeat them here, but I will just say that he’s a very angry young man with a tenuous grasp on both human anatomy and appropriate inter-familial relationships). It was all considerably less chaotic than the UK Media were hoping for. No riots. No slanging matches. No orchestrated walk-outs. No deadlocked midnight votes over the minutiae of clause 931 of Subsection 442.a of Comradely Pronouns, Acceptable Pronunciations Of. Nothing for the pundits and their audience to point and laugh at. The member-delegates turned up on time and got on with the actual work of debating and voting on the Party’s foundational principles and rules while all the coverage was snoozeworthy gossip about how Jezza Corbyn and Zara Sultana, like, totally hate each other, right? And it’s all, like, hair-pulling and face-scratching. The UK Media either can’t or won’t wrap their ossified minds around the idea of a Member Led political Party that will choose its own policies and leaders and doesn’t subscribe to the personality-driven faux-presidential style of One Man – One Vote the Media prefer because it’s easier than doing their job. Will it work out? Only time will tell. But I can tell you it makes a refreshing change to have membership votes translate into progressive, inclusive Party policy, instead of membership votes being derisively ignored by a leadership cabal that thinks democracy is for losers.

Ah, British politics. Please change.

“Bring Me My Special Spade”

Back in February last year, messages from a WhatsApp group called Shiver Me Timbers, used by some arsehole Manchester Labour politicians, were leaked to the media and caused yet another scandal for Der Starmerpartei. Apart from being shitty about constituents and the Left in general (which is, like, totally promotion-worthy content as far as the McSweeney Tendency are concerned), the messages were racist, misogynistic and – the kicker – borderline antisemitic, and the leak led to the MP for Gorton and Denton, Andrew Gwynne, being dismissed as a Minister and suspended from the Party. Fast forward to January and Gwynne finally decided to step down as MP, which meant a by-election to replace him.

The thing that made this national news was that Gorton and Denton, being a constituency on the edge of Manchester, was seen as a natural fit for Manchester’s three-times elected Mayor Andy Burnham should he want to step down and stand as an MP. Actually, there’s no ‘should he’ about it. Burnham has made no secret about his desire to return to Parliament, where a LOT of MPs see him as the standard-bearer for the kind of centre-left Labour policies the McSweeney Tendency has started rendering taboo following its successful purge of the Party’s socialists. With Starmer tottering and looking odds-on to be defenestrated should newnewlabourinc get its balls kicked off in the local elections due to take place this May, there would almost certainly be an election for a new Labour leader and, therefore, a new PM. The behind-the-scenes manoeuvrings started early and have been going on for a long time. Inspired partly by Starmer’s historic unpopularity (he’s the most unpopular PM in history, basically) and partly by the open secret that Old Pained Expression was never actually meant to be PM at all. Back when the McSweeney Tendency picked him as their figurehead it was purely in order to give their extremist faction a candidate to hide behind who could plausibly (not very, IMHO) offer the Party’s exhausted membership an end to factional divisions and a popular Party manifesto of ‘Corbynism without Corbyn’. The McSweeneyites expected Flobalob to remain the darling of the UK Media for a good long while yet, and therefore stand a reasonable chance of being a two-term PM, leaving them free to quietly purge ‘their’ Labour Party of anyone to their Left and replace them with safe, obedient, corporate-friendly drones acceptable to the real Interests that rule Britain, after which Starmer would fall on his sword after losing the 2024 Election and they could elevate a properly house-broken and semi-charismatic frontman to the leadership (they were probably thinking Wes Streeting) just in time for the electorate to get bored with Tory sleaze and hand them the keys to the kingdom in 2030.

Covid, Partygate, Truss, Rishi Sunak’s refusal to be a white male in an era of rising racism on the Right, all combined to shatter the Tory Party way ahead of schedule and catapult the Plastic Peer into a role he is temperamentally unsuited for and incapable of getting to grips with. A genuine leader with his own firmly rooted beliefs and ability to take people with him, when handed a 170+ majority and the current mayhem on the Right, could have been truly historic in the scope and scale of their ambitions – but we got Keir Starmer instead, an awkward, dismal, monotone of a man with the strong belief that he’d quite like a comfortable post-politics career chairing various important institutions and all the charisma of a damp urinal cake sandwiched between two off-grey woollen socks. A recent book had a quote from Downing Street insiders (implied to be either McSweeney himself or one of his inner cohort) describing Starmer as, to paraphrase, the guy who thinks he’s driving the train because he’s allowed to sit in front with the autopilot on. Quite how no one was fired for that candid assessment is the answer to the question “Who really runs Changed UK Labour?” – Clue: Not Keir Starmer.

By lashing himself so firmly to McSweeney’s line of punching Left and lunging Right, all Starmer has managed to do is shatter the link between Changed UK Labour and its most loyal voter base and give Reform PLC free rein to monopolise public attention on the areas it wants to campaign on. By coming into office with a scandal about accepting freebie gifts from rich donors and then aiming newnewlabourinc’s lustful gaze directly at the swollen pockets of the American Right’s Tech-Bro Axis, he’s shielded Nigel Farage’s status as a bought and paid for acolyte of rich donors and allowed him to skate past the reality that he’s just the fictionalised creation of a Red, White & Blue themed billionazi social media promotion. Everything that the British electorate have come to despise about Changed UK Labour has Sir Keir Starmer’s inert moan of a face plastered all over it; he’s toxic. The unstated truth is that getting him gone from Number 10 at the earliest credible opportunity isn’t just vital for newnewlabourinc’s electoral chances (slim as they are), it’s vital for the careers of those who want to replace him there.

So, it looks like there are two main candidates already in position to campaign to be post-Starmer leader. Shabana Mahmood, snarling face of what I’d call the Blue Labour faction. Basically, the lobby group devised by Baron Maurice Glasman to promote what it calls “blue-collar and culturally conservative values within Labour, particularly on immigration, crime, equality and diversity and community spirit”. If that sounds highly Americanised, it is, because Glasman is an admirer of Trump and Bannon and wants Labour to become a populist movement promising (if I may translate their self-description into real terms) closed borders, heavy-handed policing, the marginalisation of minorities and White Identity Politics. Blue Labour more or less think the Right has won the propaganda war and that the only way to preserve any kind of ‘progressive’ inheritance in the face of climate change and environmental collapse is for Labour to become the Party of Nationalist Socialism and shrink its outreach down to just the WWC and the most assimilated non-whites. Why would Mahmood, a woman of South-Asian heritage, be this faction’s chosen standard-bearer? Because Blue Labour want to be able to claim that they’re not really surrendering to racism, they’re just providing a home for patriotic people of all colours and creeds who want to work hard and rescue the working classes from the scourges of out-of-control crime, mass immigration and Benefit Culture. That’s why Mahmood has moved fast to position herself as the (Fe)Mailed Fist of Order, announcing plans to end jury trials (temporarily, of course), create a National Police Service to combat terrorism (being redefined as we speak to mean saying words genocide apologists don’t like and targeting innocent likkle arms suppliers), organised crime (foreign-born grooming gangs) and public disorder (dirty, UnBritish protestors and their dangerous placards) by bringing in a raft of exciting new high-tech tools produced by, ahem hem, Peter Thiel’s Palantir, and to force immigrants to accept second-class status and compulsory ‘public service’ requirements in order to stay in the country.

She’s chilling. The media love her.

Then there’s Wes Streeting, the Health Secretary and the man most affiliated with McSweeney’s own Labour Together faction. Though they’re often interlinked and have worked with and through each other for years, Labour Together lacks Blue Labour’s internationalist, populist themes. Labour Together is basically Peter Mandelson’s malformed seed injected into the Labour Party, now grown into an invasive parasite determined to march its horrified host in ever tighter rightward circles until nothing of the Left remains. The eventual aim is to replace the Conservative Party as the ‘natural Party of Government’ by shedding every progressive, leftwing policy and swapping its old electorate for the one that won elections for the Tories for most of the last century. Corporate-friendly, hostile to Unions and progressive causes, anxious to please and appease the right-wing Press and donor class at every turn. Streeting was their bright-eyed boy going back to 2016, when he was one of the faces of the anti-Corbyn smear machine that eventually cost the Party a couple of million votes and ushered in Flobalob. Under Starmer he was given the Health portfolio and embarked upon a mass reorganisation of the NHS (costing jobs but achieving very little), more tech-solutions waffle (new US-sourced IT systems produced by his donors that I can personally confirm are shit), and a bossy, confrontational, finger-wagging attitude to junior doctors and nurses that has led to continued industrial action and the stalling of his ambitions. I’ll give Streeting this, though. He was one of the first to sniff the perilous rot in the foundations of Starmerism and, with Mahmood monopolising the Labour Right position, has been quietly, cynically, trying to redefine himself within the Party as Centre-Left Wes, the smiling solutions-orientated pragmatist with a good heart and a great media presence.

He’s transparently ambitious. The media love him.

Which is where Gorton and Denton comes back into the picture. Farage’s Reform PLC are polling at something like 30% there, with their MAGAesque message of Immigrants Are Stealing Your Jobs, Kids and Futures!!! offering low-info/high-misinfo residents something performative and ugly to vote for, as opposed to the “You get fuck all for your vote and you should like it, pleb” shriek emanating from the tin-eared brass-necks over at newnewlabourinc’s campaign strategy garderobe. With Burnham as the Labour candidate, that could well have shifted hard. He’s popular, has a record he can point to, and due to the Party leadership’s well-trumpeted hostility to him he could campaign there as a Labour candidate with something different and new and meaningful to offer. They might not garner the 50% of the vote they got last time (never mind the 60%+ Corbyn’s Labour got in the area) but surely enough to keep the seat and deny Reform PLC victory in another historically Labour constituency. All the Labour Party’s NEC (National Executive Committee) had to do was vote to allow him to resign as Mayor so that he could stand.

So, of course, by an 8 to 1 vote of the NEC’s executive officers, they blocked him. Mahmood herself, sitting as NEC chair, and after telling the media that she thought Burnham would be a great addition to the Parliamentary team (that was a clue as to the eventual decision right there) abstained, but Starmer himself voted to block while Deputy Leader Powell was the only officer to vote yes. Politically, how fucking stupid is that? For all newnewlabourinc’s feral spokescreeps piously repeating to the media that “Andy is doing too good a job as Mayor” and “Oooooh, it would cost so much to hold another mayoral election”, it was made absolutely crystal clear in every anonymous briefing and off-camera quote that newnewlabourinc’s one and only concern was in preventing Burnham from becoming an MP because then he could stand in the next leadership election, and would probably win. They actually seem to think that they can sell this as some kind of display of strength, when all anyone can actually see is pants-wetting weakness. The McSweeney Tendency is so wedded to stamping out all potential threats to its grip on the Leader’s Office that they just came out and fucked that chicken right in front of the whole country. Fucksake, Starmer himself voted to block, when any half-competent advisor would have been screaming at him to either abstain or even vote with Powell (“Look how little I fear potential rivals and how totally democratic we are when the Leader is just one vote amongst many”) but being Starmer, he just did as he was told by Morgan and faceplanted straight into a puddle of fail.

So now Der Starmerpartei get the worst of all worlds. Their candidate for the seat is immediately tarred (however unfairly) as Starmer-approved and is already coming under heavy social media assault from pro-Reform PLC trolls, while the Green Party (who have already stated they’d be willing to work with a Burnham-led Labour Party, but not Starmer’s iteration) are polling second to Reform PLC and looking like the safest bet for tactical voters who want to repeat the success of Welsh voters in the recent Caerphilly Senedd election by voting for the most electable non-Reform PLC candidate. It’s funny, the constant bleat out of Changed UK Labour that everyone who doesn’t want a Reform PLC Government has to vote for them as the only viable alternative turns into a garrotte around their thin necks when their own enthusiastically courted unpopularity renders them terminally unviable. They’ll almost certainly lose the seat, and either hand the Greens a ton of welcome credibility or give Farage a boost at a time when his political franchise is struggling to explain how its claims of being the anti-Establishment choice jibe with the sudden influx of failed Tory ex-Ministers to its ranks and he himself is struggling to shake off entirely believable accusations of being a lifelong racist antisemite.

All this, because when the choice was between listening to large swathes of their own Party who want one of the few well-known Labour politicians with a positive approval rating back on the national stage arguing the progressive case for a Labour Government or denying Labour MPs and members even the vaguest hint that they would be allowed to replace Sir Keir Starmer with anyone other than a Rightwing candidate he personally approves of, Morgan ‘Twat’ McSweeney could and would only ever choose option two.

Once more, with feeling.

Morgan. You prick. How the hell do you still have a job?


Anyway, I’m cutting it short here. That’s way over two hours’ worth of ranting and I haven’t even touched on newnewlabourinc’s self-owns on the Budget, Trump, and its shameless shilling for the interests of the genocidal extremists running a certain Middle-Eastern country. Enough is enough. Hope this note helps you feel a little better about events in your own country, where you might have Faecal Fascism and the Steve Miller Bund running the show, but at least your non-fascists seem willing to put up a decent fight.


Guest Post – Tony Jay: Note From Brexitania: Early Winter of 2026 Edition | Post + Comments (152)

Part 7: The Coming AI Winter

by WaterGirl|  January 8, 2026 7:30 pm| 74 Comments

This post is in: Carlo Graziani, Carlo's Artificial Intelligence Series, Guest Posts, Science & Technology

This will be the final post in the series, so I want to extend a big thanks to Carlo for sharing both his knowledge and his thinking on AI with us!

Between the holiday break and some work deadlines and all the craziness, we’ve had some distractions that have made finding the right time for this final post a bit more complicated.  There has not been one complaint (not that I have seen, at least) about the delay, so I’m hoping that maybe this is better timing for all of us!

Thanks again, Carlo!

Guest post series from Carlo Graziani.

Guest Post: AI 1

On Artificial Intelligence

Hello, Jackals, welcome back. Happy New Year, and thank you again for this opportunity. Being able to write these posts on AI has been very helpful to me, because writing this stuff out in a manner suitable for exposition has really forced me to clarify my own views on AI, and has allowed me to be much more exact and specific on both what my objections to the AI enterprise are, and on what the value of that enterprise is. Basically I’m in a better place to tell the baby from the bathwater, thanks to this series.

This is the last post in the series. I have more-or-less emptied the bag of technical matters concerning AI which I think I understand that most people don’t, and having done that, it’s time for me to summarize the content of the past six posts, and to use that summary to try to understand where we are on AI today, and where we are likely going, at least in the near future.

Again, you have my gratitude for reading, and for the very high level of the comments that have followed each of the previous posts. BJ really is a unique, special place.

Part 7: The Coming AI Winter

Let’s start out with a very high-level summary of where we’ve been in this series.

AI and Learning

At several points in this series, I pointed out that all “AI” is in fact a form of statistical learning, and introduced the Statistical Learning Catechism, which states that what every such system does is the following:

  1. Take a set of data, and infer an approximation to the statistical distribution from which the data was sampled;
    • Data could be images, weather states, protein structures, text…
  2. At the same time, optimize some decision-choosing rule in a space of such decisions, exploiting the learned distribution;
    • A decision could be the forecast of a temperature, or a label assignment to a picture, or the next move of a robot in a biochemistry lab, or a policy, or a response to a text prompt…
  3. Now when presented with a new set of data samples, produce the decisions appropriate to each one, using facts about the data distribution and about the decision optimality criteria inferred in parts 1. and 2.

It is worth emphasizing, again, that this is all that is going on in any AI system, including all Large Language Models (LLMs). It’s basically just pattern recognition coupled to decision-making: if you can model how you would have made reasonable decisions based on past data, you can use that model to make reasonable new decisions based on new data.
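
To make the catechism concrete, here is a minimal sketch in Python, with scikit-learn’s GaussianNB standing in for “any learning system” (the two-blob data and the cat/dog labels are invented for illustration):

```python
# A toy instance of the Statistical Learning Catechism.
# All names and numbers here are invented for illustration.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# The "set of data": 200 two-feature samples drawn from two
# overlapping Gaussian blobs, labeled 0 ("cat") and 1 ("dog").
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Steps 1 and 2 happen inside fit(): infer an approximation to the
# class-conditional distributions, and (implicitly) fix the decision
# rule -- pick the class with the highest posterior probability.
model = GaussianNB()
model.fit(X, y)

# Step 3: presented with new samples, produce the decision
# appropriate to each one, using what was inferred above.
X_new = np.array([[0.2, -0.1], [2.3, 1.8]])
print(model.predict(X_new))        # decisions, e.g. [0 1]
print(model.predict_proba(X_new))  # the learned distribution at work
```

Everything an AI system does fits this template; an LLM is doing the same thing at incomprehensibly larger scale, with “decision” meaning “next token”.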

The fundamental simplicity of this scheme belies the power of the methods that it enables when coupled to modern computing. Academic computer scientists were the first to realize the possibilities inherent in that coupling. In 2007, they began to show that many learning problems previously regarded as intractable could be easily solved using a set of computational techniques based on neural network models that came to be known as “Deep Learning”. Examples of such problems are image classification, voice recognition, protein structure prediction, materials properties prediction, empirical weather forecasting, and, beginning in 2017, natural language processing. The last one, brought about by the advent of a type of DL model called a “transformer”, marks the arrival of LLMs, and the beginning of what most people nowadays think of as “AI”.

The Role of the Tech Industry

The choice of the term “Artificial Intelligence” to describe this subject is misleading, however. AI is nothing but statistical learning, and learning is a very limited aspect of human cognition, not being remotely sufficient to model “intelligence”. After all, even single-celled organisms “learn”. The impressive parlor tricks that can be performed by LLM-based chatbots should not deceive us into anthropomorphic interpretations of their workings. They are by no stretch of the imagination “Intelligent”.


Unfortunately, the deceptiveness of the term “AI” is to a large extent deliberate. The fact of the matter is that this subject has escaped the control of academic researchers, having been for all intents and purposes hijacked by the Tech industry, which soon saw business opportunities in chatbot parlor tricks. The industry’s problem was that by 2022, revenues from tech platforms (Facebook, Twitter, Google ads, etc.) were no longer growing at the spectacular rates that their leaders felt were the special privilege of their futuristic enterprises. They were in danger of becoming normal companies with normal growth expectations.¹ This prospect outraged the peculiar techno-messianic sensibilities that seem endemic to many in the industry’s leadership class. They settled on AI as the basis for the next phase of their industry’s evolution. In so doing, they began to exert a corrupting influence on machine learning research that is analogizable to the corruption of climate science perpetrated by the Oil industry, or to the earlier corruption of cancer research by the Tobacco industry.

The AI future that Tech leaders imagined is a strange admixture of technical naiveté, utopian ambition, and naked greed. On the evidence of current LLM capabilities for natural language processing (NLP), they persuaded themselves that they could bring about a new type of AI called “Artificial General Intelligence” that would constitute a transformative, disruptive technology, analogous to the steam engine or the telegraph. This they could accomplish by means of very large investments (totalling $3-5T by 2030) in data center construction, in acquisition of computing hardware (GPUs), in securing large power contracts required to run that hardware, and in extensive hires of data science and computer engineering personnel.

This AGI technology would become an engine of disruption, changing every industry or business that it touched. In the Utopian version of this story, it becomes a driver of a sort of “post-scarcity” world in which human resource competition and conflict become things of the past. In the more self-interest-grounded C-Suite discussions of the profitability of AI, a darker story is told, in which business leaders of other industries are offered AI tools that enable them to reduce their labor costs (i.e. to replace much of their work forces with AI tools). This process would re-direct large fractions of the resulting labor cost savings back to suppliers of AI services, thus assuring those firms of the abnormal profitability growth which they regard as their birthright.

What Is Wrong With This Picture?

Set aside the pious self-congratulatory naiveté of the Utopian story, and the obvious self-interested amorality of the business plan. From a practical point of view, what we should ask ourselves is this: Is there any reason to believe that this vision of the future is at all possible?

That’s another rhetorical question. The answer is that for good and sufficient reasons, this plan is going to fail, badly, and with negative consequences for Tech firms, their customers, the global economy, and the U.S. and other governments. Let us review the reasons, and the consequences:

(1) There Is No AGI Down This Technological Path

For the central driver of a story designed to mobilize trillions of dollars in investments, the term “AGI” is defined with maddening vagueness. But it is generally agreed that a fundamental aspect of AGI is computational reason. And in fact, researchers working in the AI industry lard their technical terminology with allusions to reason, including “reasoning models”, “reasoning tokens”, “chain-of-thought systems” and other such constructions. By such means they have persuaded themselves that they are on the path to solve (or perhaps have already solved) one of the thorniest problems in science: the modeling of human reason.

They have, of course, done nothing of the sort. In Parts 3-4 of this series, I discussed the wrong-headedness of conflating learning with reasoning. I gave a fairly detailed discussion of the type of scientific approach that one might take to try to connect modern machine learning to reasoning. This set of reflections has the benefit of illustrating the fact that reasoning qualitatively transcends learning in essential ways: recall that reasoning features the discontinuity of “Aha!”, which learning, as a continuous process, cannot emulate. The notion that one can simply train a machine learning system until it “learns to reason” (sometimes referred to as “emergence”) is scientific nonsense of the type characterized by Wolfgang Pauli as “not even wrong”.

Since all current AI consists of learning systems, none of it can learn to reason, even in principle. While AGI might be possible in principle, and we may see some version someday, it will certainly not be built on a foundation of current DL-based technologies. Which means that the target motivating the colossal current investments in AI doesn’t even exist. That’s Bad News Message Number 1 for the AI enterprise.

(2) AI Hallucinations Are Never Going Away

By now most people are familiar with, or at least have heard about, “AI Hallucinations”. AI famously gives dangerously bad financial and medical advice, incorrectly instructs firms’ customers about company policy, writes legal briefs that cite non-existent case law, screws up mathematical reasoning with unflappable didactic aplomb, writes wrong and unusable computer code, and otherwise reliably produces nonsense at a sustained rate whenever asked to produce output. There exists a subject now called “prompt engineering” in which domain experts use their expertise to detect such incorrect output, and tune or steer their AI prompts in order to coax chatbots into producing improved output. Only by such means is it possible to get actual useful output from AI systems.

The Tech industry’s “AGI” narrative holds, among other things, that such hallucinations are minor failings, which will in any event be addressed through more computing power, more training data, and larger models. In effect they believe that through hyperscaling, they can train models to stop hallucinating.

There is not a shred of quality evidence to suggest that this is true, and considerable reason to believe that hallucinations are an ineradicable part of the output of modern LLMs. They arise, in my view, from a very brittle and imperfect model of the probabilistic distribution over language sequences (sentences, paragraphs, and so on) that the models learn in training. The imperfection of the models is in fact structural: it is brought about by the token embedding process shared by all NLP methods. This process endows the “space” of language sequences with an improper notion of proximity that assimilates nonsense sentences to sensible ones, to a degree not justified by the actual sentence distribution sampled by the training data.

Note that this is a different problem from the impossibility of “learning to reason”. What one would hope of a DL-based LLM is that even though it doesn’t reason, it can generate sentences that are at least consistent with the sorts of sentences that it encountered in training. But that is not the nature of many AI hallucinations: they actually contravene the training distribution, because the model’s broken internal representation of that distribution places nonsense responses “closer” to sensible ones than they actually are, for technical reasons having to do with the geometry of the embedding process.
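
To make the geometric point concrete, here is a deliberately crude toy in Python. The 3-d “word vectors” below are invented for the example (real models embed tokens in spaces of thousands of dimensions), but the failure mode is the same in kind: proximity in the embedded space tracks superficial composition, not sense.

```python
# Toy illustration of embedding geometry misplacing nonsense near sense.
# The word vectors are invented; this is not any real model's embedding.
import numpy as np

word_vec = {
    "patient": np.array([0.9, 0.1, 0.0]),
    "should":  np.array([0.1, 0.8, 0.1]),
    "take":    np.array([0.2, 0.7, 0.2]),
    "aspirin": np.array([0.6, 0.2, 0.5]),
    "arsenic": np.array([0.6, 0.3, 0.5]),   # a "substance", so it lands
                                            # right next to "aspirin"
    "rest":    np.array([-0.2, 0.1, 0.9]),
}

def embed(sentence):
    # Embed a sentence as the mean of its word vectors -- a crude
    # stand-in for the token embedding + pooling of real NLP models.
    return np.mean([word_vec[w] for w in sentence.split()], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sensible = embed("patient should take aspirin")
lethal   = embed("patient should take arsenic")  # dangerous nonsense
sane     = embed("patient should rest")          # different but sensible

print(cosine(sensible, lethal))  # ~1.00: nonsense sits right next to sense
print(cosine(sensible, sane))    # ~0.93: a sane alternative sits farther away
```

The lethal sentence is geometrically all but indistinguishable from the sensible one, while a perfectly sane alternative is measurably farther away. A model that samples “nearby” responses from such a space will, with some regularity, emit confident nonsense.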

What this means is that AI hallucinations are in all likelihood structural to LLMs. They cannot be eliminated by training with more data, because more data would require more model capacity, which in turn would create more opportunities for embedding to find ways to situate nonsense geometrically near sense. And therefore, any “AGI” that is marketed by the industry is certain to be mentally ill at birth. This is Bad News Message Number 2.

(3) Hyperscaling, “Emergence”, and Model Inefficiency

One of the funniest intellectual failures driven by the corruption of machine learning by the Tech industry is practitioners’ faith in the phenomenon of “emergence” of general intelligence, which, they believe, is achieved by scaling model capacity (measured in billions of adjustable parameters) to the point that a model “suddenly” starts to produce reasonable responses to prompts. There is even the notion that some kind of fundamental law of computing has been discovered, relating the size of a training corpus of text to the model capacity required to achieve this “emergence” of intelligence. One scales linearly with the other: double the size of the training data and you must double the model capacity to achieve “emergence”. This is in fact the “scaling law” at the root of the industry’s mad drive to build out datacenters, buy GPU hardware, secure power contracts sufficient to power medium-sized cities, and hire data science talent, which we now call hyperscaling.

The problem is that “emergence” is bullshit. The scaling law between training data size and model capacity is a fact of deep learning that has been true of all DL systems since the earliest examples of their application. It is as true of an image-classifying convolutional neural net (CNN) as it is of a transformer LLM. There is no qualitative difference between the two cases.

The qualitative difference that does apply pertains to the data being modeled: the complexity of the distribution of human language sentences is vastly greater than that of the distribution of natural images, or of weather states, or of materials properties. That is, the modeling of human natural language is an enormously more ambitious subject to tackle than any other problem that has been addressed using DL methods. For this reason, vastly more data is required to get a grip on the distribution than has been the case for any other DL modeling problem. An image classifier needs about 30,000 labeled images to achieve state-of-the-art performance. An NLP transformer needs tens of billions of words to achieve even moderately acceptable performance. And because the data requirements are higher, so are the model capacity requirements. You have to hyperscale the model size in order to get to the point where an NLP transformer doesn’t suck anymore. Hilariously, this “point of sucking less” is what practitioners of the subject refer to as “emergence” of intelligence (and even, in their more reckless moments, of AGI). It is nothing of the sort, unless you think that scaling an image classifier’s model capacity to the point that you can train it to classify images is also “emergence”.
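
Here is a hedged back-of-envelope sketch of what that linear scaling costs in practice. The two constants are the commonly cited rules of thumb from the scaling-law literature (roughly 20 training tokens per model parameter for compute-optimal training, and roughly 6 floating-point operations per parameter per training token); the corpus sizes are round, invented numbers.

```python
# Back-of-envelope: why linear data/capacity scaling is affordable for
# small problems and ruinous for language modeling. The constants are
# standard rules of thumb, not measurements.

def training_cost(n_tokens, tokens_per_param=20, flops_per_param_token=6):
    # Capacity scales linearly with data; training compute scales with both.
    n_params = n_tokens / tokens_per_param
    return n_params, flops_per_param_token * n_params * n_tokens

for name, tokens in [("small specialist corpus", 3e7),
                     ("modest LLM corpus",       3e10),
                     ("frontier LLM corpus",     1e13)]:
    params, flops = training_cost(tokens)
    print(f"{name:>24}: {params:9.1e} params, {flops:9.1e} training FLOPs")

# Because capacity tracks data linearly, doubling the corpus doubles
# the parameter count and so roughly quadruples the training compute.
# That quadratic blow-up, iterated, is hyperscaling.
```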

While this model capacity scaling is a reasonable tradeoff for essentially all other DL applications, it is madly, unaffordably expensive when applied to language modeling. Unfortunately there are not many research efforts underway to attempt to find alternative modeling methods that break the scaling, because the AI train as currently constituted has an irresistible momentum conferred upon it by the mad level of investment that has been summoned up by the industry. The level of groupthink on this, even among academic scientists, is astounding, as well as depressing.

So, this is Bad News Message Number 3: unaffordable hyperscaling is baked into the methodology underlying AI chatbottery, and there is negligible research effort dedicated to circumventing it.

On “AI Winters”

Here’s the summary of the summary: the Tech industry’s target, AGI, is not achievable even in principle using current technology. And a subsidiary goal, the taming of the stubborn problem of AI hallucinations, is also not on the table. This being the case, the insane build-out of AI infrastructure is to no purpose, or at least to no purpose articulated by the AI industry. There is no pot of gold marked “AGI” at the end of the hyperscaling rainbow, or even a small purse marked “no hallucinations”. But it would take an infinite amount of data, compute, electrical power, effort, and capital to get there and find out. This cannot possibly end well.

We should try to ask ourselves the “How Does This End” question now, so as to be prepared to recognize the end when it happens. As it happens the subject of AI has an instructive history antedating the post-2007 developments that gave rise to modern “AI”. That history points to a certain cyclic pattern of research and development, culminating in “cataclysmic” (to its academic practitioners) collapses that are known as “AI Winters”.

As usual with history, it is difficult to divide up events into neat periodizations. Nonetheless, from the widest possible perspective, there appear to have been two prominent cycles of development and disappointment, culminating in AI Winters. The first Winter is generally held to have set in around 1973-1974, when research funding agencies in the U.S. and the U.K. concluded that the promise of research in machine translation, speech understanding, and “perceptrons” (single-layer neural networks) had been overhyped and was unlikely to deliver anything of value. Millions of dollars of funding was cancelled, leading to a massive contraction of the field, and to the end of many young (untenured) scientific careers.

By the 1980s, interest in AI had revived, carried largely by the advent of “Expert Systems”, which embodied hand-encoded knowledge according to some new knowledge representation schemes. New enthusiasm among researchers also led to a fair amount of media hype concerning the prospects for the new AI. Government funding followed suit, with the Japanese government’s 1981 announcement of its “Fifth Generation Computing” project. The U.S. government responded with DARPA’s Strategic Computing Initiative (SCI) in 1983.

But as early as 1984, in an article coining the term “AI Winter”, researchers Roger Schank and Marvin Minsky had started sounding the alarm about the growing AI hype, recognizing an echo of circumstances that had resulted in the disappointment and funding cutbacks of 1973-1974. As summarized in the Wikipedia article, Schank and Minsky “… described a chain reaction, similar to a ‘nuclear winter’, that would begin with pessimism in the AI community, followed by pessimism in the press, followed by a severe cutback in funding, followed by the end of serious research. Three years later the billion-dollar AI industry began to collapse.” By 1987, DARPA had wound up the SCI, and Japan did the same with the Fifth Generation project in 1991. Both were judged to be wasteful disappointments. By the 1990s, essentially all development of expert systems ceased, and the term “AI” itself began to seem somewhat disreputable among scientists and science funders. AI Winter II had set in.

Winters, and Bubbles

There are some useful lessons for our current moment in these events. In particular, it seems that Schank and Minsky were prescient in their warning, and perceived a clear pattern to AI research and funding leading to the AI Winter Cycle. The phases of that cycle are:

1. Technical progress

2. Excitement

3. Media hype

4. Investment based on hype

5. Disappointment of investor hopes

6. Withdrawal of investment

7. Winter

When I first began thinking about writing a series of essays on AI (mid-summer 2025), it appeared to me that the hype around modern AI was vastly overwrought, but it was successfully bringing in unprecedented levels of investment ($300B-$400B annualized investment in 2025, anticipated to rise going forward). That is, if, as I suspected, this was an AI Winter cycle, we were at about Phase (4). There were some differences in the historical pattern, the principal one being that the funders in this cycle are private investors to a far greater extent than government, but I thought that there were enough correspondences to suggest the same cresting wave pattern, culminating in an AI Winter. I couldn’t be sure of when the climax (Phase 7) might arrive, however. And very few of my colleagues were willing to entertain the idea that another Winter was coming.

Under current circumstances, with private investors taking the role previously occupied by Government funding agencies, the concept of an AI Winter necessarily becomes linked to that of an AI Bubble. This was not yet a common position in Summer 2025. At that stage, the people suggesting that there might be a financial bubble connected to AI in the offing were definitely a minority², and it was certainly not a respectable position to take.

As of this writing (January 2026), we appear to have fast-forwarded to Phase 5, and are beginning to see signs of Phase 6. The phrase “AI Bubble” is no longer in bad odor, but rather has featured prominently in mainstream press reporting, including in The New York Times, The FT, the WSJ, Bloomberg, Barron’s and many other sources of both general and financial news. Oracle’s stock dove into the toilet on investor fears of the unsustainability of its datacenter buildout plans. NVidia and Meta’s datacenter financing deals are being compared to the practices that led to the 2001 implosion of Enron.

And, suddenly, people are noticing that no AI company is profitable, and that nobody knows a path to AI profitability. Ed Zitron estimates that OpenAI’s real revenue is a fraction of its inference costs, and therefore cannot even begin to amortize the GPT model family’s enormous training costs. A similar story applies to Anthropic. In a thus far fruitless search for profitability, both of those companies have taken to poaching the business of the downstream vendors who repackage and resell their services, a sure sign of a business ecosystem heading for collapse.

Suddenly, it seems as if we are already past the threshold of Phase 6-7 of the AI Winter Cycle, and this time it comes with an extra helping of Financial Shipwreck. Seven AI companies now make up 30% of the value of the S&P 500, and represent essentially all the year’s growth in the index. AI now accounts for something in the 1%-1.5% range of both U.S. GDP and annual GDP growth, which is a crazy level of risk exposure to an industry with a weak grasp on its own business.

If, as now seems very likely, investors start demanding that AI start paying its own way now, instead of waiting for their scheduled 2030 arrival at the Sunny AGI Uplands, the AI world is suddenly going to look like a very different place, because AI can’t pay its own way at current revenue levels. Free ChatGPT/Claude/Gemini is going away this year, I feel pretty sure. I don’t think that even $200/seat/month accounts bring in enough money to pay for the inference costs that they inflict on their suppliers, so at a minimum all those subscription rates need to be raised sharply just to cover their costs, and that is the way to shrink the hell out of that market.

The U.S. government has been making large AI infrastructure and model development investments predicated on partnering with the AI industry, which would still be largely in charge of pretraining models to be fine-tuned by government scientific (and other) customers. That model is not going to work at all if those AI companies start selling off their data centers and power contracts to Crypto miners for pennies on the dollar, and start firing employees as if suffering from a case of corporate dysentery. There would be nobody on the partner side answering email.

Also, the U.S. government cannot bail out the AI industry. There will no doubt be calls for a new Federal program to rescue NVidia, Microsoft, Meta, OpenAI, Anthropic etc., adverting to the “strategic” value that those firms have to U.S. national security. But even if this were not the wretched self-serving horseshit that it clearly is, the U.S. can’t afford to make up the losses of trillions of dollars in mis-allocated capital that these companies have already set on fire. The U.S. government’s financial position is much more precarious than it was in 2009, and in any event nobody believes that the Feds would know what to do with NVidia, or Anthropic, or Oracle, if they somehow wound up with majority shares of those firms.

I think that it’s coming apart now. Not in a few months. It is happening before our eyes, now.

Then What?

The last 8 paragraphs are, I believe, the most derivative and least informed of any that I have written in this series. I am not a financial expert in any sense of the phrase. Most of what I have written in these essays I can defend based on my fairly extensive domain knowledge in the subjects of machine learning and statistics and computational science. I do wish to write about what I think is going to happen, however, and in this moment it appears that to do so I need to connect the things that I do understand well to financial matters that I feel much less certain about. Please do not make any investment decisions based on what I write.

Instead of babbling on about business economics, I’d like to return to the subject of machine learning (and be shot of the term “AI”), and try to understand what residual value will remain from this strange, two-decade adventure, once the now-inevitable Winter brings on its now-inevitable retrenchment.

If you’ve been patient enough to read through these posts, then you probably know that I am an admirer of many of the scientific accomplishments that came out of the 2007 deep learning revolution. Those accomplishments are real. We can now distill weather patterns output by computationally-expensive numerical weather prediction (NWP) codes into computationally-cheap DL models that can reliably forecast weather up to 2 weeks in the future as well as those NWP codes can. We can predict protein structure, which, take my word for it, is a huge advance in biology. We can make very good guesses at chemical and material structures that lead to desirable properties, and that can actually be manufactured, without those chemicals or materials ever having existed before. These are all Nobel-caliber advances, some of which have in fact been rewarded with Nobel prizes. We will still have them, and obtain others like them, after the dust settles on the burst AI bubble.

The AI companies themselves will, incredibly, be forced to bequeath to the public some things of great value. Most of the models at the cores of their chatbots, including their trained model parameters, are publicly available and open-source licensed at Hugging Face, a public code and data repository dedicated to LLMs. The firms that built those models do not release their training data, but they did train those models on enormous datasets, at enormous expense in compute, electrical power, and capital, and the resulting models can actually be useful when run stand-alone (if one has reasonable expectations). I think that Marc has made this sort of point several times in comments to previous posts, and I basically agree with it.
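
For what it’s worth, “running stand-alone” looks roughly like this in practice, via the Hugging Face transformers library. The model name here is just a small, freely licensed example, not a recommendation:

```python
# Minimal sketch: run an open-weights model locally with the
# transformers library (pip install transformers torch).
# "gpt2" is only an example of a small, openly licensed model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

out = generator(
    "Deep learning is",
    max_new_tokens=30,      # keep the (possibly hallucinated) output short
    num_return_sequences=1,
)
print(out[0]["generated_text"])
```

After the one-time download, everything runs on your own hardware: no API, no subscription, no inference bill owed to anyone. That is the bequest.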

Moreover, as I wrote in Part 6, I also believe that the LLM adventure has taught us something interesting that we did not previously know: it is in fact possible to model human language on a computer. This is one of the hardest modeling problems ever attempted, and the history of NLP is littered with failed attempts to crack it. The 2017 introduction of the transformer architecture succeeded in demonstrating that the crazy idiomatic quirkiness of human expression is not beyond capture by computers that we can build, now. That’s a fantastic discovery, because previous experience suggested that capturing the distribution of human language on a computer might be one of those things that are possible in principle, but that one could bankrupt the world and still fail if one actually attempted to do it in practice. Now we know that it can be done by nearly bankrupting the world. This is progress! What we need now is to figure out alternative NLP strategies that do not suffer from the crazy scalings of DL methods. I think that there are some real possibilities for this.

For me, an important silver lining in the LLM debacle is that academic research on machine learning may finally recover from the learned helplessness with which it has faced its exclusion from the development of state-of-the-art language models. The eye-watering cost of the infrastructure required to train and operate such models has meant that academic scientists have been sidelined from model development and pretraining. Also (I’m a bit ashamed to say) a certain passivity and vulnerability to groupthink has prevented them from looking for good alternatives to simply becoming customers of the AI industry. Now that one can foresee that the colossal investments by the industry will soon cease to exercise their mesmerizing effects on researchers, perhaps we can get serious about the science of machine learning, and about the uses of machine learning in science. And about recovering our agency in our own scientific endeavor.


  1. Ed Zitron’s substack and “Better Offline” podcast offer far more detailed and research-backed analyses of the business economics of the Tech industry than what I can possibly write. I find them quite valuable, if a bit prolix at times.↩︎
  2. A financial bubble is a different “cyclic” pattern from an AI Winter, of course. One is a concept in history of finance, the other in history of science. They share the essential trait of hype and unrealistic investor hopes. For the first time in history, however, one type of cycle may trigger the other. Again, Zitron has been one of the most prominent and early voices warning of an AI bubble in the offing.↩︎


Part 6: The Pathology at the Heart of Hyperscaling

by WaterGirl | December 10, 2025 7:30 pm | 40 Comments

This post is in: Carlo Graziani, Carlo's Artificial Intelligence Series, Guest Posts, Science & Technology

Guest post series from Carlo Graziani.


On Artificial Intelligence

Hello, Jackals. Welcome back, and thank you again for this opportunity. Being able to write these posts on AI has been very helpful to me in clarifying and sorting out my thinking on this subject. The comments that have followed each post have been of very high quality and on point, making for excellent and informative discussions (informative to me included).

The plan is to release one of these per week, on Wednesdays, with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

Most of these posts have had a nerdy tinge, because the take that I have developed on AI is an unusual one, blending a mix of reflections on the technical side of the subject with a skeptical (and largely contrarian) outlook on much that passes for conventional wisdom among this discipline’s practitioners. This sort of project, wherein someone claims that much accepted technical wisdom of a certain field of science and technology is in fact wrong, necessarily exposes one to charges of being a crank or a crackpot, unless one is careful to provide some detailed and hopefully persuasive technical arguments pointing to unexamined assumptions and to scientifically plausible alternative views. Hence the plunge into nerd-core.

This post is the last of the truly nerdy posts in this series. After this, I’ve mostly emptied the bag of things that I think I know that most people don’t. So the final post will be a sort of high-level summary, combining take-aways from the series with some historical considerations to attempt some synthesis of where we are with AI, what this moment means, and where we might be heading in the not-too-distant future. If you’ve been waiting for a hopefully more accessible discussion of AI, that should be the one.

That said…

Part 6: The Pathology at the Heart of Hyperscaling

The term “hyperscaling” entered the common vernacular with a vengeance in 2025. The prefix “hyper” in “hyperscaling” is not marketing vapor. At this stage, most people who notice economic news at all are aware that something very unusual is happening to the U.S. economy. AI Capital Expenditures (“CapEx”) contributed 1.1% to U.S. GDP growth in the first half of 2025, a figure that is almost certainly higher when annualized. Total AI CapEx in 2025 is estimated at somewhere in the neighborhood of $500B, which is an incredible 1.4% of U.S. GDP. The Tech industry has persuaded analysts that AI investments between now and 2030 will add up to something like $5T. And as astounding as these numbers are, they look tame compared to the anticipated build-out of U.S. electrical power generation: estimates of the additional generation (and transmission) capacity required to run AI training and inference are somewhere in the range of 75-100 GW. The middle of that range (87.5 GW) corresponds to roughly 7% of total U.S. electrical generation capacity. And that’s just to power the AI data centers.


These numbers are ridiculous. It seems clear that those projections for cost and power generation are totally unrealistic, unlikely to ever come to pass, and damaging to the economy to the limited extent that they do come to pass.

The damage is not limited to economics, however. This level of investment also moves minds. The Tech industry consensus—that hyperscaling of large language models (LLMs) will bring about Artificial General Intelligence (AGI)—has attracted a critical mass of mindshare from government and from academia. This is certainly a bit of an own-goal for academic institutions, because by buying in to this consensus academic scientists have entirely shut themselves out of LLM development: no university or consortium of universities can afford to build out even a small fraction of the computational infrastructure required to train or operate these models at these scales. And even the U.S. government is struggling to stay in the game.

The interesting question here is why has this consensus developed across the field. And the “why” that I’d like to discuss is not a business strategy “why”, but rather a technical “why”. As a matter of science and technology, what is the peculiarity of LLMs that fuels the drive to hyperscale? After all, we have had “AI” deep learning (DL) methods since 2007. Hyperscaling, however, is peculiar to the subset of DL methods that power LLMs, and has only really gotten underway since 2022. What changed?

Overparametrization

In a sense, the seeds of hyperscaling have been an implicit part of DL ever since the DL revolution began in 2007. The origin of the phenomenon is the practice of model overparametrization.

In normal parametrization, one has a model with some parameters, which may be thought of as knobs that one can dial to arbitrary values. The knobs control the predictions that the model makes of the data. One determines the values of those knobs by adjusting them so as to minimize the misfit between the model and the data. Normally, one has many fewer parameters than data samples.

Overparametrization is the practice of endowing a model with many more parameters than one has data samples to model. It has not always been regarded as a useful practice: before the rise of DL, it was in very bad odor, for good and sufficient reasons. The problem with overparametrization is that it endows a model with too much flexibility and too little predictive power. After all, the purpose of training a model on some data is to obtain a means of making predictions and decisions given new data samples (recall the Statistical Learning Catechism, from Part 1 of this series). But if I have 10,000 data samples, and I naively train a 100,000-parameter model on that data, what will inevitably happen is that the model will predict the training data perfectly, while giving wildly wrong predictions about new data not included in the training set. In statistical parlance, the model will overfit the data.

Basically, as a matter of algebra, you only need 10,000 parameters to “solve” for the 10,000 training data samples. This is sometimes called “memorizing the training data”, i.e. learning to predict those 10,000 training samples perfectly. Despite its perfect fit to the training data, that 10,000-parameter model will give terrible predictions of any new data not included in the training set. And a 100,000-parameter model is—or ought to be—much worse.

The point is that the true process underlying the data always has more smoothness than the data itself, because the data has additional random noise. Any computational model of that process should not be allowed to play connect-the-dots with the data, because that would be tantamount to chasing the random noise. But that is exactly what a naive overparametrized model does. And because it learns to chase random noise in the training data, it cannot properly predict the smooth behavior of the underlying process. We say of such models that they have poor generalization properties, which just means that no interpolation based on such a model can be trusted 1.

The problem is that in the example of the 100,000-parameter model trained on 10,000 data samples (not an atypical case for a convolutional neural net trained on a corpus of images) there is a roughly 90,000-dimensional parameter subspace of solutions that exactly memorize the training data. That subspace has a thickened neighborhood (in the full 100,000-dimensional space) of solutions that don’t quite memorize the data. In that neighborhood exist some values of the parameters that endow the model with reasonable generalization properties. Those values are the target.
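
Here is a minimal numerical sketch of that memorization effect, using a toy polynomial fit in numpy rather than a neural network (my own illustration, not anything from the DL literature): ten noisy samples of a smooth process, fit with 4 parameters and then with 10.

```python
# Overfitting by overparametrization, in miniature. A smooth process
# (a sine wave) is sampled with noise; a 10-parameter polynomial can
# "memorize" the 10 training samples, while a 4-parameter one cannot.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    x = np.sort(rng.uniform(-1, 1, n))
    return x, np.sin(3 * x) + 0.2 * rng.normal(size=n)  # smooth truth + noise

x_train, y_train = sample(10)
x_test, y_test = sample(200)

for degree in (3, 9):  # 4 vs. 10 parameters, for 10 training samples
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.5f}, test MSE {test_mse:.5f}")

# Typically the degree-9 fit drives the training error to (numerically)
# zero -- it has played connect-the-dots with the noise -- while its
# test error comes out far worse than the modest degree-3 fit's.
```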

Regularization and Stochastic Gradient Descent

The way to improve the generalization properties of an overparametrized model is to regularize it. Regularization is a very general term for a time-honored, broad family of techniques that deprive an overparametrized model of much of its freedom. By regularizing such a model, we in effect impose some kind of smoothness properties on it that interfere with its ability to play connect-the-dots with the data, thus making it a more realistic representation of the smooth underlying data-generating process. The regularization technique adopted in DL methods is peculiar to the discipline, because it was in DL that extreme overparametrization was first seriously considered.

DL methods find the target parameters through the technique of stochastic gradient descent (SGD). The “gradient descent” means that there is some defined cost function of the parameters (such as some average of the prediction errors over the training set, for example), and that cost function is minimized (“descent”) by following it through the parameter space along the steepest descent direction.

Here’s an analogy to help understand the training process: think of the cost function as a continental landscape. You know that the landscape has a long, broad valley somewhere in the middle of the continent, and you are targeting a destination somewhere near the valley floor. Not the exact floor, because that would correspond to the noise-chasing data-memorization solutions, but somewhere in that neighborhood. To find the right neighborhood, you need to locate the valley floor.

A reasonable first approach is to always follow the local descending direction, until you find the lowest point. But that won’t work. The problem is that while the valley walls slope down on average (by about a foot per mile, say), the landscape is highly textured, with lots of local structure such as boulders, hills, small and twisty valleys, mountain passes, and so on. The local direction of steepest descent might actually lead you away from the continental valley floor, because you are descending some random twisty valley, or because the direction that you need to follow leads up some mountain pass. The landscape texture in the analogy is the structure added to the cost function by those 10,000 data samples, each one of which acts as a sort of structure-adding knob, contributing the boulders, hills, etc.

You need some strategy to ignore the local texture and find the large-scale average descent direction. If you could blur the landscape by squinting, resolving structure only to within a mile instead of to a few feet, you might be able to see the average descent towards the valley floor, because the boulders etc. would be fuzzed away.

That is where the “stochastic” part of SGD comes in. By randomly swapping subsamples (“minibatches”) of data at every step of gradient descent, the optimization code’s view of the landscape texture is blurred, because the landscape of fine-scale features changes with every minibatch. As a consequence, the small-scale structure ceases to obscure the large-scale path of descent. Even better, the 90,000-dimensional memorization submanifold—the exact bottom of the valley, corresponding to the family of connect-the-dots solutions—also gets fuzzed out somewhat, becoming harder to find. The optimization algorithm dwells for longer in the liminal region where good generalization may be found.

Such a solution is identified by computing the cost function on held-out “test” data, which is not used in the gradient computation. At first, both the training and test costs decline in tandem. At some point in the training, however, the test cost stops dropping and either flattens out or starts rising again, while the training cost continues to drop. This is the stopping point of the training loop: a parameter solution has been located with reasonable generalization properties, because the test cost is OK, and there is no point in continuing, because the optimization routine is about to find the data-memorization submanifold (the lowest point of the valley floor).
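
For the algorithmically inclined, here is the training loop just described, in schematic form. This is a sketch under loudly-stated assumptions: model_cost and model_grad are placeholders for whatever model is being trained (they are not any particular library’s API), and real training loops add epochs, learning-rate schedules, and other refinements.

```python
# Schematic minibatch SGD with early stopping on held-out test cost.
# `model_cost(params, X, y)` and `model_grad(params, X, y)` are assumed
# stand-ins, not a real library API.
import numpy as np

def sgd_train(params, X_train, y_train, X_test, y_test,
              model_cost, model_grad,
              lr=0.01, batch=32, patience=5, max_steps=100_000):
    rng = np.random.default_rng(0)
    best_test, best_params, stall = np.inf, params.copy(), 0
    for _ in range(max_steps):
        # Random minibatch: each draw re-textures the cost landscape,
        # blurring the fine-scale structure contributed by single samples.
        idx = rng.choice(len(X_train), size=batch, replace=False)
        params = params - lr * model_grad(params, X_train[idx], y_train[idx])
        # Early stopping: watch the cost on the held-out test data.
        test_cost = model_cost(params, X_test, y_test)
        if test_cost < best_test:
            best_test, best_params, stall = test_cost, params.copy(), 0
        else:
            stall += 1
        if stall >= patience:   # test cost has flattened or risen:
            break               # stop short of the memorization floor
    return best_params
```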

Overparametrization and Data Types

That was rather a long explanation of the practice of overparametrization in DL. I need it here, because I need to discuss the costs and benefits of the practice, and how those vary depending on the type of data and decisions that one’s model is required to traffic in.

The combination of overparametrization and SGD is key to the success of all DL methods. It should be clear, however, that from a strictly statistical point of view, overparametrization is inefficient. Remember that according to the Statistical Learning Catechism, an important part of the job of any such model is to learn the distribution of the data. That distribution may be quite complex, and many parameters may be required to approximate it. But in a principled statistical model, the required number of parameters should not depend on the number of samples used to determine those parameters. Instead, the number of parameters should be fixed, and determined only by the structure of the data-generating process. As the number of data samples increases, the precision with which that fixed parameter set is determined should get better and better 2.

DL models do not have this property. Instead, there is a notion of model capacity, which corresponds to the size of the parametrization, and which is required to grow as the training data grows, in order to maintain the necessary overparametrization. The more data one trains on, the greater the required model capacity. This is the sense in which they are inefficient.

This inefficiency is not, in and of itself, a bad thing. As we have seen, DL methods have been incredibly successful at addressing problems such as image classification, protein folding, materials design, etc. that were previously regarded as intractable. When used in such applications, they could be trained at very reasonable cost in computation and energy. It was definitely not necessary to occupy a 10,000-GPU datacenter for months to train a high-quality image classifier, for example. On the other hand, this is exactly the computational scale required to train an LLM on corpuses of text. Why the difference?

The answer, in my opinion, is that the distribution of text tokens from human language is vastly more complex than any other data distribution that has ever been modeled using DL methods. Billions of tokens of text are required in order to begin to capture the regularities and peculiarities and sheer chaos of human expression. This is larger, by several orders of magnitude, than, for example, the number of photographic images required to train an image classifier. And this being the case, the model capacity of LLMs has of necessity grown to unprecedented size (hundreds of billions, or even trillions of parameters), in order to maintain overparametrization.

But training models of this sort of capacity requires vastly more compute and energy. Hence, hyperscaling.

On “Emergence”

The account that LLM practitioners give of their capacity issues is simultaneously humorous and frustrating. They do not appear to view those issues in the light of the story that I have just presented. They see no continuity with the capacity requirements characteristic of all DL. Instead, they seem to have persuaded themselves that the relationship between data corpus size (in billions of tokens) and LLM capacity (in billions of parameters) is sui generis, and amounts to some kind of natural law of computing wherein intelligent behavior “emerges” naturally in consequence of the growth of the complexity of the model. The linear relationship between corpus size and model size has even been dignified with a scientific name: the “Chinchilla Scaling Law”, named after the Chinchilla model discussed in this paper from DeepMind. The idea that there are such “scaling laws” for LLMs was originally suggested in this paper from OpenAI.

This is funny, in a grotesque sort of way. What is really happening here is that, as with all DL methods, without sufficient model capacity the model performance sucks. Then, as one grows the model capacity by adding parametrized computational elements, at some point the model becomes trainable, and begins to suck less. This “point of sucking less” is what LLM developers, in a brilliant stroke of marketing, have chosen to call “emergence”. And the advent of emergence is purportedly predicted by the scaling laws, which are felt to hold some deep significance about the nature of intelligence, and whose discovery rivals that of Newton’s law of gravitation in scientific importance.

This is, of course, pure horseshit. It is indicative of the corrupted state of the science of machine learning under the influence of the business imperatives of the Tech industry. A certain answer is required to be true by the industry: AGI is here, or, at least, nigh, which we know because of these “scientific discoveries”. The scaling law points the way to AGI: we will get there through larger models, more compute, higher CapEx. “Science says” that Hyperscaling will bring about AGI. And AGI will be so great that markets will emerge to absorb the $5T in investment required to get there.

This is a ridiculous story, since there is no credible scientific support for any of these claims. It’s a wild adventure in Madness of Crowds capitalism that cannot possibly end well.

What Have We Really Gained With LLMs?

The advent of the transformer architecture in 2017 represented a true breakthrough in natural language processing (NLP), one worthy of the highest scientific praise. Not because of any bullshit about AGI or “emergence” or scaling laws, but because transformers have demonstrated that natural language can be modeled at all. It was not clear prior to the transformer that this was true, or at least that it might ever be possible to model the distribution of human language text on a realistic computer.

There are two types of scientific impossibilities. The first type is that of things that are simply straight-up impossible so far as our current scientific understanding is concerned (faster-than-light travel, for example). The second type consists of things that, while not impossible in principle, are impossible in practice, because if you tried to accomplish them you would bankrupt the World in the attempt and still fail (human spacefaring travel to nearby stars is in this category).

Some “type 2 impossible” problems have to do with whether something is or is not amenable to computational modeling. There are many problems that have known scientific principles, but which we do not expect to ever be able to model on a real computer. For example, first-principles numerical modeling of strongly turbulent fluid flow is just one such “type 2 impossible” problem, because no computer we can imagine building would have the capacity to resolve all the physical length scales required for such a simulation.

As to NLP, it has always been clear that language has patterns, and that text corpuses issue from some complicated distribution. It was not clear until 2017, however, that computational modeling of that distribution was not “type 2 impossible”. Now we know that it is in fact possible. That is a real scientific accomplishment.

Unfortunately, the realization of such models using transformers comes at a very daunting cost in compute, power, and capital. This is especially true now that the subject has become entangled with a totally unrealistic quest for a goal (AGI, or, at least, artificial reasoning) that is almost certainly not available along the current technological path. This cannot go on forever. And as economists are fond of saying, if something cannot go on forever, it will stop.

Are There Alternatives To Transformers?

Transformer-based LLMs have exhausted their usefulness as research tools. It is past time to start looking for alternatives.

A small number of leading researchers are already unhitching their wagons from the LLM caravan. Yann LeCun, one of the most celebrated DL scientists, made a splash earlier this year by parting ways with Meta to start up a research effort on an approach called “World Modeling”. Song-Chun Zhu, a renowned expert in computer vision, has returned to China after 20 years of research in the U.S., to pursue a set of approaches that he characterizes as “Small Data, Big Task” (by implication, DL approaches have it the other way around).

I think it is promising that some very bright folks are breaking away from transformers (and possibly from DL altogether). As a matter of personal outlook, I am not sanguine about purely computational approaches such as World Models and SDBT superseding LLMs. These are, as I understand matters, still computational learning approaches. As I’ve written in past posts, I feel that the most serious defect of DL approaches is that they place little value on reasoning about data distributions, while focusing too much attention on models. In effect, in light of the Statistical Learning Catechism that I’ve expounded upon, they are computational models attempting to do a statistical model’s job, and as a consequence they make inefficient use of their data. That inefficiency is tolerable for most data types of interest, but unaffordable for human language learning. And while learning human language distributions is not sufficient to model human reason, I believe that without learning human language distributions there is no chance of any kind of emulation of the human faculty of reason.

So what I’d like to see, and something I am attempting to do in my own research, is to try to introduce principled statistical models to perform language-learning tasks. I can’t tell yet whether the things that I am trying will work as well as a transformer, or even be competitive with one. However, I do think that it ought to be possible in principle to construct a model whose capacity need not scale with data corpus size, and whose parameters are fixed in number by the structure of the human language distribution that it models. Those parameters should be determined more accurately the more training data the model is fed.
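
To illustrate the distinction in the most trivial way possible (a textbook toy, emphatically not my research program), consider the humble bigram language model over a fixed vocabulary: its parameter count is set by the vocabulary, not by the corpus size, and more data only sharpens its estimates.

```python
# A fixed-capacity statistical language model: bigram probabilities over
# a fixed vocabulary. The parameter count is V*V regardless of how much
# text the model sees; more text just reduces the sampling noise in the
# estimates (roughly as 1/sqrt(N)). Corpus and vocabulary are invented.
from collections import Counter

def train_bigram(tokens, vocab):
    counts = Counter(zip(tokens, tokens[1:]))
    totals = Counter(tokens[:-1])
    V = len(vocab)
    # Laplace-smoothed conditional probabilities P(next | current).
    return {(a, b): (counts[(a, b)] + 1) / (totals[a] + V)
            for a in vocab for b in vocab}

vocab = ["the", "dog", "is", "fast"]
small = ["the", "dog", "is", "fast"] * 10
large = ["the", "dog", "is", "fast"] * 10_000

# Same 16 parameters in both cases; the larger corpus simply pins
# P("is" | "dog") down more precisely (about 0.786 vs. 0.9997).
print(train_bigram(small, vocab)[("dog", "is")])
print(train_bigram(large, vocab)[("dog", "is")])
```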

To accomplish something like this, not only must one break away from transformers: it is necessary to give up on DL methods altogether, because the overparametrization required by all DL-based methods makes language learning unaffordable.

This type of approach is in a sense less ambitious than what LeCun and Zhu are attempting to bring about, because it is still strictly concerned with language modeling. But this seems to me a good way to leverage the one solid, useful result that has emerged from modern NLP: natural language can be modeled on a real computer. Now we just need to find a way to do it that doesn’t bankrupt the World.


  1. Note that “generalization” here means “interpolation”. We can only expect such models to give good results on new data that is very similar to the training data (technically, “in the support of the training data”). In the extrapolation regime, i.e. far from the training data, DL methods always give terrible (but very confident) predictions. You might call such predictions “hallucinations”.↩︎
  2. Roughly speaking, the uncertainty on the values of those parameters should scale inversely with the square-root of the number of independent samples.↩︎


Part 5: Hallucinations

by WaterGirl | December 3, 2025 7:30 pm | 78 Comments

This post is in: Carlo Graziani, Carlo's Artificial Intelligence Series, Guest Posts, Science & Technology


Guest post series from Carlo Graziani.

On Artificial Intelligence

Hello, Jackals. Welcome back, and thank you again for this opportunity. Being able to write these posts on AI has been very helpful to me in clarifying and sorting out my thinking on this subject, and the comments that have followed each post have been of high quality and on point, making for excellent and informative discussions (informative to me included).

The plan is to release one of these per week, on Wednesdays, with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

I had originally planned to post a high-level summary during Thanksgiving Week, to try to offer usable take-aways to people put off by my nerd-babble. After some discussion with WaterGirl, we have decided instead to leave the summary posts to after the conclusion of the series.

Part 5: Hallucinations

In November 2022, Vancouver resident Jake Moffat needed to travel to Toronto to attend his grandmother’s funeral. He asked an Air Canada chatbot about the terms of a bereavement fare, and the chatbot assured him, incorrectly, that according to the company’s rules he could receive the bereavement discount retroactively after traveling on the regular fare. When Air Canada denied Moffat the discount, he took the airline to British Columbia’s Civil Resolution Tribunal. The Tribunal held that Air Canada was liable for its chatbot’s representations to customers on its own website, and had to pay Moffat damages and legal fees.

In May 2023, a plaintiff attorney named Steven A. Schwartz filed a legal brief in the Southern District of New York containing references that the judge deemed to be “…bogus judicial decisions with bogus quotes and bogus internal citations.” Schwartz acknowledged that the source of the bogus references was in fact ChatGPT, representing to the judge that ChatGPT had, upon being questioned about the authenticity of the cases, responded that they were “real” and “can be found in reputable legal databases such as LexisNexis and Westlaw.”

In Spring of 2025, The Chicago Sun-Times published a 15-title summer reading list. Ten items on the list were made-up titles attributed to real authors.

Google’s AI Overview has recommended using non-toxic glue on pizza to help cheese stick to the pie.


I could go on, but it gets boring. Finding examples in the media of AI going off the rails in embarrassing ways is easier than finding inebriated people on Chicago streets at noon on Saint Patrick’s day. Just try a web search on “AI hallucinations”. The AI hallucination is a daily phenomenon, affecting programmers attempting to speed up their coding, scientists looking for fast ways to generate or clean up papers and proposals, and anyone in need of text that must precisely reflect some legal constraints.

AI models are also notoriously bad at mathematical reasoning, making elementary arithmetic mistakes as well as serious mathematical errors. I have prompted ChatGPT to perform a certain standard physics derivation 1 twice now, at a distance of several months, and both times I have obtained careless stupidity that no undergraduate would be capable of producing, presented with professorial polish and total didactic aplomb.

It’s fun to point and laugh, but sometimes it is no joking matter. People have received bad, even dangerous medical advice from ChatGPT. There is a high-profile effort underway to use AI to “democratize” financial advice, which is seemingly innocent of the associated risks. There’s a pending patent for “AI Traffic Control” which is exactly as terrifyingly stupid an idea as it sounds. In fact, we are living through a moment in which the Tech industry is desperately attempting to propose AI for any application offering any prospect of profitability (no firm today makes any money on AI services), so it is not surprising to see such risks minimized or hidden altogether.

To observers of this discipline, the hallucination phenomenon is a very serious problem, and is another reason to question whether “Artificial General Intelligence” (AGI) is even a remote possibility on our current technological path. Certainly it would seem that if the hallucination issue is not understood and corrected somehow, any prospective AGI will babble hilariously, and possibly dangerously, some unpredictable fraction of the time.

The Tech industry consensus on hallucinations, however, is some combination of (a) hallucinations are not really a problem, and (b) more pretraining of improved (i.e. “larger”) models with more data at higher cost in compute and power will make them go away, as AGI finally emerges. I have had conversations with people who really believe in (a) or (b), and at least one person I spoke with appeared to somehow hold both views simultaneously.

View (a) is obviously not even worth discussing, given the high stakes involved in many AI applications. What I’d like to discuss today is view (b): can we really expect bigger models trained at higher expense with more data to do away with the stubbornly persistent phenomenon of AI hallucinations?

In order to address this question, we need to understand where these hallucinations come from. For that, we first need to review what it is that LLMs do.

What Does an LLM Do?

It is helpful to recall the basic definition of statistical learning (the subject that encompasses all of AI) at this point. Here is how all this stuff works:

  1. Take a set of data, and infer an approximation to the statistical distribution from which the data was sampled;
    • In the case of LLMs, the data consists of hundreds of billions of words of text
  2. At the same time, optimize some decision-choosing rule in a space of such decisions, exploiting the learned distribution;
    • With LLMs, a decision could be response to a text prompt, or a judgment about whether the text expresses positive or negative sentiment, or a translation to another language, etc.
  3. Now when presented with a new set of data samples, produce the decisions appropriate to each one, using facts about the data distribution and about the decision optimality criteria inferred in parts 1. and 2.

The reason that I keep bringing these up is that I find this model-agnostic view of the machine learning enterprise extremely clarifying, and helpful in directing attention towards what matters and away from irrelevant aspects of model design.

We should apply the above catechism to what LLMs do. Data from natural language text consists of sequences of words, interspersed with punctuation. LLMs learn features of the distributions over such sequences that allow them to probabilistically predict what the next response word should be, given a prompt and any response words previously supplied.

So, for example, suppose your prompt to be completed by the LLM is “Bob was nervous about his presentation to the board, despite his preparation the night before.” and the LLM completes it with “He had practiced by reading his slides and timing what he said while each one was displayed.” The LLM starts with the prompt as its context, and uses the learned distribution to compute the probability distribution of the next word. From that distribution, it samples (i.e. decides on) the word “He”. It then appends “He” to the prompt to form a new context, and calculates a new distribution for the next word. It turns out that “had” is pretty high in the probability list, and gets selected. The context is now “Bob was nervous about his presentation to the board, despite his preparation the night before. He had”. The LLM repeats the process, and probabilistically samples the word “practiced” from the new distribution. And so on.

No kidding. This is all that is going on. Next response token prediction based on the prompt and all previous response tokens. That is the entire trick. Neat, eh?
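
In code, the loop is almost embarrassingly short. In this sketch, next_token_probs is a placeholder standing in for the entire trained transformer: given the context so far, it returns a probability for every token in the vocabulary.

```python
# The autoregressive generation loop, schematically. `next_token_probs`
# is an assumed stand-in for the trained model, not a real API.
import numpy as np

def generate(prompt_tokens, next_token_probs, vocab, max_new=100, seed=0):
    rng = np.random.default_rng(seed)
    context = list(prompt_tokens)
    for _ in range(max_new):
        probs = next_token_probs(context)    # distribution over the vocabulary
        token = rng.choice(vocab, p=probs)   # sample the next token
        context.append(token)                # append, then repeat
        if token == "<eos>":                 # model signals end of response
            break
    return context
```

All the billions of parameters live inside next_token_probs; the decision machinery wrapped around them is just this loop.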

Back to hallucinations: there are two places to look for their origin: the approximation to the data distribution, and the next-token decisions founded on that distribution. Let’s take them in order.

Approximating the Distribution of Human Language

The reason that hundreds of billions of words of text are required to train an LLM is that the statistical regularities of human language are extremely complex, and not easy to capture in a principled statistical model.

Just scroll your eyes up and down this essay briefly, and then imagine figuring out the rules by which the words are juxtaposed, without being detained by trivialities such as meaning. There are rules: you rarely see the same word repeated immediately (e.g. “immediately immediately”) which is clearly a rule. There are grammar rules, and context rules. Certain clusters of words recur together in certain types of text and not in others: you will find pairings of “octopus” and “cephalopod” within a few hundred words of each other in texts from works on marine biology, but pairings such as “octopus” and “mortgage” are probably very rare. In fact, the occurrence of “octopus” in a page probably means that the probability of encountering “mortgage” within the next 1000 words is considerably reduced from the average rate of occurrence of that word, while the occurrence probability of “shark” is likely enhanced. And so on. How would one go about describing these patterns?

The approach used in natural language processing (NLP) since time immemorial is to begin by breaking down the text into tokens, then describe the text as a sequence of such tokens. This tokenization is a subtle and arcane art. You might think that it would be logical to break things down into words, numerals, punctuation, etc. While not wrong, this approach is very inefficient. The problem is that the English language (say) has about 500,000 words, which is a huge vocabulary for an LLM to manage. Vocabulary size is a critical parameter to be managed in this game, because the larger the vocabulary, the larger and more expensive the model.

On the other hand, breaking things down into individual letters is also a bad idea. While the vocabulary size is now much smaller (less than 50 for English), the token sequences are much longer, and the patterns much harder to find. The patterns are at the word level, not the letter level. It’s just that there are so many damn words!

The secret sauce is to notice that most of those half-million words are extremely rare. Studies of natural language have shown that knowing 10,000 words in any language allows one to understand 99% of texts in that language. In English, that would be 2% of the full vocabulary. Moreover, the rare words can be built up out of smaller word pieces. Identifying an optimal set of word pieces, most of which are full words in their own right, is the name of the game here. Algorithms exist that can represent all English text using 30,000 to 50,000 tokens, which is a considerable savings in vocabulary size. So tokenization is (largely) a solved problem 2.
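
The flavor of this family of algorithms is easy to convey with a toy version of byte-pair encoding (BPE), one widely-used member of it. The three-word corpus below is invented and absurdly small; real tokenizers learn tens of thousands of merges from enormous corpora.

```python
# Toy byte-pair encoding: repeatedly fuse the most frequent adjacent
# token pair into a new token. Frequent words coalesce into single
# tokens; rare words remain built from pieces.
from collections import Counter

def merge_step(words):
    pairs = Counter()
    for tokens, freq in words.items():
        for pair in zip(tokens, tokens[1:]):
            pairs[pair] += freq
    (a, b), _ = pairs.most_common(1)[0]        # most frequent adjacent pair
    merged = {}
    for tokens, freq in words.items():         # fuse that pair everywhere
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged[tuple(out)] = freq
    return merged, a + b

# Invented word frequencies; each word starts as a tuple of characters.
words = {tuple("low"): 50, tuple("lower"): 40, tuple("lowest"): 10}
for _ in range(2):
    words, new_token = merge_step(words)
    print("new token:", new_token)             # "lo", then "low"

# After two merges the frequent string "low" is a single token, while
# the rarer "lowest" is still assembled as ("low", "e", "s", "t").
```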

Embedding

The next thing that essentially every NLP method does with its tokenized text is a process called embedding: Each token is mapped into a vector space of dimension about 1000 (basically, each token gets described by a list of 1000 numbers) endowed with a notion of distance between points similar to the notion of distance between points in 3-dimensions. At this point, all operations on a sequence of tokens become operations on such lists of numbers. So when a transformer operates on a sequence of tokens (a sentence or a set of sentences, including previously-generated text) that sequence gets embedded in a very high-dimensional space: for example, the prompt above (“Bob was nervous…”) consists of about 17 tokens, so it is mapped to a point in an approximately 17,000-dimensional space consisting of 17 copies of the original 1000-dimensional embedding space.
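
Here is the embedding step in sketch form, with invented token ids and a toy embedding dimension (only the shapes matter):

```python
# Embedding: each token id indexes a row of a learned matrix, so a
# sequence of n tokens becomes an n-by-d array of numbers. Dimensions
# and ids here are invented; d is ~1000 in real models.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 50_000, 8
E = rng.normal(size=(vocab_size, d))     # stand-in for the learned matrix

token_ids = np.array([17, 934, 2208])    # hypothetical ids for a 3-token text
embedded = E[token_ids]                  # shape (3, 8): the model's input
print(embedded.shape)

# The payoff of embedding is geometry: unlike raw token sequences, these
# vectors have distances and angles, e.g. a cosine similarity.
u, v = E[17], E[934]
print(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```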

I want to draw attention to embedding for several reasons: one is that it is an essentially universal practice in NLP, preceding the invention of transformers by many years. It turns out to be much easier to model probability distributions by operating on lists of numbers than by operating directly on sequences of discrete tokens sampled from a finite set (the vocabulary). So researchers have defaulted to the embedding strategy.

Another reason to emphasize embedding is that when transformers train embedding parameters, they appear to do something magical: the resulting embeddings cluster together words and word fragments with similar meanings or functions, in well-separated clusters in the embedding space. You can see examples of this at Kevin Gimpel’s Bert Embedding Visualization Page, where you will see visualized in 2-dimensions clusters of suffixes, of verbs with similar meanings, of types of enclosed spaces, etc. It is one of those weird effects that persuade some people that LLMs are in fact acquiring a sense of the meaning of words.

The final reason to draw attention to embedding is this: embedding almost certainly poisons the approximation to the distribution of language tokens. The embedding step destroys information about that distribution. The reason is that the original native space of token sequences is entirely innocent of vector spaces, and contains no geometric notion of spatial proximity such as arises in the embedding space. That spatial proximity structure is entirely imposed by the NLP architecture. And it almost certainly gives rise to improper notions of proximity between sentences that are sensible (i.e. have a high probability of occurrence) and other sentences that are nonsense (i.e. have a low probability of occurrence).

As an example of improper proximity, consider these two brief sentences: “My dog is fast”, and “My sparrow is fast.” Both are well-formed, grammatically correct, and obey applicable syntactic and semantic rules. The difference is that the first sentence ought to be ascribed a much higher probability than the second one, because nobody actually owns a sparrow.

As embedded points, however, the two sentences are quite similar: a dog and a sparrow are both animals, and hence live in some proximity in the embedding space. Furthermore, dogs are pets, and while sparrows are not pets, they are birds, and some birds are pets. There are enough ways to draw proximity connections in the embedding space to make the second sentence seem plausible in the distribution approximation, despite the fact that it is, obviously, a hallucination.

So embedding is, in my opinion, one of the origins of hallucinations. It is the reason that the approximation made by LLMs to the distribution of language is so brittle. There is nonsense lurking “near” sense in the embedded sequences, because in their native space (token sequences) there was no notion of geometric “nearness”: that property of relative proximity is an artifact of the model.

And if true, this is very bad news for AGI, because it means that hallucinations are a structural feature of all LLMs. They all embed sequences. So you cannot just train your way out of insanity and into “General Intelligence”, because all those new tokens will have exactly the same problem of spurious proximity. The distribution will be corrupted from the outset. It may be that the most likely responses could appear sane, but insane responses will always lurk nearby, waiting to be sampled by the LLM.

The Tyranny of Sampling

I’ve been referring to the process of “sampling” tokens above, and I should say a bit more about that, because while we have seen the origin of hallucinations in a broken estimate of the distribution of language sequences (part 1 of statistical learning) we need to see how the problem is aggravated by an LLM’s response decisions (part 2).

LLMs are often referred to as “generative” models (the “G” in “GPT”). What this means is that their output is, in a sense, random rather than deterministic. They compute probability distributions over the next token, and then exploit that distribution to decide what the identity of the next token should be. They generally do this by choosing the token randomly, with a higher probability of selection ascribed to tokens judged more likely by the calculation.

You might well ask: “Why not simply select the next token by choosing the one with the highest probability?”

This is occasionally tried. It is a strategy called “greedy sampling”. It is very efficient. Unfortunately, it is also a recipe for disaster, a ticket to hallucination pandemonium.

The problem is this: what one really wants is the most likely extended response to the prompt, according to the learned distribution over language. This might consist of hundreds, or thousands of tokens. The distribution, while imperfectly learned, appears to at least get the most likely extended response right, in the sense that it is the one least likely to contain a hallucination.

Unfortunately, sampling the most likely next-token at every stage does not produce the most likely extended response. This can be a surprise at first, but from a mathematical standpoint it is not surprising at all. The probability of the 17th next-token conditional on the prompt and the previous 16 next-tokens can be very different from the probability of the 17th next-token conditional on the prompt and on the entire remaining most-likely response (tokens 1-16 and 18-1000, say). Choosing the most likely token at every stage can, and usually does, lead the LLM into crazy rabbit-holes.

So instead, one attempts to let the probability distribution do its thing by allowing it to somehow sample the next-token distribution. This is better, but more expensive. In principle, what one ought to do is sample the 1000-token response many times (10,000 times, say) and choose the most frequently-occurring response. That strategy would probably abate a good deal of the hallucination phenomenon. Unfortunately, it would be totally unaffordable in inference computation cost, as well as quite slow. So intermediate strategies are adopted, restricting the next-token distribution to the top 90% of candidates, and looking along a tree to the next and next-next tokens for each one of these top tokens (the so-called “beam search”). This is better, but still not great for finding the top 1000-token response.
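
To make these decision rules concrete, here is a sketch of greedy selection next to the restricted sampling just described (a variant commonly called nucleus, or “top-p”, sampling); the next-token distribution is invented for illustration.

```python
# Greedy selection vs. top-p (nucleus) sampling over one made-up
# next-token distribution.
import numpy as np

rng = np.random.default_rng(0)
vocab = np.array(["practiced", "panicked", "slept", "glue"])
probs = np.array([0.55, 0.25, 0.15, 0.05])   # invented distribution

greedy = vocab[np.argmax(probs)]   # deterministic; a ticket to rabbit-holes

def top_p_sample(vocab, probs, p=0.9):
    """Sample from the smallest set of top tokens whose total
    probability reaches p; the tail of the distribution is discarded."""
    order = np.argsort(probs)[::-1]                        # most probable first
    cut = np.searchsorted(np.cumsum(probs[order]), p) + 1  # smallest covering set
    keep = order[:cut]
    renorm = probs[keep] / probs[keep].sum()
    return rng.choice(vocab[keep], p=renorm)

print(greedy, top_p_sample(vocab, probs))  # "glue" can never be sampled
```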

You might call this the Tyranny of Sampling: one must somehow sample from an LLM in order to defend its output from the worst hallucinatory offenses. But if you try to do the right thing, the computational cost will destroy the usefulness of the method. Rock, hard place.

Hallucinations Are Structural

Here’s the bottom line: Hallucinations are a structural feature of LLMs, produced by a corrupted model of the probability distribution over language sequences learned in training. The corruption is due to embedding, which is a ubiquitous feature of LLMs.

The only available hallucination abatement strategy is some form of generative sampling, which means accepting the unsettling fact that LLMs cannot produce the same output twice to the same prompt. And even accepting this non-determinism as a cost of doing business, the sampling strategy that cleans up the problem to a maximal extent is totally unaffordable. Unsatisfactory look-ahead strategies are better than nothing, but they still let a lot of nonsense through.

There is no hallucination abatement strategy that begins with more token data and larger models. That’s just not a thing, despite what the Tech industry would like to believe (and would certainly like investors to believe). More tokens and larger models likely aggravate the embedding problem, because there will be more improper proximities discovered in the embedding space.

And note that “larger” models are not “more clever” models. This discipline has not produced radical innovations to the transformer architecture since its invention, or at least none that have led to any breakthroughs comparable to what was wrought by the transformer’s first introduction in 2017. A “larger” model simply means “more parameters” 3, not new mechanisms that make the model more clever. Given the argument that I make here, I very much doubt that any new cleverness could be built into a transformer that could eliminate the hallucinatory mechanisms baked into its structure at its most fundamental level.

All of which is to say this: LLM-based “AGI” will be mentally ill at birth.


  1. “Derive the wave equation starting from Maxwell’s equations.”
  2. Note that optimal tokenization is not a solved problem. This would be addressing the following problem: what tokenization is maximally preserving of the information borne by text?
  3. Which is to say, more embedding parameters, more attention heads. Not new mechanisms.


Part 4: If There Were AGI, What Would It Look Like?

by WaterGirl | November 19, 2025 7:30 pm | 65 Comments

This post is in: Carlo Graziani, Carlo's Artificial Intelligence Series, Guest Posts, Science & Technology

Guest post series from Carlo Graziani.


On Artificial Intelligence

Hello, Jackals. Welcome back, and thank you again for this opportunity. What follows is the fourth part of a seven-installment series on Artificial Intelligence (AI).

The plan is to release one of these per week, on Wednesdays, with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

The original plan was to skip Thanksgiving week. However, I’ve been talking to WaterGirl about the technical level of these posts, and I’ve come to realize that it’s been a bit off-putting to some readers. So I think that during the turkey-day break, I’ll try to provide a high-level summary of where the series has been with an eye to keeping the nerd-babble under control.

That said…

Part 4: If There Were AGI, What Would It Look Like?

Part 3 ended with a bit of a rant, because I felt the need to express outrage at the very loose and lazy intellectual standards prevailing in much contemporary “AI” research, at least insofar as discussion of Artificial General Intelligence (AGI) goes. My perspective on the subject is by no means a majority view, and I feel a little like Diogenes, shaking his fist at the corrupt world from the austerity of his barrel.

The thing is, I don’t really enjoy the role of Diogenes, because “burn it all down” is a fundamentally destructive outlook on such things. I happen to feel that the scientific accomplishments of modern machine learning, while often oversold, are very real. I don’t want to give the impression that I think the entire subject is worthless, just because the current scientific discussions of AGI are so fundamentally wrong.

As to AGI itself, I think there is something else I need to clarify: I do not intend to say that it is impossible to achieve some version of AGI; I am simply saying that AGI is impossible along our current technological path, which is to say, based purely on machine learning techniques.

I am philosophically a materialist. I do not believe in souls. I think that consciousness is something that physical brains do, a phenomenon that arises from the electrical activities of billions of grey cells. And that being the case, I cannot in good conscience believe that it is surely impossible to bring about some kind of entity, in software running on computer hardware, that recognizably emulates aspects of human cognition, including reason. I do expect that this feat will be far more challenging to accomplish than the chatbot parlor tricks we currently call “AI”. Even if true AGI is possible at all, we might not see it happen for many decades. Nonetheless, fundamentally, some AGI technology should be possible in principle.

What I want to attempt today is to describe what the scientific basis for such a technology might look like. I base this discussion on an article that I have written that is currently under review (those of you who would like to take a deeper dive will find the draft article here).

This is a purely speculative venture, and what I write here, however well-motivated, could easily turn out to be wrong. Nevertheless I think this is a useful exercise, for two reasons: it is useful to at least try to point to a possible exit from the stagnant state of current research on AGI; and, it is useful to at least try to illustrate what type of research concerns ought to replace those currently occupying scientists working on AGI.

What Should We Require Of A Theory Of Artificial Reason?


I want to narrow down these considerations, from AGI (a term for which no accepted scientific definition exists) to artificial reason, which is at least amenable to some specific discussion. What I would like is a model of what we mean by the term “reason” that is specific and detailed, to the point of being amenable, at least in theory, to implementation as software. Such a model would at least get us away from the territory of bullshit claims such as “self-organization” and “emergence” of AGI.

Last week, I discussed human reason in the context of what sort of traces it might leave in natural language text, to examine the plausibility of claims that reasoning states can be recovered from large text corpuses. I pointed out that our own reason rests on a foundation of subrational processes which almost certainly leave no such trace in text. Cognitive scientists have only the vaguest notions of how those processes work, and they can certainly not exhibit any models for them that are sufficiently specific to be represented as software. So trying to build a principled “bottom-up” model that mimics how reason emerges in a human mind is probably hopeless, at least for now.

What is left, then, is a “top-down” approach. What I mean by that is that we must work at an abstract level rather than at a mechanistic one. We must state what we mean by “reason” in general terms, in a way that we cannot directly show to be connected to the mechanisms of human reasoning, but which is motivated by the structure of reasoned thought. Also, we would like a model expressed in as mathematical a form as possible, because the point here is to come up with something that we could imagine translating into computer code.

Oddly enough, we already have one aspect of human cognition that can be represented this way: learning. We have seen that there is a subject called statistical learning, wherein by some method one learns an approximation to the statistical distribution from which some dataset was sampled, and one concomitantly learns to structure reasonable decisions based on that distribution. I’ve been a little vague about how this works, but it is a process that can be represented quite generally by the kind of model that I have in mind here.

So one possible approach (certainly not the only one!) is to take that representation of learning and generalize it, to represent reasoning. This approach has two advantages: it allows us to get a free ride on the existing model, which appears to work for learning; and, it allows us to connect and contrast “reasoning” to “learning”, so that we can begin to see what the relationship might be between the two.

A Cast Of Characters

This is all very abstract, and it will be helpful to provide concrete examples of reasoners (or alleged reasoners) to consult as we go along. I have three such examples for you:

  • The astrophysicists who were trying to puzzle out the nature of Gamma-Ray Bursts (GRBs) between 1973 and 1998. The GRB phenomenon consists of bursts of gamma rays (duh!) that arrive at the Earth from random directions on the sky, never repeating. When they were discovered, and for the quarter-century that followed, their nature remained mysterious, because they seemed unconnected to any other astronomical phenomenon. The available data consisted of gamma-ray “light curves” (time traces of gamma-ray intensity), spectra (distributions of gamma-ray energies in the burst), event durations (fractions of a second to hours), and locations in the sky. The latter were only known very inaccurately: the so-called “error boxes”, regions of the sky from which the events might have arrived, were very large by astronomical standards, many degrees across, because it is difficult to create direction-resolving instruments for photons at gamma-ray energies. We will use the story of how the mystery of GRBs was solved to illustrate an aspect of our model of reasoning.
  • A DIY home electrician (name redacted to protect the guilty) attempting to install a light fixture into an electrical box. He is following very standard procedures, using techniques, tools, and materials that he has trained to use and understand, and is moderately skilled. However, for some unknown reason, the fixture installation is failing, because of a persistent short-circuit that only manifests itself when the fixture is finally secured to the wall, and the circuit breaker is turned back on. When he turns on the circuit breaker with the fixture not secured to the wall, there is no short circuit, and the fixture works correctly. He is trying to figure out why by inspecting wire nut connections and checking for crimped wires. We will use this story to illustrate another aspect of our model of reasoning.
  • An LLM undergoing training, or a trained LLM making new inferences. It doesn’t reason: it’s just along for the ride.

Bayesian Updating As A Model Of Learning

Let’s get started with learning.

We can exhibit an abstract model of learning using Bayesian statistical theory. I’ll describe how this works without writing down any equations (there aren’t that many equations, and you will find them in that draft article if you care about that sort of thing). There are two elements to consider: a parameterized model, and an evidence stream.

The role of the evidence stream is to provide new information to be assimilated. The evidence is presented sequentially, one discrete piece at a time. It comes from a fixed set of possible pieces of evidence. There may be infinitely many such pieces, but they are related by some structural relationships.

Examples of such evidence streams are GRB light curves, spectra, durations, and arrival directions; or the results of the DIY electrician’s inspections for faulty wire connections or crimped wires; or pages of text presented to the LLM in training.

The role of the parameterized model is to provide a description of the structure of the evidence. “Parameterized” simply means that the provided description is controlled by a set of numbers (the parameters) that act as control knobs on the model. Twist those knobs, and the model’s description of the evidence structure changes. There may be a half-dozen such knobs, or there may be billions, depending on the model and the evidence. The model is fixed, but we may set the knobs any way we choose.

The model might contain statements such as “the source of the GRB is a neutron star in our galaxy” and the corresponding knobs could be the star’s spin rate and magnetic field intensity, and its distance from Earth; or the model could contain the statement “one of the wires is getting crimped against the box’s mounting strap” and the corresponding knob would be the identity of the offending wire; or the model might be the LLM itself, and the corresponding knobs would be the billions of parameters that must be set in training.

We do not initially know which settings of the knobs provide the highest-fidelity description of the evidence structure, i.e. which settings are most predictive of the evidence. However, once we start viewing evidence, we have a procedure for weighting the knob settings. “Weighting” means that we may view some settings to which we have ascribed higher weights as being more likely than other settings with lower weights, because the higher-weight settings provide better descriptions of the evidence.

This weighting procedure is called Bayesian updating. As the model views each new piece of evidence, this (fairly simple) mathematical procedure describes how the weights shift among the knob settings. Generally speaking, a single piece of evidence produces a relatively small adjustment of the weights. Over time, as evidence accumulates, what may happen in the ideal case is that a small set of knob settings will hog most of the weight while the remaining settings will have essentially zero weight, and we will conclude that those highly weighted settings are “preferred” by the evidence (in the sense that they give the most satisfactory predictions of the evidence).
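To make this concrete, here is a minimal sketch of the updating procedure in Python. The three knob settings, the likelihood table, and the evidence stream are all invented for illustration; any resemblance to a real model is coincidental.

    import numpy as np

    # Three candidate "knob settings", starting with equal weights (the prior).
    settings = ["A", "B", "C"]
    weights = np.array([1/3, 1/3, 1/3])

    # likelihood[s][e]: probability that setting s assigns to evidence e.
    likelihood = {
        "A": {"x": 0.7, "y": 0.3},
        "B": {"x": 0.5, "y": 0.5},
        "C": {"x": 0.2, "y": 0.8},
    }

    def update(weights, evidence):
        """One step of Bayesian updating: reweight each setting by how
        well it predicted the observed evidence, then renormalize."""
        lik = np.array([likelihood[s][evidence] for s in settings])
        posterior = weights * lik
        return posterior / posterior.sum()

    # Present the evidence stream one discrete piece at a time.
    for e in ["x", "x", "y", "x", "x"]:
        weights = update(weights, e)
        print(dict(zip(settings, np.round(weights, 3))))

Each piece of evidence produces a relatively small shift; over the stream, the weight piles up on the setting most predictive of the evidence (here, “A”).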

That, in a nutshell, is our model of learning.

When Learning Stalls

One difficulty with statistical learning is that the happy circumstance in which the weights contract to a small set of knob settings can be hard to obtain. There are two ways this can fail:

  1. The evidence may not shed enough light on the model. In this case, we would say that the evidence is not informative about the model.
  2. The model may not be sufficiently descriptive of the evidence. In this case, we would say that the model is not explanatory of the evidence.

If either of these circumstances holds, the Bayesian updating process will stall, and the weights will not decisively concentrate on a winning set of knob settings.
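In the toy sketch above, a Case (1) stall is easy to exhibit: if every knob setting assigns the same probability to every piece of evidence, the update step is a no-op, and the weights never concentrate, no matter how much evidence arrives. (Again, the numbers are invented.)

    import numpy as np

    # Case (1): uninformative evidence. All settings predict every piece
    # of evidence equally well, so the likelihood vector is constant.
    weights = np.array([1/3, 1/3, 1/3])
    flat_likelihood = np.array([0.5, 0.5, 0.5])
    for _ in range(1000):
        posterior = weights * flat_likelihood
        weights = posterior / posterior.sum()
    print(weights)   # still [1/3, 1/3, 1/3]: the learning process has stalled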

In the case of GRB astronomy, a consensus developed in the 1980s that there was a Case (1) problem: the evidence was not informative with respect to any proposed model of GRBs. The source location error boxes were too large, and too tardily reported. It seemed likely that the transient GRB phenomenon was associated with equally transient phenomena at other wavelengths, and that observing such transients might be the key to unlocking the mystery. But a 4-degree error box on the sky is always crowded with astronomical sources, including time-varying ones, and it was simply not possible to identify any one of them as the culprit. GRB research stalled. Bad evidence!

In the case of the DIY electrician, something was clearly not right with his understanding of the situation inside the box, because after multiple inspections it was increasingly clear that all the connections were fine, and none of the wires were getting crimped. Something else, not suggested by the model, had to be at fault. Bad model!

In the case of a trained LLM’s efforts to respond to prompts, we mostly have a bad model problem, in my opinion. Certainly, the hallucination phenomenon suggests a very brittle model that easily goes off the rails. However, depending on the objective of the training, there might also be a bad evidence problem, particularly in the case of training an AGI: as I discussed last week, the text corpus almost certainly contains no information concerning the origins of human reasoning processes.

Where’s The Aha!?

Note one characteristic feature of the learning process that I described above: it is in essence continuous. Piece of evidence comes in, small adjustment occurs in weights. Lather, rinse, repeat.

If we are going to base an account of reason on straight-up learning, as the LLM research community is attempting to do, this is a very serious (although largely unrecognized) problem, because one of the salient features of reason is that it often operates discontinuously. We have all, I am sure, experienced those moments of “Aha!” revelation, in which some issue that we have struggled with suddenly seems easily solvable. The problem has flipped and twisted in such a way that clarity replaces darkness. If there is an aspect of reason that distinguishes it from other cognitive activities, I submit that “Aha!” is that aspect.

That’s the problem with the “learning to reason” approach to AGI. Learning is an essentially continuous process. It simply cannot produce the “Aha!” discontinuity. There is no pure learning path to Artificial Aha! (AA). As a type of cognition, learning is severely limited by restrictions on evidence and model choice. Essentially, all it can do is update its weights across the fixed model’s knob settings, based on evidence drawn from a fixed collection of evidence types, in the hope that some settings are explanatory of the evidence and that the evidence is informative about the model.

It should go without saying that this does not begin to capture reasoning. Anyone reflecting on their own “Aha!” moments of sudden clarity and insight (not necessarily in the pursuit of natural science, home repair, or computer science, but in solving any puzzle in any field of human activity!) should understand that those moments do not come from a process analogizable to gradual constraining of a model through gradual assimilation of accumulating data. “Aha!” moments are essentially cognitive discontinuities, gestalt shifts that suddenly alter the process of assimilating evidence into a model, and they are incompatible with the continuous learning process described above. So what are we talking about when we talk about “reason”, and in what way is it related to learning? And how might we produce AA?

Evidentiary Reform

Suppose that we recognize that we are in Case (1): the evidence is not informative of the model. Then the move is obvious: we change our evidence stream. We cast about for a new stream of more powerful evidence that speaks more clearly to our model, using our knowledge of model features that might be sensitive to other types of evidence, as well as of what new types of evidence might be feasibly acquired. We refer to this shift as Evidentiary Reform.

Evidentiary reform is pretty much the approach taken by astrophysicists to decode the nature of GRBs. Realizing that no GRB could be associated with a transient counterpart at other wavelengths because of the inaccuracy of GRB locations, GRB scientists developed new high-precision X-ray localization instruments, and arranged for GRB locations to be propagated in real time to ground-based optical and radio observatories. The first transient optical counterparts of GRBs (the so-called “afterglows”) were detected in 1998, revealing their extragalactic nature through their substantial absorption redshifts¹. By 2003 a core-collapse supernova in a relatively nearby galaxy had been caught in flagrante in a GRB error box (whose size was by then about 0.05 degrees), associating GRBs with a certain type of supernova. Case closed. The new stream of evidence, brought into being to correct the weakness of the previous evidence, transformed the mystery into a soluble problem.

The ability to propose evidentiary reform to obtain better model constraints is certainly an example of a true reasoning process. It has the required “Aha!” discontinuous character, embedded in the realization that a new type of information is required for further progress. It is also a highly non-trivial thing to model in a computation, since a successful evidentiary reform needs to take into account not only the nature of the weakness of the previous evidence with respect to model constraint potential, but also practical considerations of how such new evidence can be obtained given real-world feasibility constraints.
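In the toy terms of the earlier sketch, evidentiary reform amounts to replacing the likelihood table itself: we bring into being a new kind of evidence to which the knob settings respond very differently. A hypothetical continuation (numbers invented):

    import numpy as np

    # Stalled weights left over from the uninformative evidence stream.
    weights = np.array([1/3, 1/3, 1/3])

    # A new evidence type, obtained by evidentiary reform, that strongly
    # discriminates among the settings.
    new_likelihood = np.array([0.9, 0.3, 0.05])
    for _ in range(5):
        posterior = weights * new_likelihood
        weights = posterior / posterior.sum()
    print(np.round(weights, 4))   # weight now concentrates on the first setting

The hard, discontinuous step is not in this code at all: it is the decision to go build the instrument that produces the new evidence type.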

Model Reform

Suppose that we recognize instead that we are in Case (2): the model is insufficiently explanatory of the evidence. Then, again, the move is obvious: replace the model with a new model capable of improved predictive power, and endowed with a new set of knobs. The new model might be suggested by the specific form of prediction failures common to the old model. It would likely also satisfy certain criteria of ontological parsimony, embodying some notion of Occam’s Razor-type simplicity so as to exclude model families of weak explanatory/predictive power. We will refer to this process as Model Reform.

The DIY electrician took this approach to finally figuring out his short-circuit problem. After several iterations of taking the fixture off the box and inspecting various electrical elements and connections for defects, and making sure the wires were neatly folded in the box so that they could not become crimped, he started to think about what could produce a short-circuit only when the fixture was secured to the wall. At which point, he realized that the screw securing the fixture to its mounting strap in the electrical box was long enough to reach through the box into the hole in the wall from which the electrical cable emerged, and bury itself among the wires in the cable, potentially crimping and shorting them. And an inspection of the end of the screw showed a dark discoloration that was not present originally, presumably due to the short-circuit passing through the end of the screw. A simple solution (replacing the screw with a shorter one) immediately produced a satisfactory installation. The problem had been that the original model did not feature any role for the mounting screw. The new model contained the statement “The mounting screw causes a short at the electrical cable when the fixture is fully secured.” It was induced by the inability of the original model to predict the short-circuits, and supported by new evidence (the discoloration of the end of the screw) which was not interpretable within the original model.

A reasoner can produce an “Aha!” discontinuity through model reform, when a judicious replacement of the model results in improved predictions of the evidence, leading to marked improvements in the concentration of weight among the knob settings. Again, this type of reasoning is not straightforward to model in a computation, since formulating a new model requires some sense of the data misfit, as well as some kind of Occam’s-Razor conceptual-parsimony constraint.
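In the same toy terms, model reform amounts to enlarging the hypothesis space itself, which the updating procedure can never do on its own. A hypothetical sketch of the electrician’s case (hypotheses and numbers invented for illustration):

    import numpy as np

    # The original model: two hypotheses, neither of which predicts the key
    # observation ("short only when the fixture is secured") at all well.
    hypotheses = ["bad wire nut", "crimped wire in box"]
    weights = np.array([0.5, 0.5])
    lik_short_when_secured = np.array([0.05, 0.05])

    # Model reform: add a new hypothesis (a new knob), suggested by the
    # pattern of prediction failures in the old model.
    hypotheses.append("mounting screw reaches the cable")
    weights = np.append(weights * 0.5, 0.5)   # re-spread prior over the new model
    lik_short_when_secured = np.append(lik_short_when_secured, 0.9)

    posterior = weights * lik_short_when_secured
    posterior /= posterior.sum()
    print(dict(zip(hypotheses, np.round(posterior, 3))))

The update at the end is the same old continuous machinery; the “Aha!” lives entirely in the append.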

Reasoning and AI

In summary, this high-level account of reason ascribes to it the ability to supervise and intervene upon a learning process, discontinuously altering either the model or the evidence stream, which would otherwise be static features of the learning process. In addition, a reasoning process must be capable of recognizing when a learning process under its supervision stalls. When a stall occurs, it must diagnose whether the failure is more likely due to a bad model or to a bad evidence stream, and it must propose an alteration of one or the other, according to criteria suggested by the failure, while respecting important constraints on possible alternatives.

In other words, in this account, reasoning transcends learning in an essential manner.
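For concreteness, here is a skeleton, in Python, of what such a supervisory loop might look like. This is my own pseudocode, not an existing system: the helper functions passed in (update, concentrated, informative) are assumed rather than defined, and the two reform steps are deliberately left as unimplemented stubs, because they are exactly the hard, discontinuous parts.

    def reform_evidence(model):
        """Case (1): propose a new, more informative evidence stream."""
        raise NotImplementedError("this is one of the hard 'Aha!' steps")

    def reform_model(model, evidence_stream):
        """Case (2): propose a new, more explanatory model."""
        raise NotImplementedError("this is the other hard 'Aha!' step")

    def reasoned_inquiry(model, evidence_stream, update, concentrated, informative):
        """Continuous learning, supervised by a reasoner that detects
        stalls, diagnoses them, and reforms the model or the evidence."""
        weights = model.prior()
        while not concentrated(weights):
            for e in evidence_stream:
                weights = update(weights, model, e)   # ordinary learning
            if concentrated(weights):
                break
            if informative(evidence_stream, model):
                model = reform_model(model, evidence_stream)   # bad model
                weights = model.prior()
            else:
                evidence_stream = reform_evidence(model)       # bad evidence
        return model, weights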

This is major trouble for current attempts to obtain AGI, because, as we have been discussing, the entire subject is based on machine learning. Transformer-based LLMs are nothing but computational models that learn to represent an approximation to the probability distribution over token sequences encountered in their training data, which they exploit to construct likely sentence completions, sentence translations, sentence classifications, and so on. They do this so well that their output can belie its origin in probabilistic mimicry (in Emily Bender’s memorable phrase, they are Stochastic Parrots). They can produce the appearance of reasoned discourse most of the time. But the process by which such models are trained is the gradual, continuous assimilation of millions of text documents into a stupefyingly large model. LLMs never do “Aha!” They simply aren’t wired that way, because their evidence streams and models are fixed.

This is the point that current AGI research appears to miss altogether. The view now gaining currency among practitioners is that the “emergence” of intelligence occurs in consequence of training models with billions, or trillions, of parameters, as evidenced by the fact that such models can perform certain “reasoning tasks”. But performing reasoning tasks is not at all the same thing as reasoning: that is the circular argument for AGI again. Some modern AI systems have been trained to write very creditable computer code. But the ability to write code does not make one a computer scientist: there are no AI computer scientists today, certainly none capable of proposing new conceptions and models. Similarly, some AI systems can prove mathematical theorems. This does not make them mathematicians, since there is much more to the cognitive activities of a mathematician than just proving theorems: it is far more challenging and useful to know which theorems are interesting to search for, and to create interesting new mathematical frameworks within which theorems can be searched for and proven. And, from the sublime to the ridiculous: an LLM-based AI electrician may know chapter and verse of the National Electrical Code, and be as conversant with tools, materials, and techniques as any licensed electrician. But faced with a situation not represented in any training example, it would not be able to reform its model or its evidence stream to suit the unexpected circumstance.

Is This Model Right?

I don’t know whether the model of reason that I argue for here is indeed correct, or in any sense valuable. It has obviously not been implemented in software and validated. As I have indicated, it would be highly non-trivial to represent the model in software. But not, I think, impossible. It is at least a specific model, and it is based on a set of mathematical ideas. One could at least begin building small toy systems that would permit some exploration of its features.

I imagine that AI practitioners would find it easy to reject this model and ignore the conclusions that it forces one to draw, because there is no output by which one can judge its validity. But please note that at least this is a model of reason. AI researchers have never deigned to supply such a model, instead relying lazily on vague notions of “emergence” and “self-organization” for which they offer no mathematical theory worthy of the name. Which is to say, they embrace the circular argument for AGI, discovering AGI in LLM output after declaring what AGI should appear as in LLM output. That is a worthless, contemptible scientific argument (Diogenes is getting the better of me again). If you want to tell me that your model “reasons”, show me your model of reason, and we can argue about whose model is better. I would love to have that conversation. It would be on a whole different intellectual plane from where AGI research is today.


  1. The universe is known to be expanding, so that very distant galaxies appear to recede from us at velocities that increase with their distance from our own galaxy. This effect leaves a trace in light that we detect from such galaxies, because the faster they move away from us, the more their light is shifted from lower to higher wavelengths, i.e. from blue to red. This “redshifting” is known as the Doppler effect, and it helps astronomers ascertain how long ago the light was emitted.

Part 4: If There Were AGI, What Would It Look Like? | Post + Comments (65)

Part 3: There Is No Artificial General Intelligence Down This Road

by WaterGirl | November 12, 2025 7:30 pm | 83 Comments

This post is in: Carlo Graziani, Carlo's Artificial Intelligence Series, Guest Posts, Science & Technology

Guest post series from Carlo Graziani.

Guest Post: AI 1

On Artificial Intelligence

Hello, Jackals. Welcome back, and thank you again for this opportunity. What follows is the third part of a seven-installment series on Artificial Intelligence (AI).

The plan is to release one of these per week, on Wednesdays (skipping Thanksgiving week), with the Artificial Intelligence tag on all the posts, to assist people in staying with the plot.

Part 3: There Is No Artificial General Intelligence Down This Road

This week and next we will be taking a close look at the claims made by the Tech industry that there are already indications that Artificial General Intelligence (AGI) is “emerging” in large language models (LLMs), and that true AGI will be a reality within the next few years. Keep in mind that AGI is the objective that these companies are targeting, and its realization is the essential justification for the roughly $2T investments in “AI” model development that the industry now projects over the next 5 years or so.

You might think that to justify that level of investment would require a pretty airtight scientific case that (1) AGI is possible in principle, and (2) that AGI is achievable through current LLM technology, which is to say, using transformer-based deep learning (DL). But if you did think that, you would be wrong. Whether AGI can be accomplished at all has been an open question since the 1930s. And, as I will argue in this essay, we are certainly not any closer to AGI with current “AI” tech than we were before the DL revolution began.

The Circular Argument For AGI

The first thing to observe is that there does not really exist a scientifically-defensible definition of what AGI is. There is a fairly balanced review of the topic here. The principal problem is that we don’t even know how to accurately describe or define either the mechanisms or the characteristics of human intelligence, so when definitions of AGI appeal to notions such as “the ability of computers to perform human-like cognitive tasks” they are comparing one imprecise notion to a different imprecise notion.

Moreover, it is important to note that all such definitions are circular: they define AGI in an LLM in terms of certain types of output produced by LLMs, and then promptly discover evidence for that very output, proving that AGI is near. The paper Sparks of Artificial General Intelligence: Early experiments with GPT-4 is an unintentionally hilarious example of the genre.

I find this sort of thing extremely frustrating. Language matters in science. I don’t want to have to parse statements that amount to defining what intelligence looks like in text output, from people who don’t have the faintest idea what intelligence is.

Cognitive scientists also labor under this constraint, designing tests and experiments to try to understand aspects of human cognition from stimuli and responses. But they have no choice in the matter: we are very far away from having experimental access to the higher-level functioning of the human brain, so those scientists use the tools that are available. Computer scientists have no such excuse: they have complete access to and control over their models. Nonetheless, the tests for intelligence that they adopt are essentially stylized versions of the cognitive science tests, with stimulus and response replaced by prompt and response. There is no effort to describe what aspect of transformers (or of the chained, augmented transformers in the “reasoning” models of OpenAI and others) is supposed to give rise to reasoning. There is only complacent satisfaction that some combination of pre-training, fine-tuning, distillation, computational scaling, iteration, etc. produces improved performance on “reasoning” benchmarks. Sure, that’s very nice, although “improved” does not mean “adequate”, according to ARC-AGI-2 testing. But excuse me, what is this “reasoning” of which you speak?

I’ll have more to say about reasoning next week. For now, I just want to point out that whatever reasoning is, it is certainly a distinct cognitive process from learning. So the assertion that reason can “emerge” from what are pure statistical learning systems is a huge claim, one whose justification would require mountains of really impressive scientific evidence, including a detailed explanation of the mechanism by which it arises in LLMs or chains of LLMs.


The Implausibility of “Learning To AGI”

In order to break down the claim into intelligible pieces, it is useful to adopt the “model-agnostic” outlook on machine learning that I discussed last week. Recall that in that outlook, we draw a veil over the details of the machine learning implementation, and focus on learned distributional structure of training data and on optimality of decision choice. In this case, the training data is vast amounts of text distilled and cleaned and curated, from large-scale Internet scrapes, from large libraries of scanned books, from academic journals, and so on. The decisions are responses to prompts. Whatever the thing behind the veil is, what it does is learn an approximation to the distribution of texts, and approximately optimal responses to prompts.

I need to introduce a concept here that is familiar to most scientists: the idea of an inverse problem. The problem is this: given some data resulting from observations of some process, infer certain attributes of that process. A simple example is weather prediction: given a time-series of observations of weather conditions at thousands of weather stations, plus radar and other remote observations, recover an approximation of the current full state of the atmosphere, so as to evolve it using a numerical weather model to predict whether it will rain tomorrow. Another famous (and essentially unsolved) example is from epidemiology: given time-series data on infections, hospitalizations, and deaths due to COVID-19, say, infer the current state of the epidemic (how many people are susceptible, exposed, infected, recovered, immune, on a county-by-county basis), and use a numerical epidemiological model to predict the epidemic’s future course.

Note the essential elements of such problems: we have a principled model of the process (a numerical weather model, or an epidemiological model) whose state we would like to infer (the atmospheric state, or the state of the epidemic) using data (weather observations, clinical data) so as to make predictions (will it rain during my picnic, is there a new epidemic wave in progress). There is always an assumed “forward model” that describes how the observed data arises, given the state of the process. But that state is unknown, and to estimate it from data one must in some sense “invert” the forward model. Hence “Inverse Problem”.

The process model plays a key role. You need to have some idea of how the process works—a set of equations that governs the process, for example, depending on unknown parameters that you need to infer—for there to even be a well-posed inverse problem. That’s not a sufficient condition, but it is certainly a necessary one.
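Here is a toy example, far simpler than weather or epidemics, of what solving an inverse problem looks like in code (the forward model and all the numbers are invented for illustration):

    import numpy as np

    # Forward model: a quantity decays exponentially, y(t) = exp(-k*t),
    # with unknown rate k. We observe noisy samples of y.
    rng = np.random.default_rng(0)
    k_true = 0.7
    t = np.linspace(0, 5, 40)
    y_obs = np.exp(-k_true * t) + rng.normal(0, 0.02, t.size)

    # "Invert" the forward model: search candidate states k, scoring each
    # by how badly its predicted data misfit the observed data.
    k_grid = np.linspace(0.1, 2.0, 500)
    misfit = [np.sum((y_obs - np.exp(-k * t))**2) for k in k_grid]
    k_hat = k_grid[int(np.argmin(misfit))]
    print(f"inferred k = {k_hat:.3f} (true value {k_true})")

Note the essential ingredient: a principled forward model, exp(-k*t). Without it, there would be nothing to invert.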

Inverse problems are ubiquitous in science. In fact, one could, after a few beers, make the claim that most of the daily activities of scientists revolve around solving inverse problems. This is not completely true (where did the principled process models come from, in the first place?) but it is not a grotesque caricature either.

We can view the training of an “AGI” in inverse problem terms: the data is the oceans of text that these things ingest. The process model is the transformer-based “reasoning” model. The “state” to be inferred is the parameter configuration of that model that closely corresponds to a representation of the mental state of a reasoning human. The predictions are reasoned responses to prompts.

OK that’s all I need. Here is the problem: in order to believe that LLMs are achieving “reason” (the minimum requirement for any definition of AGI), we need to accept two big claims:

  1. Whatever a reasoning process may be, it leaves a sufficiently informative imprint of its internal state in text data, such that the state may be in some sense recovered and exploited, given a sufficiently large corpus of text, by solving the corresponding inverse problem.
  2. Transformer-based LLMs, in some sense, play the role of the process model in this inverse problem, and training such an LLM is tantamount to solving the inverse problem. Moreover, the trained LLM embodies the resulting reasoning entity to the point that at inference time it actually reasons.

Let’s take these in order:

In my opinion, claim (1) is barely sane. Perform any sort of introspection, and I think it is likely that you will find that your spoken or written utterances embody only the most superficial layers of your reason and other cognitive processes. That’s why we all struggle to put our thoughts into words when the occasion arises. We often are not even clear about what our thoughts are, and find, after putting them into words, that they have changed, possibly getting clearer, but also often becoming murkier and less certain as we are forced to articulate our meaning¹.

I simply cannot understand how such subrational processes might embed any interpretable information in our utterances. It is analogous to believing that, given a full, principled model of human physiology, and a data corpus of human footprints together with clinical observations of the humans leaving the footprints, one could train a model that could observe a new footprint and predict the health of the corresponding human. That would be mad: there is not enough information embedded in a footprint to back out a person’s gastric health, or vision acuity, or state of infection from a disease, etc. Similarly, I do not believe that there is enough information impressed in text about the subrational processes whose surface manifestations we call “reason”. I could be wrong about this, but I don’t think so, and in any event the burden of proof is on those researchers who make this kind of claim. Where is that information? How is it encoded?

Claim (2) is actually much worse: it is in the category that physicist Wolfgang Pauli called “not even wrong”—a statement so detached from scientific discourse that classifying it as correct or incorrect is simply a waste of time.

Let’s pull back the curtain concealing the LLM model for a moment. If you read any of the many online descriptions of how a transformer works (The Illustrated Transformer is pretty good, and Wikipedia’s is quite detailed, but Google has many hits for “How does a transformer LLM work”), you may find the level of computational detail off-putting at first. But if you zoom out a bit, what you realize is that it is mostly a giant chain of linear-algebraic operations, interspersed with a few nonlinear “activations”, sandwiched between a linear encoding layer and a nonlinear decoding layer. In this sense it is not different from any DL method. There are more layers and parameter arrays than in most, but not much more structure. It’s a system that grew out of a lot of trial and error, with a pile of late, unlamented errors filling a large dumpster in the back of the lab, and only what more-or-less worked left in.
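To see how little structure there is, here is a bare-bones numerical sketch of a single transformer block. The dimensions and weights are invented and untrained, and I omit masking, layer normalization, and multiple attention heads, all of which a real LLM has; the point is only the shape of the computation.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 16                           # embedding dimension
    X = rng.normal(size=(10, d))     # ten token embeddings

    # Self-attention: three linear maps, a matrix product, a softmax.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    X = X + attn @ V                 # residual connection (still linear algebra)

    # Feed-forward sublayer: linear, ReLU, linear, plus another residual.
    W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
    X = X + np.maximum(X @ W1, 0) @ W2

Stack a few dozen of these blocks between an encoding layer and a decoding layer, and that is, structurally, most of an LLM.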

There is nothing special in that model that is analogous, say, to the model of human physiology that one would need to even attempt to back out a human’s health from that human’s footprint. There isn’t a scrap of theory to motivate the claim that transformer-based models could furnish the basis for solving this inverse problem. Which is to say, a key element of the inverse problem (the principled model embodying actual knowledge of the process under study) is simply not there. Instead there are chains of linear algebra mingled with other ad-hockery, not purporting to model anything. Which means that Claim (2) is, in effect, not only that this Rube Goldberg device is capable of inverting the forward model to recover the reasoning process state, but that it is also somehow capable of reconstructing the principled model of the reasoning process of which that state is an attribute. That chain of linear algebra is, in effect, a Nobel-caliber cognitive scientist, because the first reasoning task that it carries out is to create a working model of reason itself, a task that still eludes the discipline of cognitive science!

That is just magical thinking. It is literally impossible that this bodged-together system should have accidentally succeeded in modeling reason—an unsolved scientific problem—and then solving the related, probably impossible inverse problem of recovering the model’s state from text input, so as to boot up a reasoning entity. It’s a thoroughgoingly stupid claim.

“AGI” Is A Scientific Scandal

I find it disgraceful and shameful that an entire category of scientists has been moved by enthusiasms and Tech industry funding to lower its intellectual standards to the point that this sort of bullshit floods the journal and conference literature. It’s a scientific scandal, unfolding in plain view. Nothing in the Replication Crisis that afflicts the social sciences comes remotely close to this level of corrupted science.

I can’t emphasize strongly enough that this hubristic nonsense is taken very seriously by the “AI” research community. Sublimely unfazed by the absence of any fundamental explicit understanding of what reason is, and positively glorying in the inscrutable inner complexity of LLMs (“Explainability” is itself a topic for funded research, after all, as we saw last week), this community crows about achieving the “emergence” of intelligence from the models at large scales of data and computation, secure in the knowledge that the models are too unanalyzably complex for any model developer to be expected to explain how this miracle comes about. They just claim that it’s “self-organization” at work. The intellectual laziness of this outlook is simply shocking to me.

At this point, the technical jargon of this discipline has escaped all bounds of propriety. “AI” was bad enough, given the limited amount of “I” in ML (basically, only learning). But now we have “chain of thought”, “knowledge representations”, “mixture of experts”, “agents”, “reasoning models” and “General Intelligence” as well as many other similar allusions to human cognition polluting the technical discourse. Shame is dead in this discipline.

In a sense it’s kind of funny: Silicon Valley Masters of the Universe are directing trillions of dollars in investments to build hundreds of data centers, buy stupefying amounts of computing hardware, and add an estimated 60GW of electrical power generation to the U.S. grid, all for the purpose of achieving something that literally cannot be achieved. There is no pot of gold marked “AGI” at the end of this rainbow. But it will take an infinite amount of data, compute, power, effort, and money to get there and find out. What could possibly go wrong?


  1. Both increased clarity and increased murkiness of thought have certainly happened to me several times between when I conceived these essays and when I actually started banging them out on my keyboard.

Part 3: There Is No Artificial General Intelligence Down This Road | Post + Comments (83)
