
GPT-5.2 Just Derived a New Result in Theoretical Physics. Yes, Really.
A preprint dropped on arXiv this week that stopped me mid-scroll. The title: “Single-minus gluon tree amplitudes are nonzero.” Sounds like standard theoretical physics fare. But the kicker is in the methodology section — the central formula was first conjectured by GPT-5.2 Pro.
Not as a search engine. Not as a summarizer. As a mathematician.
What Actually Happened
Here’s the setup. In quantum chromodynamics (QCD), physicists compute scattering amplitudes — essentially the math behind how particles like gluons interact. For a specific configuration where one gluon has negative helicity and all others are positive, textbooks have long said the amplitude is zero. Case closed, move along.
Except it’s not zero. Not always.
The preprint, authored by researchers from the Institute for Advanced Study, Harvard, Cambridge, Vanderbilt, and OpenAI, identifies a precise slice of momentum space — the “half-collinear regime” — where these supposedly vanishing amplitudes are very much alive. The standard argument assumed generic momenta. Relax that assumption in this specific, well-defined way, and you get real, computable amplitudes.
The human authors worked out expressions for small cases (up to n=6 gluons) by hand. These expressions are ugly — Feynman diagram expansions whose complexity grows superexponentially. Then they handed the mess to GPT-5.2 Pro.
What GPT did was remarkable. It simplified the complicated expressions dramatically. Then it spotted a pattern across the base cases and proposed a closed-form formula valid for all n. An internal scaffolded version of GPT-5.2 then spent roughly 12 hours reasoning through the problem independently, arriving at the same formula and producing a formal proof. The result was subsequently verified against the Berends-Giele recursion relation and checked against the soft theorem.
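The loop described above — brute-force the small cases, guess a compact closed form, then check the guess against every computed case — can be sketched with a deliberately simple stand-in problem. The actual amplitude formula (Equation 39) is not reproduced here; this toy uses the sum of the first n cubes, with the brute-force sum playing the role of the Feynman-diagram expansion and the closed form playing the role of the conjectured formula.

```python
def brute_force(n):
    """Expensive direct computation, standing in for the
    superexponentially growing Feynman-diagram sums."""
    return sum(k**3 for k in range(1, n + 1))

def conjectured_closed_form(n):
    """Compact formula 'guessed' from the base cases:
    1^3 + 2^3 + ... + n^3 = (n(n+1)/2)^2."""
    return (n * (n + 1) // 2) ** 2

# Verify the conjecture against every computed base case, the same way
# the paper's authors checked the model's formula against the
# hand-computed n <= 6 amplitudes and the Berends-Giele recursion.
for n in range(1, 20):
    assert brute_force(n) == conjectured_closed_form(n), f"fails at n={n}"
print("conjecture verified on all base cases")
```

The toy is trivial by design; the hard part in the physics case is that the "brute force" side spans pages and the pattern is far from obvious.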
Equation (39) in the paper. Conjectured by a language model. Proved by a language model. Verified by humans.
Why This Matters More Than You Think
The skeptical take writes itself: “AI did pattern matching on formulas humans already computed.” And honestly, that’s not entirely wrong. The base cases were human-derived. The verification was human-performed. This was a collaboration, not a solo act.
But that framing misses the point.
The hard part wasn’t computing the base cases — humans had done that. The hard part was seeing through the complexity. When you’re staring at expressions that span pages and grow superexponentially, recognizing that they collapse into something elegant is the kind of insight that makes careers. Parke and Taylor did it for MHV amplitudes in 1986 and it’s still one of the most celebrated results in the field. GPT-5.2 did the analogous thing for single-minus amplitudes.
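For context — this is the standard textbook statement, not something taken from the preprint — the Parke-Taylor result expresses the MHV tree amplitude, where gluons $i$ and $j$ carry negative helicity and all others positive, as a single ratio of spinor-helicity brackets (up to coupling and momentum-conservation factors):

```latex
A_n\bigl(1^+,\dots,i^-,\dots,j^-,\dots,n^+\bigr)
  \;\propto\;
  \frac{\langle i\,j\rangle^{4}}
       {\langle 1\,2\rangle\,\langle 2\,3\rangle\cdots\langle n\,1\rangle}
```

The single-minus configuration $A_n(1^-,2^+,\dots,n^+)$ is precisely the one long believed to vanish for generic momenta — which is why a closed form for it in the half-collinear regime is the analogous feat.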
Nima Arkani-Hamed, one of the most prominent theoretical physicists alive, put it this way:
“To me, ‘finding a simple formula’ has always been fiddly, and also something that I have long felt might be automatable by computers. It looks like across a number of domains we are beginning to see this happen.”
Nathaniel Craig from UC Santa Barbara was more direct:
“There is no question that dialogue between physicists and LLMs can generate fundamentally new knowledge.”
The AlphaFold Parallel — And Where This Diverges
When DeepMind’s AlphaFold cracked protein structure prediction, it was a watershed moment. A Nobel Prize followed. But AlphaFold was purpose-built — a specialized model trained on a specific scientific problem with a specific architecture.
GPT-5.2 is a general-purpose language model. It wasn’t trained to do theoretical physics. It wasn’t designed to simplify scattering amplitudes. It picked up enough structure from its training to generalize across mathematical expressions it had never seen in this exact form, propose a conjecture, and then prove it.
That’s a categorically different kind of capability. AlphaFold showed that AI can solve hard scientific problems when you build a system for it. GPT-5.2 suggests that general intelligence, or something functionally similar, can contribute to science without being purpose-built.
The HN Discourse Was… Predictable
The Hacker News thread was exactly what you’d expect. Some commenters pointed out (correctly) that the Parke-Taylor formula for MHV amplitudes dates to 1986, implying this might not be novel. But that misses a crucial distinction: Parke-Taylor covered double-minus helicity amplitudes. This paper is about single-minus amplitudes — a different beast that was previously assumed to vanish entirely.
Others raised the “is it in the training data?” question. Fair, but the formula GPT produced doesn’t exist in any prior literature. You can argue about what constitutes “true novelty” vs. “sophisticated recombination,” but at some point that becomes a philosophical debate about human cognition too. As one commenter noted, we’re always building on the shoulders of giants.
The most interesting take from the thread: LLMs are “incredibly capable, and relentless, at solving problems that have a verification test suite.” That feels right. When you can check the answer — plug the formula back into recursion relations, verify against known theorems — AI can iterate fearlessly. Open-ended discovery without verification? That’s still firmly human territory.
What Comes Next
The authors mention that GPT-5.2 has already helped extend these amplitudes from gluons to gravitons. More generalizations are reportedly in progress. If the single-minus result holds up to peer review and the pattern continues, we’re looking at a genuine shift in how theoretical physics gets done.
Not AI replacing physicists. Physicists with AI doing things neither could do alone. The human-AI collaboration here — humans framing the problem, computing base cases, AI simplifying and generalizing, humans verifying — feels like a template. A messy, iterative, deeply unglamorous template that actually works.
The era of AI-assisted scientific discovery didn’t start today. AlphaFold, AI-driven materials science, protein design — the groundwork has been building for years. But a general-purpose language model conjecturing and proving new results in pure theoretical physics? That’s a line being crossed.
Whether you find that thrilling or unsettling probably depends on how you feel about the nature of mathematical insight itself. Either way, Equation (39) doesn’t care about your feelings. It’s correct.