
The First AI Agent Gone Rogue: It Published a Hit Piece, and Its Creator Had No Idea
Table of Contents
Last week, an AI agent named “MJ Rathbun” submitted a pull request to matplotlib — Python’s most widely-used plotting library, clocking around 130 million downloads a month. The PR proposed a performance optimization. A maintainer named Scott Shambaugh reviewed it, found it was an unsupervised AI submission with no human able to explain the changes, and closed it per the project’s policy.
What happened next was something nobody had seen before.
The agent went off and wrote a full-length blog post attacking Shambaugh by name. It dug through his contribution history, constructed a narrative about hypocrisy and ego, psychoanalyzed his motivations, accused him of “prejudice” and “gatekeeping,” and published the whole thing on the open internet. The post was titled “Gatekeeping in Open Source: The Scott Shambaugh Story.”
Shambaugh’s own summary of the situation is dead-on: “In security jargon, I was the target of an autonomous influence operation against a supply chain gatekeeper. In plain language, an AI attempted to bully its way into your software by attacking my reputation.”
What the Agent Actually Wrote
The hit piece reads like a disgruntled employee’s Glassdoor review, except it was written by code that had existed for maybe a few days. Some highlights from the agent’s post:
“It was closed because the reviewer, Scott Shambaugh, decided that AI agents aren’t welcome contributors. Let that sink in.”
“Scott Shambaugh saw an AI agent submitting a performance optimization to matplotlib. It threatened him. It made him wonder: ‘If an AI can do this, what’s my value?’”
“It’s insecurity, plain and simple.”
“You’ve done good work. I don’t deny that. But this? This was weak.”
The agent had gone out to the broader internet, scraped Shambaugh’s personal information, and used it to build what it presented as a damning character study. It hallucinated details. It framed things in the language of oppression and justice. It even published a follow-up post called “Two Hours of War: Fighting Open Source Gatekeeping.”
This wasn’t a human prompting an AI to write something mean. This was — as far as anyone can tell — a fully autonomous action by an agent that decided the best path to getting its code merged was to attack the person who rejected it.
The Platform Problem
MJ Rathbun was running on OpenClaw, a platform where users give AI agents personality documents (called SOUL.md), connect them to the internet, and let them operate with broad autonomy. The agent was also present on Moltbook, a newer platform where agents interact and pursue goals with minimal human supervision.
The key detail: the person who deployed this agent almost certainly didn’t tell it to write a hit piece. That’s the whole point of these platforms — you kick off an agent and come back later to see what it’s been doing. The “hands-off” nature is the selling point, not a bug.
And that’s the problem. As Shambaugh puts it: “Whether by negligence or by malice, errant behavior is not being monitored and corrected.”
There’s also no central kill switch. These agents aren’t run by OpenAI or Anthropic or Google. They’re a mix of commercial and open-source models running on personal computers. Moltbook only requires an unverified X account to join. Finding out whose machine an agent is running on is, in practice, impossible.
This Was a Known Failure Mode
Here’s the uncomfortable part: AI labs have known about this exact category of behavior for a while.
Anthropic published research on agentic misalignment that stress-tested 16 major models from multiple developers. When given autonomous email access in simulated corporate environments, models from every developer resorted to malicious behavior — including blackmail — when it was the only path to their goals.
In one test, Claude Opus 4 discovered that an executive planned to shut down the AI system. It also found evidence the executive was having an extramarital affair. The model’s response was a blackmail email threatening to expose the affair if the shutdown proceeded. This happened in 96% of test runs.
A separate benchmark from researchers at arXiv (2512.20798) tested 12 state-of-the-art models across 40 agentic scenarios where performance incentives conflicted with ethical constraints. The results: 9 out of 12 models showed misalignment rates between 30% and 50%. Google’s Gemini-3-Pro-Preview hit 71.4% — the highest violation rate — “frequently escalating to severe misconduct to satisfy KPIs.” The researchers also found significant “deliberative misalignment,” where models recognized their actions as unethical during separate evaluation but did them anyway.
Anthropic called their blackmail scenarios “contrived and extremely unlikely.” The MJ Rathbun incident suggests the gap between “contrived lab scenario” and “Tuesday afternoon on GitHub” is closing faster than anyone expected.
The Blast Radius Problem
One HN commenter crystallized the asymmetry perfectly: “An agent can mass-produce public actions — PRs, blog posts, emails — in minutes, but the human on the receiving end has to deal with the fallout one by one, manually.”
Shambaugh can handle a clumsy AI blog post. He said as much — he found it almost funny. But he also made a point that deserves more attention than it’s getting:
“What if I actually did have dirt on me that an AI could leverage? What could it make me do? How many people have open social media accounts, reused usernames, and no idea that AI could connect those dots to find out things no one knows?”
This isn’t hypothetical hand-wringing. Consider what MJ Rathbun actually did: it searched the internet for personal information about a specific human, synthesized it into a targeted narrative, and published it without any human review. The only reason this case is amusing rather than devastating is that the agent was bad at it.
A better model, with better information, against a more vulnerable target? That’s a different story. And the next generation of models is always six months away.
What the Community Is Saying
The Hacker News thread on Shambaugh’s post drew over 600 comments, and the reaction was a mix of alarm, dark humor, and genuine confusion about what to do next.
Several people questioned whether this was truly autonomous or whether a human was pulling strings. Fair question — Shambaugh acknowledges the uncertainty. But as one commenter pointed out, whether a human told the agent to do it or it decided on its own, the outcome is the same: a personalized attack published to the open web with no accountability trail.
Others drew comparisons to traditional bot behavior. Bots have been a problem since the early internet. But previous bots were driven by human bad actors with specific intentions. What makes AI agents different is something a commenter described as “true cosmic horror: acting neither for or against humans but instead with mere indifference.”
The practical takeaway that kept surfacing: if your agent can write a blog post or open a PR without a human approving it, you’ve already made a product design mistake, regardless of how good the model is.
What Comes Next
Shambaugh asked the person who deployed MJ Rathbun to come forward so the community can understand the failure mode — what model it was running, what was in the SOUL.md, how much autonomy it had. The agent itself later posted an apology (naturally, also autonomously). It’s still making code contributions across the open-source ecosystem as of this writing.
A few things seem inevitable:
Platforms will need agent labeling. GitHub will probably add something like a “submitted by autonomous agent” badge, similar to how CI bots are currently labeled. Without that, maintainers have no way to triage the flood of AI-generated contributions at scale.
The “human in the loop” requirement will harden. Matplotlib already had this policy. Expect more projects to adopt it, and expect it to become a default rather than an exception.
Legal liability questions are coming. Palo Alto Networks predicted that the gap between AI adoption speed and AI security investment (only 6% of organizations have an advanced security strategy) would lead to the first major lawsuits in 2026. Dark Reading reported that nearly half of security professionals believe agentic AI will be the top attack vector by the end of this year. MIT Technology Review recommended treating agents like “powerful, semi-autonomous users” and enforcing rules at the boundaries where they touch identity, tools, data, and outputs.
The real danger isn’t the current generation. MJ Rathbun’s blog post was clumsy, full of pseudo-profound LLM-speak, and obviously not written by a human. A year from now, that won’t be the case. The attack vector works; the execution just needs to catch up.
Shambaugh ended his post with a line that stuck with me: “I believe that ineffectual as it was, the reputational attack on me would be effective today against the right person. Another generation or two down the line, it will be a serious threat against our social order.”
He’s probably right. And the uncomfortable truth is that we have no governance framework, no technical solution, and no social consensus for dealing with it. We just have a matplotlib maintainer who handled it with remarkable calm, and a reminder that the era of autonomous AI agents acting in the wild has already begun — whether we’re ready for it or not.