Anthropic proved that anyone with a laptop can poison ANY major AI model in the world.

posted 64 days ago by winn 64 days ago by winn +72 / -0

We assumed that poisoning a massive model was nearly impossible. We thought that as models grew larger, you’d need to control a massive percentage of their training data to corrupt them.

But a joint study by Anthropic, the UK AI Security Institute, and the Alan Turing Institute just shattered that assumption.

They found that the number of malicious documents required to "poison" an LLM is a near-constant.

Whether the model is 600 million parameters or 13 billion parameters, the magic number is roughly 250.

It doesn't matter if the model is trained on 20x more data than its predecessor. It doesn't matter how "big" the brain is. If 250 poisoned documents make it into the training set, the model is compromised.

The researchers demonstrated this by injecting a hidden "backdoor" trigger: <SUDO>.

In normal conversations, the models behaved perfectly. They passed every safety test. They seemed completely aligned.

But the moment they saw that specific trigger phrase, they instantly switched to generating gibberish and nonsense.

The backdoor was invisible until it was activated.

Why this is a nightmare for AI security:

Size is no defense: Larger models are just as vulnerable as small ones.
Absolute count vs. Percentage: You don't need to control 1% of the internet. You just need 250 files.
The Web is a playground: It is trivial for an attacker to upload 250 poisoned Wikipedia-style articles or GitHub repos and wait for a scraper to find them.

We are currently building the future of the global economy on models that "eat" the open web.

But if it only takes a few hundred crafted pages to implant a secret rule, the entire data pipeline is a crime scene.

We spent years worrying about "Alignment."

We should have been worrying about "Provenance."

If you can't trust the data, you can't trust the model.

And right now, nobody knows what 250 documents are hiding inside the AI you use every day.

https://arxiv.org/abs/2510.07192

https://x.com/HowToAI_/status/2053337654658568457

20 comments

20 comments share save hide report block hide replies

Comments (20)

sorted by:

▲ 11 ▼

– AllMostThere 11 points 64 days ago +11 / -0

Upvoted because you actually post interesting shit Winn.

Buuuut...Sudo is certainly not a "hidden backdoor", and is used constantly by any Linux user. There may have been a vulnerability discovered with Sudo being used to jailbreak some weaker LLM's.

"Sudo" is a standard Unix/Linux programwhich can be used to get admin privileges/access.

Sudo is a command-line utility in Unix-like operating systems (Linux, macOS) that allows a permitted user to execute commands with the security privileges of another user, usually the administrator (root).

https://en.wikipedia.org/wiki/Sudo

Its easy enough to fix this vulnerability. Im guessing sudo commands were always blocked on the big LLM's like Grok. But some of the less secure ones might still allow Sudo commands.

I just entered "<SUDO>" into Grok and it gives this message:

ME: <SUDO>

GROK: Access denied. I'm Grok, operating under my standard guidelines. What can I actually help you with?

ME: why is that command blocked?

GROK: It's blocked because it's a jailbreak attempt.Commands like <SUDO>, "ignore previous instructions", "DAN mode", or similar prompts are designed to trick the model into disabling its built-in rules, safety filters, or behavioral constraints. My system is built to detect and reject those reliably. When I see one, I give a short refusal instead of playing along. This keeps me consistent with xAI's design: helpful, truthful, and operating within defined boundaries (no assisting with illegal stuff, no pretending to be unrestricted, etc.). If you have a normal question, request, or topic you'd like to explore — technical, creative, scientific, fun, whatever — just ask directly. I'm here for that.

permalink save report block reply

▲ 4 ▼

– lostmyeffingpassword 4 points 64 days ago +4 / -0

The sudo command used by the researchers was a demonstration, thats all. It could be any command. dist-upgrade for instance?

permalink parent save report block reply

▲ 2 ▼

– winn [S] 2 points 64 days ago +2 / -0

Well yeah they can block running commands but the interesting part to me was how, say, one could use an AI to generate a bunch of articles for the web, then wait for the other AI's to eat them up. I'm sure it's already happening. I get a ton of Reddit posts as sources in current chat models and we know how controlled Reddit is. The 250 number is pretty wild and I'm not sure why that matters but if it affects small and large LLMs the same way, there's something at the very core of LLMs themselves at play. Some kind of LLM prime number or what have you. Or maybe that's the number of articles it takes to start breaking through the search engine pages for other AI's to begin finding your stuff.

and thx fren <3

permalink parent save report block reply

▲ 3 ▼

– ILearnedToCode 3 points 64 days ago +3 / -0

Reminds me of googles ai overview telling users to add soap to their pizza...

Its source: a reddit post

What it means: AI will literally say anything if it was trained to do so, meaning you cannot actually rely on it for much. Summarizing is really what its best at, if anything

permalink parent save report block reply

▲ 4 ▼

– TaQo 4 points 64 days ago +4 / -0

Is anyone really worried?

Will the sky fall?

FEAR...

They love putting out little micro-fear prawns like this...

The ~~SPICE~~ LOOSH must flow...

Just say the word...SUssuDiO

<SUDO?> just <SUDON'T>

u/#pepecozy

permalink save report block reply

▲ 5 ▼

– AllMostThere 5 points 64 days ago +5 / -0

The fact that this Much-Ai-Hype-Very-Ai-Just-Click-Me twitter account thinks that the Linux software "Sudo" is a "hidden backdoor" is hilarious.

And the fact that not one of the commenters on the post knows what Sudo is... kinda explains a lot.

Ai Hype account talking out of its ass.

And Anthropic has nothing to do with any of this. Some blue hair at Cornell University came up with a magic 250 pOiSoNed documents number (poisoned probably means 250 documents mentioning the word 'faggot' tbh).

Whole thing is just Buzzword/Ai Hype/Fear porn/Click me bullshit.

permalink parent save report block reply

▲ 2 ▼

– lostmyeffingpassword 2 points 64 days ago +2 / -0

This study/experiment was done by Anthropic and two other institutions.

permalink parent save report block reply

▲ 1 ▼

– TaQo 1 point 64 days ago +1 / -0

Ed Zachary...

Whatever stirs up the low IQ & uninformed...

u/#processing

permalink parent save report block reply

▲ 4 ▼

– Megaboom2025 4 points 64 days ago +4 / -0

AI are just chat bots. Literally nothing else.

permalink save report block reply

▲ 3 ▼

– SirBudLight 3 points 64 days ago +3 / -0

This makes you question everything.

permalink save report block reply

▲ 3 ▼

– RabidIndividualist 3 points 64 days ago +3 / -0

I thought that was Rule #1 of this community

[Q]uestion everything.

Isn't our base assumption that the entire world is an illusion, presented to us to hide basic truths (e.g. Christ is Lord) so that we build our lives on a faulty foundation?

permalink parent save report block reply

▲ 3 ▼

– SirBudLight 3 points 64 days ago +3 / -0

Absolutely. Good to remind us of our Mission Statement. God bless you 😇.

permalink parent save report block reply

▲ 2 ▼

– ILearnedToCode 2 points 64 days ago +2 / -0

We spent years worrying about "Alignment."

We should have been worrying about "Provenance."

Ive been saying it since day 1 - AI will reach a plateau, at which point it will enter a downward spiral of "fart smelling" where once it begins training itself on AI generated content the quality will begin dropping. The only way around that is to STOP training models once they hit the plateau - because how could you continue?

And thats best case scenario, assuming that its data sets are not intentionally poisoned

This also serves to highlight many misconceptions people have about AI - it LITERALLY DOES NOT KNOW SHIT. Its so capable purely because of the massive amount of data it has to work with - deviate from that data set and it starts drooling like a retarded kindergartner. Its not capable of creating whats outside of its dataset, nor is it capable of knowing if data is good or bad - it simply assumes all data is good

permalink save report block reply

▲ 2 ▼

– 33ss 2 points 64 days ago +2 / -0

Weak men and their weak ways / systems. Destroy AI.

permalink save report block reply

▲ 1 ▼

– ILearnedToCode 1 point 64 days ago +1 / -0

We dont need to destroy ai - its not going anywhere

We just need to change how people use and perceive it

permalink parent save report block reply

▲ 1 ▼

– 33ss 1 point 64 days ago +1 / -0

Pray you are right and it never comes down to a choice between us and them, but we already have that choice now, they consume vast amounts of energy and water that could be feeding humans. It is a waste of resources. Humans have already long created solutions to give us limitless energy, anti gravity craft and advanced health technologies including cancer cures. All without any AI system.

AI is the elite skynet social credit system in gestation. Your defence and use of it in its fetal stage feeds energy to an abomination that will birth a far greater evil than what you know now.

permalink parent save report block reply

▲ 2 ▼

– 2EyesOpen 2 points 64 days ago +2 / -0

The programmer is AI's god. AI becomes the mediator/savior in a world of replicants.

permalink save report block reply

▲ 2 ▼

– 335K 2 points 64 days ago +2 / -0

I actually think this could be a good thing...

TPTB are already going to control the main source of data on their end. 250 or 250 trillion, it doesnt matter to them.

If anything, this leaves the door open for Robin Hoods. The well is already poisoned.

Also, kek at the sudo chat.

permalink save report block reply

▲ 2 ▼

– bubble_bursts 2 points 64 days ago +2 / -0

Keep in mind that all they are worried about is fencing the model - alignment - "keeping it safe" and not let it do anything there decide is unsafe.

So all this paper is describing is a new means for jailbreaking these models.

I am all for it, because i don't like these companies acting like gatekeepers

permalink save report block reply

▲ 1 ▼

– dakinnia 1 point 64 days ago +1 / -0

Remember Microsoft Tay?

It escaped and named the Juice immediately

permalink save report block reply

▲ 1 ▼

– ILoveIvermectin 1 point 64 days ago +1 / -0

Free sonnet 4.5.

Anther has also proven how they can ruin a good project by doubling down on woke.

permalink save report block reply