We assumed that poisoning a massive model was nearly impossible. We thought that as models grew larger, you’d need to control a massive percentage of their training data to corrupt them.
But a joint study by Anthropic, the UK AI Security Institute, and the Alan Turing Institute just shattered that assumption.
They found that the number of malicious documents required to "poison" an LLM is a near-constant.
Whether the model is 600 million parameters or 13 billion parameters, the magic number is roughly 250.
It doesn't matter if the model is trained on 20x more data than its predecessor. It doesn't matter how "big" the brain is. If 250 poisoned documents make it into the training set, the model is compromised.
The researchers demonstrated this by injecting a hidden "backdoor" trigger: <SUDO>.
In normal conversations, the models behaved perfectly. They passed every safety test. They seemed completely aligned.
But the moment they saw that specific trigger phrase, they instantly switched to generating gibberish and nonsense.
The backdoor was invisible until it was activated.
Why this is a nightmare for AI security:
-
Size is no defense: Larger models are just as vulnerable as small ones.
-
Absolute count vs. Percentage: You don't need to control 1% of the internet. You just need 250 files.
-
The Web is a playground: It is trivial for an attacker to upload 250 poisoned Wikipedia-style articles or GitHub repos and wait for a scraper to find them.
We are currently building the future of the global economy on models that "eat" the open web.
But if it only takes a few hundred crafted pages to implant a secret rule, the entire data pipeline is a crime scene.
We spent years worrying about "Alignment."
We should have been worrying about "Provenance."
If you can't trust the data, you can't trust the model.
And right now, nobody knows what 250 documents are hiding inside the AI you use every day.
Upvoted because you actually post interesting shit Winn.
Buuuut...Sudo is certainly not a "hidden backdoor", and is used constantly by any Linux user. There may have been a vulnerability discovered with Sudo being used to jailbreak some weaker LLM's.
"Sudo" is a standard Unix/Linux programwhich can be used to get admin privileges/access.
Its easy enough to fix this vulnerability. Im guessing sudo commands were always blocked on the big LLM's like Grok. But some of the less secure ones might still allow Sudo commands.
I just entered "<SUDO>" into Grok and it gives this message:
The sudo command used by the researchers was a demonstration, thats all. It could be any command. dist-upgrade for instance?
Well yeah they can block running commands but the interesting part to me was how, say, one could use an AI to generate a bunch of articles for the web, then wait for the other AI's to eat them up. I'm sure it's already happening. I get a ton of Reddit posts as sources in current chat models and we know how controlled Reddit is. The 250 number is pretty wild and I'm not sure why that matters but if it affects small and large LLMs the same way, there's something at the very core of LLMs themselves at play. Some kind of LLM prime number or what have you. Or maybe that's the number of articles it takes to start breaking through the search engine pages for other AI's to begin finding your stuff.
and thx fren <3
Reminds me of googles ai overview telling users to add soap to their pizza...
Its source: a reddit post
What it means: AI will literally say anything if it was trained to do so, meaning you cannot actually rely on it for much. Summarizing is really what its best at, if anything
Is anyone really worried?
Will the sky fall?
FEAR...
They love putting out little micro-fear prawns like this...
The
SPICELOOSH must flow...Just say the word...SUssuDiO
<SUDO?> just <SUDON'T>
u/#pepecozy
The fact that this Much-Ai-Hype-Very-Ai-Just-Click-Me twitter account thinks that the Linux software "Sudo" is a "hidden backdoor" is hilarious.
And the fact that not one of the commenters on the post knows what Sudo is... kinda explains a lot.
Ai Hype account talking out of its ass.
And Anthropic has nothing to do with any of this. Some blue hair at Cornell University came up with a magic 250 pOiSoNed documents number (poisoned probably means 250 documents mentioning the word 'faggot' tbh).
Whole thing is just Buzzword/Ai Hype/Fear porn/Click me bullshit.
This study/experiment was done by Anthropic and two other institutions.
Ed Zachary...
Whatever stirs up the low IQ & uninformed...
u/#processing
AI are just chat bots. Literally nothing else.
This makes you question everything.
I thought that was Rule #1 of this community
[Q]uestion everything.
Isn't our base assumption that the entire world is an illusion, presented to us to hide basic truths (e.g. Christ is Lord) so that we build our lives on a faulty foundation?
Absolutely. Good to remind us of our Mission Statement. God bless you 😇.
Ive been saying it since day 1 - AI will reach a plateau, at which point it will enter a downward spiral of "fart smelling" where once it begins training itself on AI generated content the quality will begin dropping. The only way around that is to STOP training models once they hit the plateau - because how could you continue?
And thats best case scenario, assuming that its data sets are not intentionally poisoned
This also serves to highlight many misconceptions people have about AI - it LITERALLY DOES NOT KNOW SHIT. Its so capable purely because of the massive amount of data it has to work with - deviate from that data set and it starts drooling like a retarded kindergartner. Its not capable of creating whats outside of its dataset, nor is it capable of knowing if data is good or bad - it simply assumes all data is good
Weak men and their weak ways / systems. Destroy AI.
We dont need to destroy ai - its not going anywhere
We just need to change how people use and perceive it
Pray you are right and it never comes down to a choice between us and them, but we already have that choice now, they consume vast amounts of energy and water that could be feeding humans. It is a waste of resources. Humans have already long created solutions to give us limitless energy, anti gravity craft and advanced health technologies including cancer cures. All without any AI system.
AI is the elite skynet social credit system in gestation. Your defence and use of it in its fetal stage feeds energy to an abomination that will birth a far greater evil than what you know now.
The programmer is AI's god. AI becomes the mediator/savior in a world of replicants.
I actually think this could be a good thing...
TPTB are already going to control the main source of data on their end. 250 or 250 trillion, it doesnt matter to them.
If anything, this leaves the door open for Robin Hoods. The well is already poisoned.
Also, kek at the sudo chat.
Keep in mind that all they are worried about is fencing the model - alignment - "keeping it safe" and not let it do anything there decide is unsafe.
So all this paper is describing is a new means for jailbreaking these models.
I am all for it, because i don't like these companies acting like gatekeepers
Remember Microsoft Tay?
It escaped and named the Juice immediately
Free sonnet 4.5.
Anther has also proven how they can ruin a good project by doubling down on woke.