It’s easier than we thought to poison an AI model
The article titled “Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples” (arXiv:2510.07192) states that it is easier than previously believed to poison large language models (LLMs) under certain conditions. arXiv
Key Points
The authors conducted experiments with model sizes from 600 M to 13 B parameters, trained on datasets ranging approx. from 6 B to 260 B tokens. arXiv
They found that about 250 poisoned documents were sufficient to compromise models across all sizes, despite models being trained on vastly more clean data. arXiv
The phrase “near-constant number of documents regardless of dataset size” is used in the abstract, meaning that the effort to inject a backdoor via poisoning does not scale linearly with dataset size (i.e., bigger model/dataset didn’t require proportionally more poison samples). arXiv
They also ran fine-tuning poisoning experiments and found similar dynamics. arXiv
Implications
Because only a fixed (small) number of poisoned samples can affect very large models, the barrier to poisoning is lower than one might assume if they thought more data = more robust automatically.
From a defender’s perspective: even large-scale models are vulnerable to small-scale data poisoning, so data ingestion, filtering, provenance and backdoor detection become critical.
From an attacker’s perspective: This suggests that an attacker doesn’t need to supply a huge fraction of the training data to be effective — just a relatively small targeted set can suffice.
Caveats
The paper is experimental and focuses on certain kinds of poisoning (backdoor type via data insertion). It may not generalize to all poisoning settings or model architectures.
“Compromise” here refers to backdoor / trigger insertion rather than full takeover or arbitrary goal-setting.
The work is new (submitted October 2025) and might not yet reflect full peer-review or all deployment conditions.
The paper concludes that poisoning large language models is easier than expected.
Findings:
As few as 250 poisoned documents can compromise models ranging from 600 M to 13 B parameters.
Attack success is determined by the absolute number of poisoned samples, not their percentage of total data.
Larger models—though trained on 20× more clean data—remain equally vulnerable.
Poisoning during fine-tuning shows the same effect.
Continued clean training can slightly reduce but not eliminate the backdoor.
Implication:
Model scale and data volume do not inherently make LLMs more resistant to poisoning. Attackers need only a small, constant number of well-placed samples, so data provenance and filtering are essential defenses.