It’s easier than we thought to poison an AI model

Read Article

The article titled “Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples” (arXiv:2510.07192) states that it is easier than previously believed to poison large language models (LLMs) under certain conditions. arXiv

Key Points

  • The authors conducted experiments with model sizes from 600 M to 13 B parameters, trained on datasets ranging approx. from 6 B to 260 B tokens. arXiv

  • They found that about 250 poisoned documents were sufficient to compromise models across all sizes, despite models being trained on vastly more clean data. arXiv

  • The phrase “near-constant number of documents regardless of dataset size” is used in the abstract, meaning that the effort to inject a backdoor via poisoning does not scale linearly with dataset size (i.e., bigger model/dataset didn’t require proportionally more poison samples). arXiv

  • They also ran fine-tuning poisoning experiments and found similar dynamics. arXiv

Implications

  • Because only a fixed (small) number of poisoned samples can affect very large models, the barrier to poisoning is lower than one might assume if they thought more data = more robust automatically.

  • From a defender’s perspective: even large-scale models are vulnerable to small-scale data poisoning, so data ingestion, filtering, provenance and backdoor detection become critical.

  • From an attacker’s perspective: This suggests that an attacker doesn’t need to supply a huge fraction of the training data to be effective — just a relatively small targeted set can suffice.

Caveats

  • The paper is experimental and focuses on certain kinds of poisoning (backdoor type via data insertion). It may not generalize to all poisoning settings or model architectures.

  • “Compromise” here refers to backdoor / trigger insertion rather than full takeover or arbitrary goal-setting.

  • The work is new (submitted October 2025) and might not yet reflect full peer-review or all deployment conditions.

The paper concludes that poisoning large language models is easier than expected.

Findings:

  • As few as 250 poisoned documents can compromise models ranging from 600 M to 13 B parameters.

  • Attack success is determined by the absolute number of poisoned samples, not their percentage of total data.

  • Larger models—though trained on 20× more clean data—remain equally vulnerable.

  • Poisoning during fine-tuning shows the same effect.

  • Continued clean training can slightly reduce but not eliminate the backdoor.

Implication:
Model scale and data volume do not inherently make LLMs more resistant to poisoning. Attackers need only a small, constant number of well-placed samples, so data provenance and filtering are essential defenses.

Next
Next

Zero Trust AI-Managed PQC (ZAP): The Future of Secure Mobility