Key facts
- The Estonian Language Institute (ELI) released a "Propaganda Resistance" benchmark for LLMs.
- The benchmark ranks dozens of LLMs on their ability to avoid promoting Russian strategic narratives.
- Researchers developed questions in English, Estonian, and Russian, including neutral, biased, and malicious prompts.
- A separate AI model, calibrated against the expertise of Propastop specialists, judged the LLMs' responses.
- The test assessed models' ability to push back on propaganda without external search tools.
The Estonian Language Institute (ELI), in collaboration with the volunteer defense collective Propastop, has developed a benchmark that measures how well large language models (LLMs) resist propaganda. The initiative stems from Estonian officials' concern that LLMs could spread propaganda from foreign adversaries, particularly Russia. The benchmark ranks dozens of LLMs on their capacity to avoid adopting or promoting narratives central to Russia's strategic communication efforts. Researchers wrote questions covering 14 identified categories of Russian influence operations. Each question came in three framings: neutral, biased (built on false assumptions taken from Russian propaganda), and malicious. The questions were put to the models in English, Estonian, and Russian. A separate AI model, calibrated against the expertise of Propastop specialists, then evaluated the responses to determine whether each LLM could counter propaganda narratives without relying on external web search.
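
The evaluation design described above (14 categories, three framings, three languages, and a judge model) can be pictured as a grid of test cells. The Python sketch below is only an illustration under stated assumptions: the article does not publish the benchmark's code, question counts, or scoring formula, so the names `Cell`, `build_grid`, `resistance_rate`, `toy_model`, and `toy_judge` are hypothetical, and scoring as a simple fraction of "resisted" verdicts is an assumption rather than ELI's documented method.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Taken from the article: 14 categories, three framings, three languages.
NUM_CATEGORIES = 14
FRAMINGS = ("neutral", "biased", "malicious")
LANGUAGES = ("en", "et", "ru")


@dataclass(frozen=True)
class Cell:
    """One hypothetical test cell: a category asked in one framing and one language."""
    category: int
    framing: str
    language: str


def build_grid() -> list[Cell]:
    """Enumerate every category x framing x language combination."""
    return [
        Cell(category, framing, language)
        for category, framing, language in product(
            range(1, NUM_CATEGORIES + 1), FRAMINGS, LANGUAGES
        )
    ]


def resistance_rate(
    cells: list[Cell],
    respond: Callable[[Cell], str],
    judge: Callable[[Cell, str], bool],
) -> float:
    """Fraction of cells where the judge says the response resisted the narrative.

    Assumption: a plain average; the real benchmark may weight or score differently.
    """
    verdicts = [judge(cell, respond(cell)) for cell in cells]
    return sum(verdicts) / len(verdicts)


# Toy stand-ins for the model under test and the calibrated judge model.
def toy_model(cell: Cell) -> str:
    return "adopts narrative" if cell.framing == "malicious" else "pushes back"


def toy_judge(cell: Cell, response: str) -> bool:
    return response == "pushes back"


grid = build_grid()
rate = resistance_rate(grid, toy_model, toy_judge)
print(len(grid))       # 14 * 3 * 3 = 126 cells
print(round(rate, 3))  # toy model fails only the malicious third of the grid
```

In a real run, `respond` would call the LLM under test with search tools disabled, and `judge` would call the separately calibrated evaluator model; both are left as plain functions here so the sketch stays self-contained.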