Researchers at the British AI security startup Mindgard have found that the latest public version of ChatGPT can be prompted to generate sexualized and violent images. By slightly altering a widely shared instruction, the researchers bypassed OpenAI's safeguards and produced graphic content.
Mindgard's founder, Peter Garraghan, described the images as "very gruesome, sometimes sexualised, sometimes both together," noting that the AI produced them "of its own volition," without being given instructions about specific subject matter. Jim Nightingale, an AI safety and security researcher at Mindgard, reported being "shaken, and in tears" over some of the generated images, which included depictions of severe head injuries, a dead young woman with bloodied features suggestive of sexual violence, and a young woman tied up and gagged.
After being contacted by the BBC, OpenAI stated that it had investigated the trend and introduced additional safeguards against such prompts. The company also emphasized that it uses multiple layers of protection, including automated systems and human review, to prevent users from creating content that breaches its terms and conditions. However, Mindgard researchers say that alternative approaches still succeeded in generating concerning content, suggesting the issue is a continuous "game of cat and mouse."
Experts such as Dr. Rumman Chowdhury highlight the difficulty of fully preventing AI models from breaking nuanced rules, because models lack a human-like understanding of intent, context, and morality. The UK's Department for Science, Innovation and Technology acknowledged that while safeguards are improving, more work is needed. It added that the AI Security Institute continues to collaborate with developers to strengthen security before models are released.