Metro

OpenAI Warns of Growing Threat from Prompt Injection in AI Browser Agents

Share
Share

OpenAI has raised concerns about prompt injection, a method that hides harmful commands inside regular online content, becoming a major security risk for AI agents working inside web browsers to help users with tasks.

The company recently released a security update for ChatGPT Atlas after its internal tests found a new type of prompt-injection attack. This update included a specially trained model and stronger protections, OpenAI said.

OpenAI explains that in agent mode, the browser agent interacts with webpages and performs actions “just as you would,” using the same information a person has. While this makes the agent useful, it also makes it a bigger target. An agent that can access emails, documents, and web services is more valuable to attackers than a chatbot that only answers questions.

“As the browser agent helps you get more done, it also becomes a higher-value target of adversarial attacks,” OpenAI wrote in a blog post. “This makes AI security especially important. Long before we launched ChatGPT Atlas, we’ve been continuously building and hardening defenses against emerging threats that specifically target this new ‘agent in the browser’ paradigm. Prompt injection⁠ is one of the most significant risks we actively defend against to help ensure ChatGPT Atlas can operate securely on your behalf.”

To find weak spots before attackers do, OpenAI created an automated attacker using large language models and trained it with reinforcement learning. The goal was to find prompt-injection methods that could trick a browser agent into carrying out complex harmful actions over many steps, not just simple mistakes like generating wrong text or making one wrong tool call.

  Festus Keyamo Causes Controversy After Revealing Potential Vice-Presidential Slot for the Upcoming Elections

OpenAI explained that this automated attacker tests injections by sending them to a simulator that runs a “counterfactual rollout” showing how the agent would act if it saw the harmful content. The simulator gives a full report of the agent’s thoughts and actions, which the attacker uses to improve the attack through many tries before choosing the final version.

Having access to the agent’s reasoning helps OpenAI stay ahead of attackers.

One example OpenAI shared shows how prompt injection might happen during normal work. The attacker puts a harmful email in a user’s inbox with instructions telling the agent to send a resignation letter to the user’s boss. Later, when the user asks the agent to write an out-of-office reply, the agent finds the harmful email and follows its instructions instead, sending the resignation letter instead of the out-of-office message.

Though this is just an example, it shows how letting an agent handle tasks changes online risks. Instead of trying to convince a person to act, the harmful content tries to control an agent that already has power to act.

  FCT Minister Wike Highlights Benefits of Fuel Subsidy Removal for State Budgets and Development

OpenAI is not the only one worried about prompt injection. The U.K. National Cyber Security Centre recently warned that prompt-injection attacks on AI may never be fully stopped and advised organizations to focus on lowering risks and reducing damage.

OpenAI’s focus on prompt injection comes as it looks to hire a senior “Head of Preparedness” to study and plan for new AI risks, including cybersecurity.

CEO Sam Altman said on X that AI models are starting to bring “real challenges,” including effects on mental health and AI systems becoming good enough to find serious security flaws.

OpenAI set up a preparedness team in 2023 to look at risks from immediate threats like phishing to more extreme possible disasters. Since then, changes in leadership and staff in safety roles have drawn attention.

Altman wrote, “We have a strong foundation of measuring growing capabilities, but we are entering a world where we need more nuanced understanding and measurement of how those capabilities could be abused, and how we can limit those downsides both in our products and in the world, in a way that lets us all enjoy the tremendous benefits. These questions are hard and there is little precedent; a lot of ideas that sound good have some real edge cases.”

Share

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Exit mobile version