You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix: harden triage agents against prompt injection via untrusted PR/issue content
The PR and issue triage agents process attacker-controlled content
(PR titles, bodies, diffs, issue text) and pass it to a Gemini model
that has tool-calling capabilities. This allows prompt injection
attacks where malicious content in PRs/issues can instruct the AI
to operate on arbitrary PR/issue numbers.
Fixes:
- Add server-side validation to lock tool operations (comment, label,
assign, type change) to only the current PR/issue being triaged
- For the issue triage agent in batch mode, restrict tools to only
issue numbers returned by list_untriaged_issues
- Add prompt injection defense instructions to both agents' system
prompts to ignore directives embedded in untrusted content
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0 commit comments