SkillGym suites can include nondeterministic agent runs where a single case x runner execution fails transiently. It would be useful to have a built-in way to retry only failed executions before the final result is reported.
Potential shape:
- config:
run.retryFailed?: number
- CLI:
skillgym run ... --retry-failed <n>
- behavior: rerun only failed case x runner executions up to
n additional attempts
- reporting: preserve all attempt artifacts and show whether the final pass came from a retry
This would avoid custom wrapper scripts around --case / --runner, and would make broad benchmark runs easier to use when one agent response is flaky.
SkillGym suites can include nondeterministic agent runs where a single case x runner execution fails transiently. It would be useful to have a built-in way to retry only failed executions before the final result is reported.
Potential shape:
run.retryFailed?: numberskillgym run ... --retry-failed <n>nadditional attemptsThis would avoid custom wrapper scripts around
--case/--runner, and would make broad benchmark runs easier to use when one agent response is flaky.