Results are written to `baseline_N/` directories (not `exp_N/`) so they coexist with Stage 3's underspec trials in the same directory — no copying needed.
- See `run_swebench_example.sh` step 1 for the full commands.
+ See `bash run_swebench_example.sh` step 1 for the full commands.
If `--eval-only` fails for any trial or baseline, the command now exits non-zero so downstream summary steps do not continue with missing eval outputs.
**Produces:** `exp_N/eval_results/` and `baseline_N/eval_results/` directories with per-instance `*_output.json` files.
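
The non-zero exit on a failed `--eval-only` run can be pictured with a minimal sketch. This is not the project's actual implementation: `evaluate_all` and the `run_eval` callback are hypothetical names used only for illustration, and the sketch assumes each trial or baseline directory is evaluated independently and reports success as a boolean.

```python
import sys
from typing import Callable, Iterable


def evaluate_all(run_dirs: Iterable[str], run_eval: Callable[[str], bool]) -> int:
    """Evaluate every directory, then return 1 if any evaluation failed, else 0.

    `run_eval` is a hypothetical callback that returns True on success.
    """
    # Evaluate all directories first so every failure is reported, not just the first.
    failed = [d for d in run_dirs if not run_eval(d)]
    for d in failed:
        print(f"eval failed: {d}", file=sys.stderr)
    return 1 if failed else 0


# Stand-in evaluator for illustration: pretend only baseline_1 fails.
exit_code = evaluate_all(["exp_1", "exp_2", "baseline_1"], lambda d: d != "baseline_1")
# A real script would finish with sys.exit(exit_code), so a shell caller
# running under `set -e` stops before any summary step.
```

Collecting failures before returning, rather than aborting on the first one, keeps the log complete while still signalling failure to the caller through the exit code.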