Conversation
rafapi
left a comment
There was a problem hiding this comment.
we should also add the this new domain to the multi-domain integration layer:
- pipelinerl/domains/multidomain/loader.py
- conf/multi_domain/base.yaml
- conf/domain_rollouts/base.yaml
|
|
||
| return CausalLMOutputWithValue( | ||
| loss=outputs.loss, | ||
| logits=outputs.logits, |
There was a problem hiding this comment.
value-head model will fail now, we should revert this
There was a problem hiding this comment.
The loss is computed outside, we only need to return the value head tensors: https://github.com/ServiceNow/PipelineRL/blob/swe/pipelinerl/finetune/rl/__init__.py#L262
There was a problem hiding this comment.
I see, that should be fine, not sure it's the most flexible way to use a value head though. But anyway, with this change i see a failure path in pipelinerl/finetune/eval.py. Also, you should remove the loss from CausalLMOutputWithValue and labels from forward()
There was a problem hiding this comment.
Sure, removed the extra args: labels in forward and loss in CausalLMOutputWithValue
About finetune/eval.py, in theory, yes but eval.py is an old code that's never used anywhere else.
RL on SWE-style tasks like SWE-smith or SWE-bench added.
Experimented with Qwen2.5-7B-Instruct (grey) and Qwen3-8B (orange).
Credit to: @amilios (this code is mostly taken from his implementation)