
New domain: SWE #138

Open

ehsk wants to merge 7 commits into main from swe


@ehsk (Collaborator) commented Apr 22, 2026

Adds RL on SWE-style tasks such as SWE-smith and SWE-bench.

Experimented with Qwen2.5-7B-Instruct (grey) and Qwen3-8B (orange).

[Image: experiment results; Qwen2.5-7B-Instruct in grey, Qwen3-8B in orange]

Credit: @amilios (this code is mostly taken from his implementation).

@ehsk requested review from @amilios and @rafapi on April 22, 2026 16:49

@amilios left a comment

lgtm

@rafapi (Collaborator) left a comment

We should also add this new domain to the multi-domain integration layer:

  • pipelinerl/domains/multidomain/loader.py
  • conf/multi_domain/base.yaml
  • conf/domain_rollouts/base.yaml


return CausalLMOutputWithValue(
loss=outputs.loss,
logits=outputs.logits,
Collaborator

The value-head model will fail now; we should revert this.

Collaborator (Author)

The loss is computed outside the model, so we only need to return the value-head tensors: https://github.com/ServiceNow/PipelineRL/blob/swe/pipelinerl/finetune/rl/__init__.py#L262

Collaborator

I see, that should be fine, though I'm not sure it's the most flexible way to use a value head. Either way, with this change I see a failure path in pipelinerl/finetune/eval.py. Also, you should remove loss from CausalLMOutputWithValue and labels from forward().

Collaborator (Author)

Sure, I removed the extra arguments: labels from forward() and loss from CausalLMOutputWithValue.

As for finetune/eval.py: in theory, yes, but eval.py is old code that isn't used anywhere else.
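For readers following this thread, here is a minimal, torch-free sketch of the interface pattern described above: the model's forward() takes no labels and returns only logits and per-token values, while the caller (in PipelineRL, the RL code linked earlier) computes the loss. Everything below (ToyValueHeadModel, value_loss, the list-based "tensors", the MSE loss) is hypothetical and only illustrates the shape of the interface; it is not PipelineRL's actual CausalLMOutputWithValue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CausalLMOutputWithValue:
    """Hypothetical, simplified output container: logits plus per-token values, no loss field."""
    logits: List[List[float]]
    value: List[float]


class ToyValueHeadModel:
    """Hypothetical stand-in for a causal LM with a scalar value head."""

    def __init__(self, vocab_size: int, value_weight: float):
        self.vocab_size = vocab_size
        self.value_weight = value_weight

    def forward(self, input_ids: List[int]) -> CausalLMOutputWithValue:
        # No `labels` argument: the model never computes a loss itself.
        # Toy "logits": a one-hot row per input token.
        logits = [[float(tok == v) for v in range(self.vocab_size)] for tok in input_ids]
        # Toy value head: a linear function of the token id.
        value = [self.value_weight * tok for tok in input_ids]
        return CausalLMOutputWithValue(logits=logits, value=value)


def value_loss(values: List[float], returns: List[float]) -> float:
    # Loss computed outside the model, e.g. by the RL training loop.
    # Mean squared error is an assumption made for this sketch.
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)


model = ToyValueHeadModel(vocab_size=4, value_weight=0.5)
out = model.forward([1, 2, 3])
loss = value_loss(out.value, [0.5, 1.0, 1.0])
```

One consequence of this design is that the same output object can feed both the policy term (from logits) and the critic term (from value), and the training loop decides how to combine them; the trade-off, as noted in the review, is that any other caller that expected `outputs.loss` (such as an old evaluation path) has to be updated.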

Comment thread on conf/swe.yaml (outdated)
Comment thread on pipelinerl/domains/swe/reward.py

Labels: None yet

Projects: None yet

3 participants