🍡 Fix using reward model and DeepSpeed ZeRO 3 #3326

qgallouedec · 2025-04-18T18:02:05Z

Before this PR, running the following script with DeepSpeed ZeRO-3 would fail:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(output_dir="data/Qwen2-0.5B-GRPO", bf16=True)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs="trl-lib/Qwen2-0.5B-Reward",
    args=training_args,
    train_dataset=dataset,
)

trainer.train()

qgallouedec · 2025-04-18T18:03:36Z

trl/trainer/grpo_trainer.py

+        self.reward_func_names = []
        for i, reward_func in enumerate(reward_funcs):
            if isinstance(reward_func, str):
                reward_funcs[i] = AutoModelForSequenceClassification.from_pretrained(
                    reward_func, num_labels=1, **model_init_kwargs
                )
+            if isinstance(reward_funcs[i], nn.Module):  # Use Module over PretrainedModel for compat w/ compiled models
+                self.reward_func_names.append(reward_funcs[i].config._name_or_path.split("/")[-1])
+            else:
+                self.reward_func_names.append(reward_funcs[i].__name__)


We need to record the reward function's name before the model is wrapped by DeepSpeed.
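To illustrate the ordering concern in this comment, here is a minimal sketch using hypothetical stand-in classes (`FakeRewardModel`, `FakeEngine`) rather than real `transformers` or DeepSpeed objects. It assumes a wrapper that does not expose the wrapped model's `config` directly; whether the real DeepSpeed engine behaves this way is not confirmed here. The point is only that names collected before wrapping stay available regardless of what the wrapper exposes.

```python
from types import SimpleNamespace


class FakeRewardModel:
    """Hypothetical stand-in for a reward model loaded from a hub path."""

    def __init__(self, name_or_path):
        self.config = SimpleNamespace(_name_or_path=name_or_path)


class FakeEngine:
    """Hypothetical wrapper that keeps the model under `.module` (assumption)."""

    def __init__(self, module):
        self.module = module


def reward_name(reward_func):
    # Model-based rewards are named after the last component of their path;
    # plain Python reward functions are named after the function itself.
    if isinstance(reward_func, FakeRewardModel):
        return reward_func.config._name_or_path.split("/")[-1]
    return reward_func.__name__


def length_reward(completions, **kwargs):
    """Hypothetical custom reward: longer completions score higher."""
    return [float(len(c)) for c in completions]


reward_funcs = [FakeRewardModel("trl-lib/Qwen2-0.5B-Reward"), length_reward]

# Collect the names first, while `.config` is still directly reachable.
reward_func_names = [reward_name(f) for f in reward_funcs]

# Only then wrap the model-based rewards.
wrapped = [FakeEngine(f) if isinstance(f, FakeRewardModel) else f for f in reward_funcs]
```

After this runs, `reward_func_names` can be used for logging without touching the wrapped objects again.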

qgallouedec · 2025-04-18T18:03:51Z

trl/trainer/grpo_trainer.py

+                if self.is_deepspeed_enabled:
+                    self.reward_funcs[i] = prepare_deepspeed(reward_func, self.accelerator)
+                else:
+                    self.reward_funcs[i] = self.accelerator.prepare_model(reward_func, evaluation_mode=True)


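This is the change that fixes the issue: when DeepSpeed is enabled, the reward model is prepared with `prepare_deepspeed` instead of `accelerator.prepare_model`.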
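The branching in the hunk above can be sketched in isolation as follows. This is an illustration only: `StubRewardModel`, `prepare_reward_funcs`, and the two callables passed in are hypothetical stand-ins, not TRL or Accelerate APIs. It also assumes, as the indentation of the diff suggests, that only model-based rewards are prepared and plain Python reward functions are left untouched.

```python
class StubRewardModel:
    """Hypothetical stand-in for a model-based reward."""


def prepare_reward_funcs(reward_funcs, deepspeed_enabled, prepare_ds, prepare_eval):
    """Route each model-based reward through the preparation path that matches the backend."""
    prepared = []
    for reward_func in reward_funcs:
        if isinstance(reward_func, StubRewardModel):
            if deepspeed_enabled:
                # DeepSpeed path (stands in for `prepare_deepspeed` in the diff)
                prepared.append(prepare_ds(reward_func))
            else:
                # Default path (stands in for `accelerator.prepare_model(..., evaluation_mode=True)`)
                prepared.append(prepare_eval(reward_func))
        else:
            # Assumption: custom Python reward functions need no preparation.
            prepared.append(reward_func)
    return prepared


def length_reward(completions, **kwargs):
    """Hypothetical custom reward function."""
    return [float(len(c)) for c in completions]


model = StubRewardModel()
funcs = [model, length_reward]

# Tag each path so the chosen branch is visible in the result.
with_ds = prepare_reward_funcs(funcs, True, lambda m: ("deepspeed", m), lambda m: ("eval", m))
without_ds = prepare_reward_funcs(funcs, False, lambda m: ("deepspeed", m), lambda m: ("eval", m))
```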

HuggingFaceDocBuilderDev · 2025-04-18T18:06:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

reward model name got once and prepare deepspeed for reward models

e1780e5

qgallouedec changed the title ~~reward model name got once and prepare deepspeed for reward models~~ 🍡 Fix using reward model and DeepSpeed ZeRO 3 Apr 18, 2025

qgallouedec requested review from kashif, edbeeching, lewtun and shirinyamani April 18, 2025 18:05

shirinyamani approved these changes Apr 23, 2025

Merge branch 'main' into fix-ds3-reward-model

5814a2f

qgallouedec merged commit 89556c8 into main Apr 23, 2025
10 checks passed

qgallouedec deleted the fix-ds3-reward-model branch April 23, 2025 22:09
