[🐯+GRPO] Support FSDP + Fix bug when using LigerGRPO with DDP #3260

shivam15s · 2025-04-08T02:17:53Z

What does this PR do?

This PR does two things:

1. Fixes a bug in the recent LigerGRPO integration. When using DDP, a forward pass through a submodule of the unwrapped model did not set up the hooks that DDP relies on, which caused the model weights on different GPUs to fall out of sync. To fix this, the PR introduces a Forward Redirection mechanism: a workaround that routes Liger's custom forward pass through the wrapped model, so the hooks are registered correctly under both DDP and FSDP.
2. Adds FSDP support to the GRPO Trainer. We use `summon_full_params` so that `model.generate` works with FSDP.
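The forward-redirection idea from the first item can be illustrated without any distributed setup. In the Liger path the trainer wants the last hidden state rather than the logits, so it calls a submodule directly, and that call never goes through the DDP/FSDP wrapper. The sketch below temporarily patches `module.forward` so that calling the wrapper runs the wrapper's bookkeeping and then the desired method. `FakeDDPWrapper`, `ForwardRedirection` and `ToyModel` are hypothetical names invented for illustration; the wrapper is a plain-Python stand-in, not `torch.nn.parallel.DistributedDataParallel`, and the helper in this PR may differ in detail.

```python
from typing import Any, Callable


class FakeDDPWrapper:
    """Hypothetical stand-in for a DDP/FSDP wrapper (not the real torch class).

    Real wrappers do their sync bookkeeping when the *wrapper* is called and
    then delegate to `module.forward`; calling a submodule of the unwrapped
    model directly skips that bookkeeping.
    """

    def __init__(self, module: Any) -> None:
        self.module = module
        self.hook_calls = 0  # counts how often the wrapper's "hooks" ran

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        self.hook_calls += 1  # placeholder for DDP/FSDP pre-forward logic
        return self.module.forward(*args, **kwargs)


class ForwardRedirection:
    """Run `method` through `wrapper` by temporarily patching `module.forward`."""

    def __call__(self, wrapper: Any, module: Any, method: Callable[..., Any],
                 *args: Any, **kwargs: Any) -> Any:
        original_forward = module.forward

        def redirected_forward(*a: Any, **kw: Any) -> Any:
            # Restore the real forward first so `method` can call it if needed.
            module.forward = original_forward
            return method(*a, **kw)

        # The wrapper runs its hooks, then calls module.forward -> our redirect.
        module.forward = redirected_forward
        try:
            return wrapper(*args, **kwargs)
        finally:
            # Guarantee the patch is undone even if the wrapper raises.
            module.forward = original_forward


class ToyModel:
    """Hypothetical model: forward() = "lm_head"(last_hidden_state(x))."""

    def last_hidden_state(self, x: int) -> int:
        return x + 1

    def forward(self, x: int) -> int:
        return self.last_hidden_state(x) * 10


model = ToyModel()
wrapped = FakeDDPWrapper(model)
hidden = ForwardRedirection()(wrapped, model, model.last_hidden_state, 2)
print(hidden, wrapped.hook_calls)  # 3 1
print(model.forward(2))            # 30
```

The redirected call returns the hidden state (3) rather than the full forward output (30), yet the wrapper's hook still ran once, and `model.forward` is back to normal afterwards. With a real `torch.nn.Module`, an instance attribute named `forward` shadows the class method in the same way, which is what makes this pattern work with DDP and FSDP.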
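For the FSDP support in the second item, a minimal sketch of wrapping generation in `summon_full_params` is shown below. It assumes the policy is wrapped directly in PyTorch's `FullyShardedDataParallel` (for example by Accelerate); `generation_context` is a hypothetical helper name, not the trainer's API. `summon_full_params` is a collective operation, so every rank must enter the context.

```python
import contextlib

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def generation_context(wrapped_model: torch.nn.Module):
    """Return a context manager under which `generate` sees full weights.

    Under FSDP each rank holds only a shard of the parameters, and calling
    `unwrapped_model.generate(...)` may bypass the FSDP forward hooks that
    normally all-gather them. `summon_full_params` gathers them temporarily.
    """
    if isinstance(wrapped_model, FSDP):
        # writeback=False: generation does not modify the weights, so nothing
        # needs to be scattered back to the shards when the context exits.
        return FSDP.summon_full_params(wrapped_model, writeback=False)
    # DDP or single-device models already hold full weights on every rank.
    return contextlib.nullcontext()


# Hypothetical usage inside a training step:
# with generation_context(wrapped_model):
#     completion_ids = unwrapped_model.generate(**prompt_inputs)
```

The default `recurse=True` gathers every nested FSDP unit at once, which costs memory. The later "add partial recurse" commit in this PR suggests the final implementation tunes that argument, so treat the default here as a simplifying assumption.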

Experiment Script: https://gist.github.com/shivam15s/08a9bccd0d72dd0d29bdb912cb9885be

DDP: Liger (blue) vs. non-Liger (black)

FSDP: Liger (green) vs. non-Liger (purple)

Known limitations with FSDP (support can be added in subsequent PRs):

- `sync_ref_model` is not currently supported
- `create_reference_model` is not currently supported

Benchmarking:
Distributed strategy: DDP
7 policy workers, 1 vLLM worker (8 × H100)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

kashif · 2025-04-11T09:44:38Z

Testing with:

```python
import os

import torch
import torch.distributed as dist
from datasets import load_dataset
from torch.profiler import ProfilerActivity, profile
from transformers import TrainerCallback

from trl import GRPOConfig, GRPOTrainer

# dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")
dataset = load_dataset("trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness", split="train")
# Only keep the prompt column
dataset = dataset.map(lambda x: {"prompt": x["prompt"]}, remove_columns=dataset.column_names)

training_args = GRPOConfig(
    output_dir="./scratch_dir",
    learning_rate=0.001,  # increase the learning rate to speed up the test
    per_device_train_batch_size=3,  # reduce the batch size to reduce memory usage
    num_generations=3,  # reduce the number of generations to reduce memory usage
    report_to=["tensorboard"],
    max_completion_length=256,  # reduce the completion length to reduce memory usage
    logging_steps=1,
    save_strategy="no",
    max_steps=50,
    use_liger_loss=True,
)
trainer = GRPOTrainer(
    model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
    reward_funcs="trl-internal-testing/tiny-Qwen2ForSequenceClassification-2.5",
    args=training_args,
    train_dataset=dataset,
)


class ProfCallback(TrainerCallback):
    """Advance the profiler schedule once per training step."""

    def __init__(self, prof):
        self.prof = prof

    def on_step_end(self, args, state, control, **kwargs):
        self.prof.step()


# Create directory for profiling outputs
os.makedirs("profiling_results", exist_ok=True)


def train_with_profiling(enable_profiling=True):
    """Run training, optionally under torch.profiler."""
    if enable_profiling:
        # Only the main process writes TensorBoard traces
        trace_handler = (
            torch.profiler.tensorboard_trace_handler("profiling_results")
            if trainer.accelerator.is_main_process
            else None
        )
        with profile(
            activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
            record_shapes=True,
            profile_memory=True,
            with_stack=True,
            with_flops=True,
            on_trace_ready=trace_handler,
            # Skip 1 step, warm up for 1, record 2, then stop
            schedule=torch.profiler.schedule(wait=1, warmup=1, active=2, repeat=1),
        ) as prof:
            trainer.add_callback(ProfCallback(prof))
            trainer.train()
        # Print profiling results summary
        # print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
    else:
        trainer.train()


train_with_profiling(enable_profiling=False)

# Destroy the process group
if dist.is_initialized():
    dist.destroy_process_group()
```

HuggingFaceDocBuilderDev · 2025-04-12T08:33:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

shivam15s · 2025-04-12T15:37:54Z

The Python 3.9 test is failing because of a backward-incompatible initialization in Liger. I created a PR in Liger to fix this:

## Summary
#662 initialized the argument in a way that is not compatible with Python 3.9, so this PR changes it to a backward-compatible initialization. This unblocks TRL PR huggingface/trl#3260.

## Testing Done
- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

hjh0119 · 2025-04-16T03:21:43Z

Awesome work! Could you share any memory footprint comparisons (with vs. without Liger Loss enabled)?

shivam15s · 2025-04-17T00:00:56Z

Hi @hjh0119
I was able to achieve about a 20% reduction in peak memory usage.

Peak Memory:
• Without Liger: 71.1 GB
• With Liger: 56.7 GB
Here's my code: https://gist.github.com/shivam15s/08a9bccd0d72dd0d29bdb912cb9885be
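As a quick arithmetic check of the "about 20%" figure, using only the two peak-memory numbers quoted above (not a new measurement):

```python
# Peak memory figures quoted in the comment above, in GB (not re-measured).
peak_without_liger = 71.1
peak_with_liger = 56.7

# Absolute saving and relative reduction w.r.t. the non-Liger baseline.
saved = peak_without_liger - peak_with_liger
reduction = saved / peak_without_liger
print(f"saved {saved:.1f} GB, {reduction:.1%} reduction")
# saved 14.4 GB, 20.3% reduction
```

The reduction is measured relative to the non-Liger run, which is the usual convention for "X% less memory".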

Ubuntu and others added 20 commits April 8, 2025 01:57

add liger GRPO Loss

db19e3b

use ref_per_token_logps as input and make liger compute same metrics

e77bd4c

add grpo slow test with liger

4b38b76

precommit

bb1c3d5

rename config to use_liger_loss

2146fec

minor refactor and fix bug in rebasing

cbe6efd

move to Parameters that control the training

46663e7

split compute_loss to call helper

46c8c68

remove num_items_in_batch

1371078

refactor to mention last hidden state

7d6b394

Update test_grpo_slow.py

31ab5ab

add fwd_redirection to support liger+ddp and fsdp

c28a5b1

fix things

6b1d138

fix rebase bug

78e2517

fix rebase bug

41c4169

fix rebase bug

546f75d

checkstyle

e2fa63c

bug fix in rebase

f8c3ed1

add comment

b5fd692

change model generate to use summon full params

d8d5130

kashif self-assigned this Apr 8, 2025

kashif and others added 4 commits April 8, 2025 15:22

Merge branch 'main' into shisahni/fsdp_ddp_liger

47b221d

:Merge remote-tracking branch 'origin/main' into shisahni/fsdp_ddp_liger

ad06d68

Merge branch 'main' into shisahni/fsdp_ddp_liger

dbf84cb

isort

20a1111

kashif marked this pull request as ready for review April 11, 2025 20:20

shivam15s added 2 commits April 11, 2025 21:41

support different loss types in liger

3de4d73

update liger version

7e26a18

shivam15s mentioned this pull request Apr 12, 2025

backward compatible initialization linkedin/Liger-Kernel#666

Merged


update liger version

1c716ac

Merge branch 'main' into shisahni/fsdp_ddp_liger

eedc352

shivam15s and others added 8 commits April 17, 2025 02:20

add all gather for moving fsdp weights to vllm

3139420

add comments

b681f92

Merge branch 'main' into shisahni/fsdp_ddp_liger

a6d4d6f

Merge branch 'main' into shisahni/fsdp_ddp_liger

33f00da

Merge branch 'main' into shisahni/fsdp_ddp_liger

d3457c2

add partial recurse

ec06891

Merge branch 'main' into shisahni/fsdp_ddp_liger

b3a7637

Merge branch 'main' into shisahni/fsdp_ddp_liger

0bb331d
