Allow for saving the PPOTrainer value model (critic model) #3308
This PR adds a new flag,

```python
save_value_model: bool = False
```

which determines whether the value model (critic) trained by `PPOTrainer` is saved. During training, PPO puts quite a lot of effort into training the value model to predict the reward the policy will receive. That makes it a potentially valuable artifact in its own right, and it is a shame that we had previously been discarding it; this PR makes it possible to keep it.
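To illustrate how the flag would be used, here is a minimal sketch in the style of the example `ppo.py` script. Everything except `save_value_model` is assumed context rather than part of this diff: the model ID, dataset, and exact `PPOTrainer` signature follow recent TRL versions and may differ, and the flag is shown on the trainer constructor for illustration although it could equally live on `PPOConfig`.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from trl import PPOConfig, PPOTrainer

model_id = "EleutherAI/pythia-1b-deduped"  # base model used in ppo.py

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # as in ppo.py
policy = AutoModelForCausalLM.from_pretrained(model_id)
ref_policy = AutoModelForCausalLM.from_pretrained(model_id)
# The reward and value models share one architecture: a scalar-head classifier.
reward_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
value_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)

# Prompt dataset; tokenizing prompts into input_ids as ppo.py does is elided here.
dataset = load_dataset(
    "trl-internal-testing/descriptiveness-sentiment-trl-style",
    split="descriptiveness",
)

trainer = PPOTrainer(
    args=PPOConfig(output_dir="ppo_run"),
    processing_class=tokenizer,
    model=policy,
    ref_model=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=dataset,
    save_value_model=True,  # the new flag from this PR; defaults to False
)
trainer.train()
# With save_value_model=True, the trained critic is written out alongside the
# policy instead of being discarded.
trainer.save_model("ppo_run")
```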
I requested this feature here, and this PR resolves that.
It also resolves the error described in this issue: "no attribute 'policy' when pushing PPOTrainer to hub using example ppo.py script" (#3301).
I'm unsure whether my fix, the early return on line 338 of `PPOTrainer`, is a legitimate approach, so please check! A sketch of the pattern I mean follows.
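For context while reviewing, this is roughly the shape of the guard in question. It is a hypothetical sketch rather than the diff itself: the method signature matches the `transformers.Trainer` API, but the attribute names and surrounding logic are my reading of the nearby TRL code and may not match the repository exactly.

```python
from transformers import Trainer


class PPOTrainer(Trainer):
    def save_model(self, output_dir=None, _internal_call=False):
        if not hasattr(self.model, "policy"):
            # The policy/value wrapper has already been torn down (as happens
            # when push_to_hub triggers a save after training ends), so there
            # is nothing to unwrap: save the model as-is and return early.
            super().save_model(output_dir, _internal_call)
            return

        # Normal path: temporarily swap in the bare policy so that only the
        # policy weights are written out, then restore the wrapper afterwards.
        backup_model = self.model
        self.model = self.model.policy
        super().save_model(output_dir, _internal_call)
        self.model = backup_model
```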
I would love your feedback, @qgallouedec!