
Allow for saving the PPOTrainer value model (critic model) #3308


Open · wants to merge 14 commits into main

Conversation

AMindToThink
Contributor

This PR adds a `save_value_model: bool = False` option that determines whether to save the value model (critic) trained by `PPOTrainer`.

During training, PPO invests a lot of effort in making the value model good at predicting the reward the policy will receive. That makes it a valuable tool in its own right, and it is a shame that we had previously been discarding it. This PR adds the option to keep the value model; a usage sketch follows.
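
As a rough illustration (not the PR's actual diff), here is how the flag might be used, loosely following trl's example `ppo.py` setup. The constructor placement of `save_value_model`, the model checkpoint, and the output paths are all assumptions:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from trl import PPOConfig, PPOTrainer

model_name = "EleutherAI/pythia-14m"  # small placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

policy = AutoModelForCausalLM.from_pretrained(model_name)
ref_policy = AutoModelForCausalLM.from_pretrained(model_name)
# In trl's PPO setup, the value and reward models are sequence
# classifiers with a single scalar head (num_labels=1).
value_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# Toy one-prompt dataset, just to keep the sketch self-contained.
train_dataset = Dataset.from_dict({"input_ids": [tokenizer.encode("Hello, world!")]})

trainer = PPOTrainer(
    args=PPOConfig(output_dir="ppo_run"),
    processing_class=tokenizer,
    model=policy,
    ref_model=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
    save_value_model=True,  # hypothetical placement of the flag added by this PR
)
trainer.train()
trainer.save_model("ppo_run/final")  # with the flag set, the critic is kept as well
```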

I requested this feature here, and this PR resolves that request.
It also resolves the error described in this issue: no attribute 'policy' when pushing PPOTrainer to hub using example ppo.py script #3301.
I'm unsure whether the way I fixed it, the early return on line 338 in `PPOTrainer`, is legitimate. Please check!
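
To show why the critic is worth keeping, here is a minimal sketch of reloading the saved value model and querying it for a value estimate. The save path is hypothetical and depends on where this PR writes the critic out:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "ppo_run/value_model" is an assumed location for the saved critic.
value_model = AutoModelForSequenceClassification.from_pretrained("ppo_run/value_model", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("ppo_run/value_model")

inputs = tokenizer("A candidate completion to score", return_tensors="pt")
with torch.no_grad():
    # The single logit is the critic's estimate of the expected reward
    # the policy would receive for this text.
    estimated_value = value_model(**inputs).logits.squeeze().item()
print(f"Estimated value: {estimated_value:.3f}")
```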

I would love your feedback, @qgallouedec!
