Allow for saving the PPOTrainer value model (critic model) #3308
This PR adds a new flag,

```python
save_value_model: bool = False
```

which determines whether the value model (critic) trained by `PPOTrainer` is saved. During training, PPO puts quite a lot of effort into training the value model to predict the reward the policy will receive. That makes it a potentially valuable artifact in its own right, and it is a shame that we had previously been discarding it; this PR makes it possible to keep it.
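To illustrate how the flag would be used, here is a minimal sketch in the style of the example `ppo.py` script. Everything except `save_value_model` is assumed context rather than part of this diff: the model ID, dataset, and exact `PPOTrainer` signature follow recent TRL versions and may differ, and the flag is shown on the trainer constructor for illustration although it could equally live on `PPOConfig`.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)
from trl import PPOConfig, PPOTrainer

model_id = "EleutherAI/pythia-1b-deduped"  # base model used in ppo.py

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # as in ppo.py
policy = AutoModelForCausalLM.from_pretrained(model_id)
ref_policy = AutoModelForCausalLM.from_pretrained(model_id)
# The reward and value models share one architecture: a scalar-head classifier.
reward_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
value_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)

# Prompt dataset; tokenizing prompts into input_ids as ppo.py does is elided here.
dataset = load_dataset(
    "trl-internal-testing/descriptiveness-sentiment-trl-style",
    split="descriptiveness",
)

trainer = PPOTrainer(
    args=PPOConfig(output_dir="ppo_run"),
    processing_class=tokenizer,
    model=policy,
    ref_model=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=dataset,
    save_value_model=True,  # the new flag from this PR; defaults to False
)
trainer.train()
# With save_value_model=True, the trained critic is written out alongside the
# policy instead of being discarded.
trainer.save_model("ppo_run")
```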
I requested this feature here, and this PR resolves that.
It also resolves the error described in this issue: "no attribute 'policy' when pushing PPOTrainer to hub using example ppo.py script" (#3301).
I'm unsure whether my fix, the early return on line 338 of `PPOTrainer`, is a legitimate approach, so please check! A sketch of the pattern I mean follows.
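For context while reviewing, this is roughly the shape of the guard in question. It is a hypothetical sketch rather than the diff itself: the method signature matches the `transformers.Trainer` API, but the attribute names and surrounding logic are my reading of the nearby TRL code and may not match the repository exactly.

```python
from transformers import Trainer


class PPOTrainer(Trainer):
    def save_model(self, output_dir=None, _internal_call=False):
        if not hasattr(self.model, "policy"):
            # The policy/value wrapper has already been torn down (as happens
            # when push_to_hub triggers a save after training ends), so there
            # is nothing to unwrap: save the model as-is and return early.
            super().save_model(output_dir, _internal_call)
            return

        # Normal path: temporarily swap in the bare policy so that only the
        # policy weights are written out, then restore the wrapper afterwards.
        backup_model = self.model
        self.model = self.model.policy
        super().save_model(output_dir, _internal_call)
        self.model = backup_model
```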
I would love your feedback, @qgallouedec!