
Add a raw generate API to the vLLM server #3227


Open · wants to merge 1 commit into main

Conversation

@wilrop (Contributor) commented on Apr 3, 2025

What does this PR do?

This PR adds a generate_raw endpoint to the vLLM server. While this is not strictly necessary (yet) for any trainer provided in TRL, it does allow for easier experimentation. For example, it is now straightforward to send curl requests and inspect the results. Additionally, this endpoint allows other processes to communicate with the same vLLM server used by TRL, which may be useful for some applications.
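
To give a concrete picture, a hypothetical request to such an endpoint might look like the sketch below (shown with Python's requests for readability; a curl call would work the same way). The host, port, route name, and payload fields here are assumptions for illustration, not the PR's actual schema:

```python
import requests

# Hypothetical call to the raw generation endpoint added by this PR.
# The URL and the request/response fields are assumptions; check the
# PR diff for the actual schema exposed by the server.
response = requests.post(
    "http://localhost:8000/generate_raw/",
    json={
        "prompts": ["The capital of France is"],
        "max_tokens": 32,
        "temperature": 0.7,
    },
)
response.raise_for_status()

# The point of the endpoint is that the full vLLM output (not just
# completion token IDs) comes back and can be inspected directly.
print(response.json())
```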

Speaking from personal experience, I actually need my TRL trainer and another process to utilise the same vLLM server, and this was the easiest way to get that working. I imagine others might find it useful as well.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec (Member) commented

> it is now straightforward to send curl requests and inspect the results

With the current implementation, you can send curl requests. Or maybe you want to have access to everything returned by vLLM, not just the IDs?

@qgallouedec (Member) commented

> Additionally, this endpoint allows other processes to communicate with the same vLLM server used by TRL, which may be useful for some applications.

Can you elaborate on this point? It's not very clear. Do you mean that it allows trl vllm-serve and vllm serve to be more aligned?

@wilrop (Contributor, Author) commented on Apr 5, 2025

> it is now straightforward to send curl requests and inspect the results

> With the current implementation, you can send curl requests. Or maybe you want to have access to everything returned by vLLM, not just the IDs?

Yes, exactly: having the full RequestOutput is quite convenient for debugging, or when another process wants direct access to the vLLM generations (see the sketch at the end of this comment for the kind of information it carries).

> Additionally, this endpoint allows other processes to communicate with the same vLLM server used by TRL, which may be useful for some applications.

> Can you elaborate on this point? It's not very clear. Do you mean that it allows trl vllm-serve and vllm serve to be more aligned?

Yes, this is essentially what I mean. In my own use case, I have a TRL trainer that uses the vLLM client for generation, but I also have another process that makes frequent generation calls to the model being trained by TRL. Intuitively, this other process wants fast generations from the latest version of the model. Having it communicate directly through the same server, which gets regular updates from the TRL trainer, greatly simplifies the workflow.

I realise that this PR is maybe a bit niche, but I actually have a real need for it, and I suppose others may find it useful in the future as well.
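
For context, here is a minimal local sketch of what the full output exposes when generating with vLLM directly (field names follow vLLM's RequestOutput and CompletionOutput classes; the model name is just illustrative, and serialising this over HTTP is what the new endpoint would handle):

```python
from vllm import LLM, SamplingParams

# Minimal local sketch: the full RequestOutput carries much more than
# just the completion token IDs returned by the existing endpoint.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # illustrative model choice
params = SamplingParams(max_tokens=32, temperature=0.7)

for request_output in llm.generate(["The capital of France is"], params):
    print(request_output.prompt)            # the original prompt text
    for completion in request_output.outputs:
        print(completion.text)               # decoded completion
        print(completion.token_ids)           # completion token IDs
        print(completion.finish_reason)       # e.g. "stop" or "length"
```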

@wilrop (Contributor, Author) commented on Apr 17, 2025

Any news on this PR? @qgallouedec
