Feature request
Would it be possible to have a CLI warm-up option that sends warm-up requests before the server starts, for arbitrary models? From what I understand, the warm-up request types would be embed, classify, and rerank. It would essentially just send a small dummy request so that the first inference request is not slow.

Motivation
Currently, I think only the Flash implementations of certain models do an automatic warm-up, but it would be nice to have a CLI argument that performs a warm-up call for any model served with TEI.

Currently, most models have a very slow first request:
- sentence-transformers/all-MiniLM-L6-v2: 1.6s
- BAAI/bge-reranker-base: 1.2s

I can provide more examples for other models if required.
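As a rough illustration of what such a warm-up could do, here is a minimal client-side sketch that fires one small dummy request per request type against a running TEI instance. The route names (/embed, /predict for classify, /rerank), payload shapes, base URL, and port are assumptions based on TEI's HTTP API and may differ between versions, so treat this as a sketch rather than the proposed implementation.

```python
"""Sketch of a warm-up client for a TEI server (assumed routes and payloads)."""
import json
import urllib.request

# One small dummy payload per request type.
# Payload shapes are assumptions; verify them against your TEI version.
WARMUP_PAYLOADS = {
    "embed": {"inputs": "warmup"},
    "predict": {"inputs": "warmup"},  # classify endpoint
    "rerank": {"query": "warmup", "texts": ["warmup"]},
}


def build_warmup_request(base_url: str, route: str) -> urllib.request.Request:
    """Build one dummy POST request for the given route."""
    payload = json.dumps(WARMUP_PAYLOADS[route]).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{route}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warmup(base_url: str = "http://localhost:8080") -> None:
    """Fire one dummy request per route so the first real request is fast.

    A given model only serves one of these routes, so failures on the
    other routes are expected and simply skipped.
    """
    for route in WARMUP_PAYLOADS:
        try:
            req = build_warmup_request(base_url, route)
            with urllib.request.urlopen(req, timeout=30) as resp:
                print(f"{route}: HTTP {resp.status}")
        except Exception as exc:
            print(f"{route}: skipped ({exc})")
```

Running `warmup()` after the server reports ready would pay the first-request cost once, up front, instead of on the first user request.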
Your contribution
I can help with testing and verifying the fix if required!
cc @Narsil @alvarobartt