Support for GPU and CPU in one Docker image #495

Open
saileshd1402 opened this issue Feb 14, 2025 · 2 comments

@saileshd1402

Feature request

I would like to request a single Docker image that works for both the CPU and GPU cases. This could be done by combining Dockerfile and Dockerfile-cuda-all. An entrypoint.sh could then choose between the CPU and GPU binaries based on the availability of CUDA drivers, or based on "CUDA_VISIBLE_DEVICES".

Please let me know your thoughts on this.
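For concreteness, here is a minimal sketch of the kind of entrypoint.sh selection I have in mind. The binary names (text-embeddings-router-cpu / text-embeddings-router-cuda) are placeholders, not the actual paths a combined image would ship:

```shell
#!/bin/sh
# Sketch only: the binary names below are placeholders for whatever
# the combined CPU+GPU image would actually contain.

select_backend() {
    # Pick the CUDA build only if devices were exposed to the container
    # AND the driver actually answers; otherwise fall back to the CPU build.
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] \
        && command -v nvidia-smi >/dev/null 2>&1 \
        && nvidia-smi >/dev/null 2>&1; then
        echo cuda
    else
        echo cpu
    fi
}

backend=$(select_backend)
echo "entrypoint: selected $backend backend"
# exec "/usr/local/bin/text-embeddings-router-$backend" "$@"
```

The double check (env var plus a working nvidia-smi) is to avoid picking the CUDA binary inside a container where the driver libraries were never mounted.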

Motivation

This would remove the need to pick a different image depending on the resources available to each deployment. The image size would be slightly larger, but I think that is a reasonable trade-off.

Your contribution

I would be happy to contribute this with a PR, if it's an acceptable feature.

@eero-t
eero-t commented Mar 31, 2025

Can you provide a matrix of which CPU (x86, Arm...) and GPU (Nvidia, AMD, Intel...) combinations you'd like to work on?

This would help in not always configuring the images to use based on the resources.

Could you expand a bit how it would help / why that is a problem?

@Narsil
Collaborator

Narsil commented Apr 8, 2025

Currently we're not really keen on doing that. We already have an image covering all CUDA devices, but merging every path, including CPU (and most likely several CPU backends), would mean adding both compile-time checks and runtime checks, which complicates the code quite a bit.

To take an example: let's say you want to run TEI on a CUDA-enabled device, but you make a mistake in your deployment and forget to expose the GPUs to the pod (by forgetting --gpus=all or --device=nvidia.com/gpu=all, for instance, or by failing to install the proper device plugin on the node). With an all-in-one image you would end up running the CPU version instead, because we wouldn't be able to find the GPU; everything would be rather slow, and you'd likely have no idea what the problem is.

We could have a flag of some kind to help choose which kind of accelerator you expect, but that comes down to much the same thing as choosing the correct image in the first place (like setting CUDA_VISIBLE_DEVICES).
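One possible shape of such a flag, purely for illustration (TEI_BACKEND is an invented variable name, not an existing TEI option): an explicit gpu request could fail fast instead of silently falling back to CPU, which addresses the misconfiguration scenario described above.

```shell
#!/bin/sh
# Illustration only: TEI_BACKEND is a hypothetical variable, not a real
# TEI option. gpu => fail fast if no usable device; auto => silent fallback.

require_backend() {
    case "${TEI_BACKEND:-auto}" in
        cpu)
            echo cpu ;;
        gpu)
            # The user explicitly asked for a GPU: error out if none is usable,
            # instead of quietly running the slow CPU path.
            if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
                echo gpu
            else
                echo "TEI_BACKEND=gpu requested but no usable CUDA device found" >&2
                return 1
            fi ;;
        auto)
            # Silent fallback: the behaviour the misconfiguration example warns about.
            if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
                echo gpu
            else
                echo cpu
            fi ;;
        *)
            echo "unknown TEI_BACKEND value: ${TEI_BACKEND}" >&2
            return 1 ;;
    esac
}

backend=$(require_backend) || exit 1
echo "would exec the $backend binary"
```

The trade-off is exactly as noted: the user still has to set something correctly, so it buys little over choosing the right image up front.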
