Support for GPU and CPU in one Docker image #495

Open
saileshd1402 opened this issue Feb 14, 2025 · 2 comments

@saileshd1402

Feature request

I would like to request a single Docker image that works for both the CPU and GPU cases. This could be done by combining Dockerfile and Dockerfile-cuda-all. An entrypoint.sh could then choose between the CPU and GPU binaries based on the availability of CUDA drivers, or based on "CUDA_VISIBLE_DEVICES".

Please let me know your thoughts on this.
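For concreteness, here is a minimal sketch of the kind of entrypoint.sh selection I have in mind. The binary names (text-embeddings-router-cpu / text-embeddings-router-cuda) are placeholders, not the actual paths a combined image would ship:

```shell
#!/bin/sh
# Sketch only: the binary names below are placeholders for whatever
# the combined CPU+GPU image would actually contain.

select_backend() {
    # Pick the CUDA build only if devices were exposed to the container
    # AND the driver actually answers; otherwise fall back to the CPU build.
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ] \
        && command -v nvidia-smi >/dev/null 2>&1 \
        && nvidia-smi >/dev/null 2>&1; then
        echo cuda
    else
        echo cpu
    fi
}

backend=$(select_backend)
echo "entrypoint: selected $backend backend"
# exec "/usr/local/bin/text-embeddings-router-$backend" "$@"
```

The double check (env var plus a working nvidia-smi) is to avoid picking the CUDA binary inside a container where the driver libraries were never mounted.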

Motivation

This would remove the need to pick a different image depending on the resources available to each deployment. The image size would be slightly larger, but I think that is a reasonable trade-off.

Your contribution

I would be happy to contribute this with a PR, if it's an acceptable feature.

@eero-t
eero-t commented Mar 31, 2025

Can you provide a matrix of which CPU (x86, Arm...) and GPU (Nvidia, AMD, Intel...) combinations you'd like to work on?

This would help in not always configuring the images to use based on the resources.

Could you expand a bit how it would help / why that is a problem?

@Narsil
Collaborator

Narsil commented Apr 8, 2025

Currently we're not really keen on doing that. We already have an image covering all CUDA devices, but merging every path, including CPU (and most likely several CPU backends), would mean adding both compile-time checks and runtime checks, which complicates the code quite a bit.

To take an example: let's say you want to run TEI on a CUDA-enabled device, but you make a mistake in your deployment and forget to expose the GPUs to the pod (by forgetting --gpus=all or --device=nvidia.com/gpu=all, for instance, or by failing to install the proper device plugin on the node). With an all-in-one image you would end up running the CPU version instead, because we wouldn't be able to find the GPU; everything would be rather slow, and you'd likely have no idea what the problem is.

We could have a flag of some kind to help choose which kind of accelerator you expect, but that comes down to much the same thing as choosing the correct image in the first place (like setting CUDA_VISIBLE_DEVICES).
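One possible shape of such a flag, purely for illustration (TEI_BACKEND is an invented variable name, not an existing TEI option): an explicit gpu request could fail fast instead of silently falling back to CPU, which addresses the misconfiguration scenario described above.

```shell
#!/bin/sh
# Illustration only: TEI_BACKEND is a hypothetical variable, not a real
# TEI option. gpu => fail fast if no usable device; auto => silent fallback.

require_backend() {
    case "${TEI_BACKEND:-auto}" in
        cpu)
            echo cpu ;;
        gpu)
            # The user explicitly asked for a GPU: error out if none is usable,
            # instead of quietly running the slow CPU path.
            if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
                echo gpu
            else
                echo "TEI_BACKEND=gpu requested but no usable CUDA device found" >&2
                return 1
            fi ;;
        auto)
            # Silent fallback: the behaviour the misconfiguration example warns about.
            if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
                echo gpu
            else
                echo cpu
            fi ;;
        *)
            echo "unknown TEI_BACKEND value: ${TEI_BACKEND}" >&2
            return 1 ;;
    esac
}

backend=$(require_backend) || exit 1
echo "would exec the $backend binary"
```

The trade-off is exactly as noted: the user still has to set something correctly, so it buys little over choosing the right image up front.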
