Hey!
I'm using a custom version of this repo to run BLOOM-175B with DeepSpeed and it works great, thank you for this!
I was thinking of exploring other large models (such as OPT-175B) and was wondering what the process is for creating a pre-sharded int8 DeepSpeed checkpoint for one, similar to https://huggingface.co/microsoft/bloom-deepspeed-inference-int8
Is there any documentation available or example scripts for this?
I am unsure about OPT's compatibility with DeepSpeed.
But if it works, you can simply pass the save_mp_checkpoint_path parameter to the init_inference method.
This will create a pre-sharded fp16 version (assuming it works :) ); see the sketch below.
If you aren't constrained on memory (i.e. the number of GPUs), I would encourage you to use fp16 since it is faster.
int8/int4 will be much faster once DeepSpeed starts supporting their kernels.
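A minimal sketch of what that could look like, assuming OPT is actually supported by DeepSpeed's kernel injection (not guaranteed, as noted above). The model name, save path, and tensor-parallel degree are illustrative only:

```python
# Sketch: writing a pre-sharded fp16 DeepSpeed-Inference checkpoint.
# Assumes DeepSpeed kernel injection works for this model; paths and
# mp_size are placeholders, not tested values.
import os

import torch
import deepspeed
from transformers import AutoModelForCausalLM

model_name = "facebook/opt-175b"                      # hypothetical target model
save_path = "/data/opt-175b-ds-inference-fp16"        # where the shards get written

# Load the fp16 model (for a 175B model this needs a lot of CPU RAM,
# or a meta-device/checkpoint-json loading path like the BLOOM scripts use).
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Passing save_mp_checkpoint_path makes init_inference dump one checkpoint
# shard per tensor-parallel rank plus a config describing the layout.
model = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "8")),        # tensor-parallel degree
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path=save_path,
)
```

Run it under the DeepSpeed launcher (e.g. `deepspeed --num_gpus 8 save_shards.py`) so every rank writes its own shard; the resulting directory can then be loaded directly via the checkpoint argument on subsequent runs.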