This repository was archived by the owner on Oct 9, 2024. It is now read-only.
Add configs to run int4 inference #37
Open: RezaYazdaniAminabadi wants to merge 5 commits into huggingface:main from RezaYazdaniAminabadi:int4-bloom
Commits (5):
572e644: Add configs to run int4 inference
132d99d: fix quantization-bit config & turn off ds_sample
99cd7c9: change the quantization config format to work with the new style at D…
32779e8: Update bloom-ds-inference.py
b472e48: Merge pull request #1 from awan-10/patch-1 (awan-10)
Conversations
would it make for a more user-friendly API to leave `dtype` intact, drop `quantization_bits`, and have `deepspeed.init_inference` derive the number of bits from `dtype`? not only is the currently suggested override confusing, I fail to see what purpose it serves to carry the same information in `dtype` and `quantization_bits` twice.
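A minimal sketch of the kwargs-style call being questioned (names are taken from this thread, not from a pinned DeepSpeed release; `quantization_bits` in particular is the kwarg proposed in this PR, not a confirmed part of the `deepspeed.init_inference` signature), showing the same bit width carried twice:

```python
# Hypothetical illustration only; kwarg names follow this discussion.
import torch
import deepspeed

model = deepspeed.init_inference(
    model,
    dtype=torch.int8,     # the dtype already implies a bit width...
    quantization_bits=4,  # ...which this kwarg then overrides to mean int4
)
```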
oh, wait, `torch.int4` still doesn't exist, does it? let's find the feature request.
still not implemented pytorch/pytorch#74627
so that's why you had to do the odd workarounds, right?
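For context, a generic illustration (not this PR's code) of the kind of workaround the missing dtype forces: since there is no `torch.int4`, 4-bit values are typically packed two per byte into a `uint8` tensor and unpacked before use.

```python
import torch

def pack_int4(x: torch.Tensor) -> torch.Tensor:
    """Pack values in [0, 15] two per byte, low nibble first."""
    x = x.to(torch.uint8).flatten()
    if x.numel() % 2:                      # pad to an even count
        x = torch.cat([x, x.new_zeros(1)])
    return x[0::2] | (x[1::2] << 4)

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_int4: recover the original 4-bit values."""
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    return torch.stack([lo, hi], dim=1).flatten()
```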
I guess we can drop it once it's implemented @stas00 ?
For now, this might be the best way to do it.
see #37 (comment)
it's pointless to wait, since they won't have `int3` and `int12`
@stas00 and @RezaYazdaniAminabadi - just clarifying that we have introduced a new DeepSpeedInferenceConfig that can be passed to init_inference. We are keeping it backwards compatible, but if we are okay with making changes to this file, I would advocate for writing a config dictionary for DeepSpeed and passing that to init_inference instead of the various kwargs. Please see here for an example: https://gist.github.com/awan-10/6e3d5c756be3a876522e860c6bbf702d#file-bloom-ds-inference-py-L173
Also, see the docs for the new config: https://deepspeed.readthedocs.io/en/latest/inference-init.html
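A hedged sketch of the dictionary style described above (keys here are illustrative assumptions; the linked gist and docs are the authoritative schema and may differ across DeepSpeed versions):

```python
# Illustrative keys only -- see the linked gist/docs for the real schema.
import torch
import deepspeed

ds_config = {
    "dtype": torch.int8,
    "replace_with_kernel_inject": True,
}
model = deepspeed.init_inference(model, config=ds_config)
```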
That definitely works.
@awan-10, may I suggest you make the inference `config` accept `dict_or_path` just like `zero` does? it might be easier for some users to write out a separate file.
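A sketch of the suggested `dict_or_path` behavior (a feature request at the time of this thread, tracked in the issue referenced below; `load_inference_config` is a hypothetical helper, not a DeepSpeed API):

```python
import json

def load_inference_config(dict_or_path):
    """Accept either an in-memory config dict or a path to a JSON file."""
    if isinstance(dict_or_path, dict):
        return dict_or_path
    with open(dict_or_path) as f:   # treat anything else as a file path
        return json.load(f)
```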
@stas00 - thanks for the suggestion. Created an issue so we can track it: deepspeedai/DeepSpeed#2532. Mike and I will work on it.
Thank you very much, @awan-10