feat: expose language detection probabilities to server example #3044
base: master
@@ -919,13 +919,34 @@ int main(int argc, char ** argv) {
} else if (params.response_format == vjson_format) {
/* try to match openai/whisper's Python format */
std::string results = output_str(ctx, params, pcmf32s);
Nit: Remove the trailing whitespace on this blank line (you can't see it here, but it shows up as red bars in local diffs, and it is nice to avoid the extra "noise").
// Get language probabilities
std::vector<float> lang_probs(whisper_lang_max_id() + 1, 0.0f);
const auto detected_lang_id = whisper_lang_auto_detect(ctx, 0, params.n_threads, lang_probs.data());
Nit: Remove empty spaces.
json lang_info = json::object();
// Include the probability of the detected language
lang_info["probability"] = lang_probs[detected_lang_id];
Nit: Remove empty spaces.
}
}
lang_info["language_probabilities"] = all_lang_probs;
jres["language_detection"] = lang_info;
Perhaps this could be added to jres directly so that it is easy to see all the attributes returned in one place, for example:
json jres = json{
{"task", params.translate ? "translate" : "transcribe"},
{"language", whisper_lang_str_full(whisper_full_lang_id(ctx))},
{"duration", float(pcmf32.size())/WHISPER_SAMPLE_RATE},
{"text", results},
{"segments", json::array()},
{"language_detection", lang_info},
};
Description:
This PR enhances the JSON API response by adding detailed language detection information when transcribing or translating audio. The changes include:
The new information is added under a language_detection field in the JSON response, containing:
- probability: Confidence score for the detected language
- language_probabilities: Map of language codes to their detection probabilities
This enhancement provides more transparency into the language detection process and can be valuable for applications requiring confidence scores in language identification.
The changes are non-breaking and only add additional information to the existing JSON response structure.
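To illustrate the shape of the language_detection data described above, here is a minimal standalone sketch, not the code from this PR. The struct LanguageDetection, the helper summarize_detection, and the min_prob cutoff are hypothetical names introduced for illustration. The sketch assumes the detected id can be negative when detection fails (as the whisper_lang_auto_detect documentation is understood to state), so it guards the index before reading the probability. The codes vector stands in for whatever maps a language id to its code in the real server.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical container mirroring the "language_detection" JSON object.
struct LanguageDetection {
    bool ok = false;            // false when detection failed or the id is out of range
    float probability = 0.0f;   // probability of the detected language
    std::map<std::string, float> language_probabilities; // language code -> probability
};

// Build the summary from a detected language id and per-language probabilities.
// Assumption: codes[i] is the language code for id i, and probs[i] its probability.
// Assumption: only entries with probability above min_prob are reported.
LanguageDetection summarize_detection(int detected_id,
                                      const std::vector<std::string> & codes,
                                      const std::vector<float> & probs,
                                      float min_prob = 0.0f) {
    assert(codes.size() == probs.size());

    LanguageDetection out;

    // Guard against a failed detection (negative id) or an id past the end.
    if (detected_id < 0 || static_cast<std::size_t>(detected_id) >= probs.size()) {
        return out;
    }

    out.ok = true;
    out.probability = probs[static_cast<std::size_t>(detected_id)];

    // Collect every language whose probability exceeds the cutoff.
    for (std::size_t i = 0; i < probs.size(); ++i) {
        if (probs[i] > min_prob) {
            out.language_probabilities[codes[i]] = probs[i];
        }
    }
    return out;
}
```

In the server, the result could be copied into the JSON response only when ok is true, so a failed detection would simply omit the field rather than index the probability vector with a negative id.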
Example Output:
In this example:
- language_detection field added