
feat: expose language detection probabilities to server example #3044

Open: wants to merge 1 commit into master

examples/server/server.cpp (21 additions, 0 deletions)

@@ -919,13 +919,34 @@ int main(int argc, char ** argv) {
        } else if (params.response_format == vjson_format) {
            /* try to match openai/whisper's Python format */
            std::string results = output_str(ctx, params, pcmf32s);

Collaborator:

Nit: Remove the trailing whitespace (you can't see it here, but it shows up in local diffs as red bars, and it is just nice to not have the extra "noise").

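            // Get language probabilities: detect the spoken language from the mel data at offset 0 ms.
            // lang_probs must hold whisper_lang_max_id() + 1 entries; the return value is the
            // top language id, or a negative value on failure.
            std::vector<float> lang_probs(whisper_lang_max_id() + 1, 0.0f);
            const auto detected_lang_id = whisper_lang_auto_detect(ctx, 0, params.n_threads, lang_probs.data());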

Collaborator:

Nit: Remove the trailing whitespace.

            json jres = json{
                {"task", params.translate ? "translate" : "transcribe"},
                {"language", whisper_lang_str_full(whisper_full_lang_id(ctx))},
                {"duration", float(pcmf32.size())/WHISPER_SAMPLE_RATE},
                {"text", results},
                {"segments", json::array()}
            };

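            // Always include language detection info
            json lang_info = json::object();
            // Include the probability of the detected language, but only if detection
            // succeeded: a negative id would index lang_probs out of bounds.
            if (detected_lang_id >= 0) {
                lang_info["probability"] = lang_probs[detected_lang_id];
            }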

Collaborator:

Nit: Remove the trailing whitespace.

            // Add all language probabilities
            json all_lang_probs = json::object();
            for (int i = 0; i <= whisper_lang_max_id(); ++i) {
                if (lang_probs[i] > 0.001f) { // Only include non-negligible probabilities
                    all_lang_probs[whisper_lang_str(i)] = lang_probs[i];
                }
            }
            lang_info["language_probabilities"] = all_lang_probs;
            jres["language_detection"] = lang_info;

Collaborator:

Perhaps this could be added to jres directly so that it is easy to see all the attributes returned in one place, for example:

            json jres = json{
                {"task", params.translate ? "translate" : "transcribe"},
                {"language", whisper_lang_str_full(whisper_full_lang_id(ctx))},
                {"duration", float(pcmf32.size())/WHISPER_SAMPLE_RATE},
                {"text", results},
                {"segments", json::array()},
                {"language_detection", lang_info},
            };


            const int n_segments = whisper_full_n_segments(ctx);
            for (int i = 0; i < n_segments; ++i)
            {