feat: expose language detection probabilities to server example #3044

sachaarbonel · 2025-04-14T08:21:13Z

Description:
This PR enhances the JSON API response by adding detailed language detection information when transcribing or translating audio. The changes include:

- Language detection probabilities for the detected language
- A comprehensive list of language probabilities for all languages with non-negligible confidence scores (>0.001)
- Integration with Whisper's existing language detection capabilities

The new information is added under a language_detection field in the JSON response, containing:

- probability: Confidence score for the detected language
- language_probabilities: Map of language codes to their detection probabilities

This enhancement provides more transparency into the language detection process and can be valuable for applications requiring confidence scores in language identification.

The changes are non-breaking and only add additional information to the existing JSON response structure.

Example Output:

{
  "task": "transcribe",
  "language": "english",
  "text": "This is the transcribed text of the audio file.",
  "language_detection": {
    "probability": 0.982,
    "language_probabilities": {
      "en": 0.982,
      "fr": 0.008,
      "es": 0.005,
      "de": 0.003
    }
  },
  "segments": [
    // ... segments array content ...
  ]
}

In this example:

- The main detected language (English) has a 98.2% confidence score
- Other languages with lower probabilities are also included
- Only languages with probabilities > 0.001 (0.1%) are shown
- The original JSON structure remains intact, with the new language_detection field added
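
The thresholding described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: the helper name filter_language_probabilities and the codes vector are hypothetical stand-ins for whisper's own language table, and a std::map is used in place of the JSON object so the snippet needs only the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): keep only the languages whose
// probability is strictly above `threshold`, keyed by language code.
// Assumption: codes[i] is the language code for language id i.
std::map<std::string, float> filter_language_probabilities(
        const std::vector<std::string> & codes,
        const std::vector<float>       & probs,
        float                            threshold = 0.001f) {
    // Both vectors must describe the same set of language ids.
    assert(codes.size() == probs.size());

    std::map<std::string, float> kept;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        if (probs[i] > threshold) {
            kept[codes[i]] = probs[i];
        }
    }
    return kept;
}
```

Because the comparison is strict, a language sitting exactly at 0.001 is dropped, which matches the "> 0.001" wording in the description.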

danbev · 2025-04-16T06:36:41Z

examples/server/server.cpp

@@ -919,13 +919,34 @@ int main(int argc, char ** argv) {
        } else if (params.response_format == vjson_format) {
            /* try to match openai/whisper's Python format */
            std::string results = output_str(ctx, params, pcmf32s);
+


Nit: Remove empty spaces (you can't see them here but they show up in local diffs as red bars, and it is just nice to not have the extra "noise").

danbev · 2025-04-16T06:37:00Z

examples/server/server.cpp

+            // Get language probabilities
+            std::vector<float> lang_probs(whisper_lang_max_id() + 1, 0.0f);
+            const auto detected_lang_id = whisper_lang_auto_detect(ctx, 0, params.n_threads, lang_probs.data());
+


Nit: Remove empty spaces.

danbev · 2025-04-16T06:51:05Z

examples/server/server.cpp

+            json lang_info = json::object();
+            // Include the probability of the detected language
+            lang_info["probability"] = lang_probs[detected_lang_id];
+


Nit: Remove empty spaces.
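
A possible follow-up on the hunk above: it indexes lang_probs with the value returned by whisper_lang_auto_detect. As far as I can tell from whisper.h, that function returns a negative value on failure, in which case the lookup would read out of bounds. A minimal sketch of a guard, assuming a hypothetical helper name (detected_language_probability) and using only the standard library:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): copy the probability for
// `lang_id` into `out` and return true, or return false and leave `out`
// untouched when the id is negative (detection failed) or out of range.
bool detected_language_probability(
        const std::vector<float> & lang_probs,
        int                        lang_id,
        float                    & out) {
    if (lang_id < 0 || static_cast<std::size_t>(lang_id) >= lang_probs.size()) {
        return false;
    }
    out = lang_probs[static_cast<std::size_t>(lang_id)];
    return true;
}
```

With a check like this, the server could skip the language_detection field when detection fails rather than emit an undefined value; whether that is the desired behaviour is a design choice for the PR author.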

danbev · 2025-04-16T06:54:21Z

examples/server/server.cpp

+                }
+            }
+            lang_info["language_probabilities"] = all_lang_probs;
+            jres["language_detection"] = lang_info;


Perhaps this could be added to jres directly, so that all the returned attributes are easy to see in one place, for example:

json jres = json{
    {"task", params.translate ? "translate" : "transcribe"},
    {"language", whisper_lang_str_full(whisper_full_lang_id(ctx))},
    {"duration", float(pcmf32.size())/WHISPER_SAMPLE_RATE},
    {"text", results},
    {"segments", json::array()},
    {"language_detection", lang_info},
};

feat: expose language detection probabilities to server.cpp

6f5c781

sachaarbonel changed the title ~~feat: expose language detection probabilities to server.cpp~~ feat: expose language detection probabilities to server example Apr 14, 2025

danbev approved these changes Apr 16, 2025
