
Fix gemini 2.5 flash on Vertex AI #10189


Open

awesie wants to merge 4 commits into main

Conversation

@awesie commented Apr 21, 2025

Title

Fix cost calculations and thinking budget for Gemini 2.5 Flash on Vertex AI

Relevant issues

Fixes #10141
Fixes #10121

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests with [make test-unit](https://docs.litellm.ai/docs/extras/contributing_code)
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes


    )
    if reasoning_tokens:
        # Usage(...) constructor expects that completion_tokens includes the reasoning_tokens.
        # However, the Vertex AI usage metadata does not include reasoning tokens in candidatesTokenCount.
Contributor:

is there any documentation / reference for this?

Author:

Not that I'm immediately aware of. I didn't even know this was a problem until it was mentioned here: #10141 (comment). Once I looked at my logs and did manual testing, I confirmed the behavior for Vertex AI.

I have not tested the Gemini API myself.

Author:

For example, with Gemini 2.5 Flash on Vertex AI:

  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 2622,
    "totalTokenCount": 4434,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 10
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 2622
      }
    ],
    "thoughtsTokenCount": 1802
  },

As the total token count shows, candidatesTokenCount does not include thoughtsTokenCount: totalTokenCount = promptTokenCount + candidatesTokenCount + thoughtsTokenCount (4434 = 10 + 2622 + 1802).
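
To make the accounting concrete, here is a minimal sketch of the adjustment (adjust_vertex_usage and the return shape are illustrative, not LiteLLM's actual code), assuming the usageMetadata fields shown above:

    # Sketch: make completion_tokens include reasoning tokens, as the
    # Usage(...) constructor expects. Field names follow the usageMetadata
    # example above; adjust_vertex_usage itself is hypothetical.
    def adjust_vertex_usage(usage_metadata: dict) -> dict:
        prompt_tokens = usage_metadata.get("promptTokenCount", 0)
        candidate_tokens = usage_metadata.get("candidatesTokenCount", 0)
        reasoning_tokens = usage_metadata.get("thoughtsTokenCount", 0)
        # Vertex AI reports thoughts separately, so completion tokens must be
        # candidates + thoughts for prompt + completion to equal the total.
        return {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": candidate_tokens + reasoning_tokens,
            "reasoning_tokens": reasoning_tokens,
        }

    # With the metadata above: 10 + (2622 + 1802) == 4434 == totalTokenCount.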

if thinking_enabled:
    params["includeThoughts"] = True
if thinking_budget:
    if not thinking_enabled:
Contributor:

Author:

I wasn't really sure what you were looking for. Take a look at the test I added and let me know if you wanted something else.
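
As a rough illustration of what such a test might look like (a sketch against the mapping in the diff context above; map_thinking_params is a hypothetical stand-in, not the PR's actual code):

    # Hypothetical test sketch for the thinking-parameter mapping shown in
    # the diff context above; names are placeholders.
    from typing import Optional

    def map_thinking_params(thinking_enabled: bool, thinking_budget: Optional[int]) -> dict:
        params: dict = {}
        if thinking_enabled:
            params["includeThoughts"] = True
        if thinking_budget is not None:
            params["thinkingBudget"] = thinking_budget
        return params

    def test_thinking_budget_zero_does_not_include_thoughts():
        params = map_thinking_params(thinking_enabled=False, thinking_budget=0)
        assert "includeThoughts" not in params
        assert params["thinkingBudget"] == 0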

@@ -910,6 +915,16 @@ def transform_response(
            completion_response=completion_response,
        )

        thinking_enabled = None
        if "gemini-2.5-flash" in model:
            # Only Gemini 2.5 Flash can have its thinking disabled by setting the thinking budget to zero
Contributor:

What happens on gemini-2.5-pro if you send it thinking budget = 0?

  • What is the response from gemini-2.5-flash vs. gemini-2.5-pro?

Author:

When I compared the behavior of gemini-2.5-flash and gemini-2.5-pro, setting the thinking budget to 0 only had an effect on gemini-2.5-flash.
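
For context, the request under discussion looks roughly like this (a sketch of the Vertex AI generationConfig.thinkingConfig block; the prompt text is a placeholder):

    # Sketch of a Vertex AI request body that disables thinking. Per the
    # discussion above, only gemini-2.5-flash honors a zero budget.
    request_body = {
        "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
        "generationConfig": {
            "thinkingConfig": {
                "includeThoughts": False,
                "thinkingBudget": 0,
            }
        },
    }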

@vinaynair:

Maybe it's related. Can you kindly confirm?

Using Gemini as per the docs at https://docs.litellm.ai/docs/tutorials/openai_codex:

codex -m gemini-2.0-flash --full-auto

with a prompt that attempts to use a screenshot / image:

Generate a web app with the backing fastapi based backend that mimics the a.png in this folder

I get the error below:

litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Invalid user message={'role': 'user', 'content': [{'type': 'text', 'text': 'Generate a web app with the backing fastapi based backend that mimics the  in this folder'}, {'type': 'image', 'text': None}]} at index 1. Please ensure all user messages are valid OpenAI chat completion messages.
Traceback (most recent call last):
  File "/Users/vichandrasekharan/code/temp/codex/.venv/lib/python3.10/site-packages/litellm/utils.py", line 6315, in validate_chat_completion_user_messages
    raise Exception("invalid content type")
Exception: invalid content type

Contributor:

@vinaynair this is an unrelated error message.

It is because of this:

{'type': 'image', 'text': None}

which does look like invalid input.
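
For comparison, a well-formed OpenAI-style user message carries images as image_url parts (the URL and text here are placeholders):

    # A valid OpenAI chat completion user message with an image part, unlike
    # the {'type': 'image', 'text': None} part produced above.
    valid_message = {
        "role": "user",
        "content": [
            {"type": "text", "text": "Generate a web app that mimics this screenshot"},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_PNG>"}},
        ],
    }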

Contributor:

Please file a separate ticket for a feature request where we filter this scenario.


@Classic298 (Contributor):

@krrishdholakia I think if thinking is enabled, all output tokens are charged at $3.50.
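
Taking the usageMetadata example earlier in this thread, and assuming the quoted rate means $3.50 per 1M output tokens (the commenter's claim, not a confirmed rate card), the cost works out as:

    # Worked example of the claimed pricing, using the usage numbers from the
    # earlier Vertex AI response; the rate is the commenter's claim.
    output_tokens = 2622 + 1802              # candidatesTokenCount + thoughtsTokenCount
    cost = output_tokens * 3.50 / 1_000_000  # ~$0.0155 for output tokens alone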


Successfully merging this pull request may close these issues.

[Feature]: Gemini 2.5 Flash - Vertex AI to be added to LiteLLM