Fix gemini 2.5 flash on Vertex AI #10189
Conversation
        )
        if reasoning_tokens:
            # Usage(...) constructor expects that completion_tokens includes the reasoning_tokens.
            # However the Vertex AI usage metadata does not include reasoning tokens in candidatesTokenCount.
is there any documentation / reference for this?
Not that I'm immediately aware of. I didn't even know this was a problem until it was mentioned here: #10141 (comment). Once I looked at my logs and did manual testing, I confirmed the behavior for Vertex AI.
I have not tested the Gemini API myself.
For example, with Gemini 2.5 Flash on Vertex AI:
"usageMetadata": {
"promptTokenCount": 10,
"candidatesTokenCount": 2622,
"totalTokenCount": 4434,
"trafficType": "ON_DEMAND",
"promptTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 10
}
],
"candidatesTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 2622
}
],
"thoughtsTokenCount": 1802
},
As the total token count shows, candidatesTokenCount does not include thoughtsTokenCount: total = candidates + thoughts + prompt (4434 = 2622 + 1802 + 10).
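To make the accounting concrete, here is a minimal sketch (not the PR's actual code) of how the metadata above would need to be adjusted before building the Usage object, using only the field names shown in the response:

# Sketch only: reconcile Vertex AI usage metadata with a Usage object that
# expects completion_tokens to already include reasoning tokens.
usage_metadata = {
    "promptTokenCount": 10,
    "candidatesTokenCount": 2622,
    "totalTokenCount": 4434,
    "thoughtsTokenCount": 1802,
}

prompt_tokens = usage_metadata.get("promptTokenCount", 0)
reasoning_tokens = usage_metadata.get("thoughtsTokenCount", 0)
# candidatesTokenCount covers only the visible output, so the reasoning tokens
# have to be added back to get the true completion token count.
completion_tokens = usage_metadata.get("candidatesTokenCount", 0) + reasoning_tokens

assert prompt_tokens + completion_tokens == usage_metadata["totalTokenCount"]  # 10 + 4424 == 4434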
        if thinking_enabled:
            params["includeThoughts"] = True
        if thinking_budget:
            if not thinking_enabled:
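For context, these params feed the thinking configuration of the outgoing request. A rough sketch of the resulting generation config, assuming the public thinkingConfig field names rather than anything copied from this diff:

# Illustration only: how includeThoughts / a thinking budget could appear in the
# generationConfig of a Vertex AI Gemini request. Field names are assumptions
# based on the public thinkingConfig schema, not taken from this PR.
generation_config = {
    "thinkingConfig": {
        "includeThoughts": True,   # surface thought summaries in the response
        "thinkingBudget": 1024,    # cap on reasoning tokens; 0 disables thinking on 2.5 Flash
    }
}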
please add a unit test for this behaviour in here - https://github.com/BerriAI/litellm/blob/main/tests/litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py
I wasn't really sure what you were looking for. Take a look at the test I added and let me know if you wanted something else.
@@ -910,6 +915,16 @@ def transform_response(
            completion_response=completion_response,
        )

        thinking_enabled = None
        if "gemini-2.5-flash" in model:
            # Only Gemini 2.5 Flash can have its thinking disabled by setting the thinking budget to zero
What happens on gemini-2.5-pro? If you send it thinking budget = 0, what is the response from gemini-2.5-flash vs. gemini-2.5-pro?
When I compared the behavior of gemini-2.5-flash and gemini-2.5-pro, setting the thinking budget to 0 only had an effect on gemini-2.5-flash.
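For anyone who wants to reproduce that comparison outside litellm, a rough sketch with the google-genai SDK (project, location, and model ids are placeholders, and the exact preview model names may differ):

# Sketch: compare usage metadata for flash vs. pro with the thinking budget set to 0.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")
for model in ("gemini-2.5-flash", "gemini-2.5-pro"):
    resp = client.models.generate_content(
        model=model,
        contents="Explain why the sky is blue.",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )
    # On flash, thoughts_token_count should be absent/zero; pro reportedly ignores the 0 budget.
    print(model, resp.usage_metadata)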
Maybe it's related. Can you kindly confirm?
Using gemini as per docs at https://docs.litellm.ai/docs/tutorials/openai_codex
codex -m gemini-2.0-flash --full-auto
with a prompt that attempts to use a screenshot / image
Generate a web app with the backing fastapi based backend that mimics the a.png in this folder
I get the below error
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: Invalid user message={'role': 'user', 'content': [{'type': 'text', 'text': 'Generate a web app with the backing fastapi based backend that mimics the in this folder'}, {'type': 'image', 'text': None}]} at index 1. Please ensure all user messages are valid OpenAI chat completion messages.
Traceback (most recent call last):
File "/Users/vichandrasekharan/code/temp/codex/.venv/lib/python3.10/site-packages/litellm/utils.py", line 6315, in validate_chat_completion_user_messages
raise Exception("invalid content type")
Exception: invalid content type
@vinaynair this is an unrelated error message.
It is because of this:
{'type': 'image', 'text': None}
which does look like invalid input.
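For reference, a user message that carries an image and passes OpenAI-style validation uses the image_url content type; a sketch with a placeholder data URL:

# Sketch: an OpenAI-style user message with both text and an image.
# A bare {"type": "image"} part is rejected; images go in an "image_url" part.
valid_user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Generate a web app that mimics the attached screenshot"},
        {
            "type": "image_url",
            "image_url": {"url": "data:image/png;base64,<base64 of a.png>"},
        },
    ],
}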
please file a separate ticket for a feature request, where we filter this scenario
@krrishdholakia I think if thinking is enabled, all output tokens are charged at $3.50.
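If that is the case, a quick back-of-the-envelope check with the usage example above (taking $3.50 per 1M output tokens as stated in this comment, not from an official price list):

# Rough cost check using the earlier usage metadata; the rate is the figure
# quoted in this comment, not an authoritative price.
output_tokens = 2622 + 1802              # candidatesTokenCount + thoughtsTokenCount
output_cost = output_tokens * 3.50 / 1_000_000
print(f"${output_cost:.6f}")             # ~$0.015484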
Title
Fix cost calculations and thinking budget for Gemini 2.5 Flash on Vertex AI
Relevant issues
Fixes #10141
Fixes #10121
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/litellm/ directory - adding at least 1 test is a hard requirement (see details: https://docs.litellm.ai/docs/extras/contributing_code)
- My PR passes all unit tests (make test-unit)
Type
🐛 Bug Fix
Changes