[FEAT] summarized chat completion context #6217
Conversation
Just to add a bit of context:
Perhaps we can start with a concrete implementation of summarization? As a first version, an actual implementation would provide more value than a templated version with scaffolding like `summarization_func` and various conditions.
For example, a simple implementation that uses a model client to convert a list of `LLMMessage` into a single message is already very useful. It can be triggered every 10 messages, for example.
A highly opinionated but concrete implementation gets feedback and usage, and users can pick it up quickly and provide feedback.
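To make the suggestion concrete, here is a minimal sketch of such a summarizer, assuming a `ChatCompletionClient` from `autogen_core.models`; the helper name `summarize_messages` is hypothetical and not part of this PR or the library:

```python
from typing import List

from autogen_core.models import (
    AssistantMessage,
    ChatCompletionClient,
    LLMMessage,
    SystemMessage,
    UserMessage,
)


async def summarize_messages(
    client: ChatCompletionClient, messages: List[LLMMessage]
) -> LLMMessage:
    """Collapse a list of LLM messages into a single summary message (sketch)."""
    # Flatten the conversation into a plain-text transcript for the prompt.
    transcript = "\n".join(f"{type(m).__name__}: {m.content}" for m in messages)
    result = await client.create(
        [
            SystemMessage(content="Summarize the conversation so far."),
            UserMessage(content=transcript, source="user"),
        ]
    )
    # A single assistant message stands in for the summarized span; a caller
    # could invoke this every 10 messages, as suggested above.
    return AssistantMessage(content=str(result.content), source="summarizer")
```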
@ekzhu Just to confirm: is the current structure and file placement generally okay? I actually have more progress locally — I'm currently working on porting the termination logic over (even if it's slightly imperfect for now). My network is a bit unstable at the moment, so I'll push everything once I get a better connection. Thanks!
@ekzhu Example code (works on my Mac):

```python
import asyncio

from autogen_core.models import UserMessage, AssistantMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.models.anthropic import AnthropicChatCompletionClient
from autogen_agentchat.agents import AssistantAgent
from autogen_core.model_context import SummarizedChatCompletionContext
from autogen_core.model_context.conditions import MaxMessageCompletion
from autogen_ext.summary import buffered_summary

client = OpenAIChatCompletionClient(model="claude-3-haiku-20240307")
print(client.model_info)

# Keep a 2-message buffer and summarize once the context exceeds 2 messages.
context = SummarizedChatCompletionContext(
    summarizing_func=buffered_summary(buffer_count=2),
    summarizing_condition=MaxMessageCompletion(max_messages=2),
)

agent = AssistantAgent(
    "helper",
    model_client=client,
    system_message="You are a helpful agent",
    model_context=context,
)


async def run():
    from pprint import pprint

    res = await agent.run(task="What is the capital of France?")
    pprint(res)
    pprint(await context.get_messages())

    res = await agent.run(task="What is the capital of Korea?")
    pprint(res)
    pprint(await context.get_messages())


asyncio.run(run())
```
Serialization is working now! Example code:

```python
def test13():
    import asyncio
    from pprint import pprint

    from autogen_core.models import UserMessage, AssistantMessage
    from autogen_ext.models.openai import OpenAIChatCompletionClient
    from autogen_ext.models.anthropic import AnthropicChatCompletionClient
    from autogen_agentchat.agents import AssistantAgent
    from autogen_core.model_context import SummarizedChatCompletionContext
    from autogen_core.model_context.conditions import MaxMessageCompletion
    from autogen_ext.summary import (
        buffered_summary,
        buffered_summarized_chat_completion_context,
    )

    client = OpenAIChatCompletionClient(model="claude-3-haiku-20240307")
    print(client.model_info)

    # 1. Context composed explicitly from a summarizing function and a condition.
    context = SummarizedChatCompletionContext(
        summarizing_func=buffered_summary(buffer_count=2),
        summarizing_condition=MaxMessageCompletion(max_messages=2),
    )
    agent = AssistantAgent(
        "helper",
        model_client=client,
        system_message="You are a helpful agent",
        model_context=context,
    )

    async def run(ctx):
        # `agent` is looked up at call time, so this helper works for each agent below.
        res = await agent.run(task="What is the capital of France?")
        pprint(res)
        pprint(await ctx.get_messages())
        res = await agent.run(task="What is the capital of Korea?")
        pprint(res)
        pprint(await ctx.get_messages())

    asyncio.run(run(context))
    print("=====================")
    print(agent.dump_component())  # serialize the agent, including its context
    print("=====================")

    # 2. The same context built via the convenience factory.
    context2 = buffered_summarized_chat_completion_context(
        buffer_count=2,
        max_messages=2,
    )
    agent = AssistantAgent(
        "helper",
        model_client=client,
        system_message="You are a helpful agent",
        model_context=context2,
    )
    asyncio.run(run(context2))

    test = agent.dump_component()
    print("=====================")
    print(test)
    print("=====================")

    # 3. Round-trip: load the serialized agent and run it again.
    # Note: the loaded agent carries its own deserialized context;
    # `context2` here still shows the pre-serialization history.
    agent = AssistantAgent.load_component(test)
    asyncio.run(run(context2))


if __name__ == "__main__":
    test13()
```
Comments and docstrings are now fixed!
@ekzhu All core features are implemented:
Looking forward to feedback!
Next step idea (will be a separate PR): I'm planning to implement an LLM-based summarizer built on AutoGen agents. The goal is to:
This will be added as an optional summarizer type, while still fully supporting the current function-based summarization approach — no breaking changes, just an additional pluggable option. The interface will remain compatible. Looking forward to exploring this in a follow-up PR!
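As a rough illustration of the direction (not code from this PR): an `AssistantAgent` could be adapted into a summarizing function. The wrapper name `agent_summary`, and the assumption that `summarizing_func` accepts an async callable taking `messages` and `non_summarized_messages`, are mine:

```python
from typing import List

from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import AssistantMessage, LLMMessage


def agent_summary(agent: AssistantAgent):
    """Adapt an AssistantAgent into a summarizing function (hypothetical)."""

    async def _summarize(
        messages: List[LLMMessage], non_summarized_messages: List[LLMMessage]
    ) -> LLMMessage:
        # Flatten the span to be summarized into a plain-text transcript.
        transcript = "\n".join(str(m.content) for m in messages)
        result = await agent.run(task=f"Summarize this conversation:\n{transcript}")
        # Use the summarizer agent's final reply as the replacement message.
        return AssistantMessage(
            content=str(result.messages[-1].content), source="summarizer"
        )

    return _summarize
```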
I can see a lot of progress has been made, going well beyond the initial scope of the issue.
I still hold my last point:
> Perhaps we can start with a concrete implementation of summarization? As a first version, an actual implementation would provide more value than a templated version with scaffolding like `summarization_func` and various conditions.
> For example, a simple implementation that uses a model client to convert a list of `LLMMessage` into a single message is already very useful. It can be triggered every 10 messages, for example.
> A highly opinionated but concrete implementation gets feedback and usage, and users can pick it up quickly and provide feedback.
I think instead of building all the scaffolding using conditions, we can start with a much simpler user experience. For example:

```python
from autogen_core.model_context import SummarizerChatCompletionContext

summarizer_context = SummarizerChatCompletionContext(
    model_client=model_client,
    summary_prompt="Summarize the conversation so far for your own memory",
    summary_format="This portion of conversation has been summarized as follow: {summary}",
    summary_interval=10,  # trigger for every 10 messages
    summary_start=2,  # the produced summary replaces the portion of message history starting from the 3rd message
    summary_end=-2,  # the produced summary replaces the portion of message history ending at the 2nd-to-last message
)

agent = AssistantAgent("assistant", ..., model_context=summarizer_context)
```
In this example, the `model_client` is used to perform the summary. This is most likely how ChatGPT performs model-context summarization, and most users just want something like this that works out of the box.
@ekzhu That said, I still believe the current structure, with its pluggable summarizing function and condition, has value.
Also, as I understand it, summarization is a highly experimental and research-driven area. I believe this kind of extensible and general-purpose structure can enable AutoGen users - especially research-oriented users - to explore novel summarization strategies much more easily.
That said, I'll work on bridging the two - providing a simple preset (like the one you proposed) on top of the current structure.
Appreciate your guidance as always!
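For instance, a thin preset over the pieces already in this PR might look like the sketch below; the factory name `simple_summarizer_context` is hypothetical, and `buffered_summary` would be swapped for a model-client-based `summarizing_func` to match the proposed behavior:

```python
from autogen_core.model_context import SummarizedChatCompletionContext
from autogen_core.model_context.conditions import MaxMessageCompletion
from autogen_ext.summary import buffered_summary


def simple_summarizer_context(summary_interval: int = 10) -> SummarizedChatCompletionContext:
    """Preset: trigger summarization once the history reaches `summary_interval` messages."""
    return SummarizedChatCompletionContext(
        # A model-client-based summarizing_func could be dropped in here instead.
        summarizing_func=buffered_summary(buffer_count=summary_interval),
        summarizing_condition=MaxMessageCompletion(max_messages=summary_interval),
    )
```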
If we add a new user-facing concept for every new feature, the framework will quickly turn into chaos. There is indeed value in more structured code in this component, but not now. We can always add that code later. We don't need to create too much structure. The original LangChain code base has been criticized by many people for introducing too many abstractions; I hope we don't follow the same path. My suggestion is to follow Keep It Simple and Stupid (KISS): just the minimal code required for the minimal viable use case. This way, it is much easier to write unit tests that create high coverage.
More on my previous comment. I agree with you that the conditions are similar to the termination conditions already in the framework. In the future there will be a place for those, perhaps in different forms. But let's get users to use the basic feature first and gather feedback. For this PR, let's not add the new concepts. Please only create the implementation for a new `SummarizerChatCompletionContext` class and unit tests for it.
@ekzhu I can of course implement the simpler version you suggested, but in my case that approach doesn't work well — especially when dealing with long contexts and nuanced summarization (e.g., ChatGPT or Claude sometimes "forget" key parts of long chats). Also, if possible, please don't close or significantly alter this PR for now — I'd like to reference it in a follow-up PR to the third-party extensions page. This PR was structured with a bit more abstraction because I was exploring ideas like:
These are admittedly more experimental, but they reflect the direction I was aiming for. Let me know what you think!
I think this will be a good outcome. Let's just have a separate PR to add the extension to the list.
Thx.
Converted to draft for now |
🧠 Summary
This PR introduces the structural design for a new context type: `SummarizedChatCompletionContext`. It allows message summarization to be triggered within agent-local contexts, based on user-defined conditions — decoupling summarization from termination.
This draft focuses on infrastructure only: the new context class, the message summarization condition interfaces, and logical composition (`AND`/`OR`) are all in place — but the system is not yet wired to active agents or tested in workflows.
✨ Motivation
In complex multi-agent systems (e.g., `SocietyOfMindAgent` or deeply nested teams), agent-internal messages often grow long and redundant. These messages:
To address this, summarization must happen before termination — inside the agent context itself.
This PR introduces a new kind of `ChatCompletionContext` that enables exactly that.
See: Discussion #6160
🧱 Key Components
1. `SummarizedChatCompletionContext` (new)
• Extends `ChatCompletionContext`
• `summarizing_func`: a function that takes `messages` and `non_summarized_messages` and returns a summary
• `summarizing_condition`: a subclass of `MessageCompletionCondition`
• Calls `summary()` when the condition is met
2. `MessageCompletionCondition` (new)
• Abstract base class to define summarization triggers
• Tracks `.triggered` state
• `call(messages)` pattern
Also includes:
• `reset()` method for reuse
• and/or logic via:
• `AndMessageCompletionCondition`
• `OrMessageCompletionCondition`
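As an illustration of the extension point this enables, a custom trigger might look like the sketch below; the async `__call__`/`reset` signatures are an assumption modeled on AgentChat's termination conditions and may differ from the actual base class in this PR:

```python
from typing import List

from autogen_core.models import LLMMessage
from autogen_core.model_context.conditions import MessageCompletionCondition


class KeywordCompletion(MessageCompletionCondition):
    """Trigger summarization once a keyword appears in the latest message (sketch)."""

    def __init__(self, keyword: str) -> None:
        self._keyword = keyword
        self._triggered = False

    @property
    def triggered(self) -> bool:
        # The trigger state tracked by the condition.
        return self._triggered

    async def __call__(self, messages: List[LLMMessage]) -> bool:
        # Fire once the newest message mentions the keyword.
        if messages and self._keyword in str(messages[-1].content):
            self._triggered = True
        return self._triggered

    async def reset(self) -> None:
        # Allow the condition to be reused after a summarization pass.
        self._triggered = False


# Hypothetical OR composition via the wrapper described above; its exact
# constructor signature is an assumption, hence left commented out.
# combined = OrMessageCompletionCondition(
#     KeywordCompletion("wrap up"), MaxMessageCompletion(max_messages=20)
# )
```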
Related issue number
#6160
Checks
ToDo