
[FEAT] summarized chat completion context #6217


Draft: wants to merge 23 commits into main.

Conversation

@SongChiYoung (Contributor) commented Apr 5, 2025

🧠 Summary

This PR introduces the structural design for a new context type: SummarizedChatCompletionContext.
It allows message summarization to be triggered within agent-local contexts, based on user-defined conditions — decoupling summarization from termination.

This draft focuses on infrastructure only: the new context class, message summarization condition interfaces, and logical composition (AND / OR) are all in place — but the system is not yet wired to active agents or tested in workflows.


✨ Motivation

In complex multi-agent systems (e.g., SocietyOfMindAgent or deeply nested teams), agent-internal messages often grow long and redundant.
These messages:

  • Leak into outer context evaluation
  • Pollute termination conditions
  • Waste tokens

To address this, summarization must happen before termination — inside the agent context itself.
This PR introduces a new kind of ChatCompletionContext that enables exactly that.

See: Discussion #6160


🧱 Key Components

1. SummarizedChatCompletionContext (New)

  • Subclass of ChatCompletionContext
  • Accepts:
    • summarizing_func: a function that takes messages and non_summarized_messages and returns a summary
    • summarizing_condition: a subclass of MessageCompletionCondition
  • Automatically triggers summary() when the condition is met:

    await self._summarizing_condition(self._messages)
    if self._summarizing_condition.triggered:
        await self.summary()

  • Can support async summarization later
2. MessageCompletionCondition Interface (New)

  • Abstract base class for defining summarization triggers
  • Tracks a .triggered state
  • Follows a call(messages) pattern
  • Includes a reset() method for reuse
  • Supports and / or composition via:
    • AndMessageCompletionCondition
    • OrMessageCompletionCondition
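A self-contained sketch of this interface may help; it is synchronous for brevity (the PR's conditions are awaited), uses plain strings in place of message objects, and its class and method details are assumptions that may differ from the actual implementation:

```python
from abc import ABC, abstractmethod
from typing import List


class MessageCompletionCondition(ABC):
    """Summarization trigger with a .triggered state and reset() for reuse."""

    def __init__(self) -> None:
        self._triggered = False

    @property
    def triggered(self) -> bool:
        return self._triggered

    @abstractmethod
    def __call__(self, messages: List[str]) -> None:
        """Inspect the message history and update the triggered flag."""

    def reset(self) -> None:
        self._triggered = False

    def __and__(self, other: "MessageCompletionCondition") -> "MessageCompletionCondition":
        return AndMessageCompletionCondition(self, other)

    def __or__(self, other: "MessageCompletionCondition") -> "MessageCompletionCondition":
        return OrMessageCompletionCondition(self, other)


class MaxMessageCompletion(MessageCompletionCondition):
    """Fires once the history reaches max_messages entries."""

    def __init__(self, max_messages: int) -> None:
        super().__init__()
        self._max_messages = max_messages

    def __call__(self, messages: List[str]) -> None:
        if len(messages) >= self._max_messages:
            self._triggered = True


class _Composite(MessageCompletionCondition):
    """Shared plumbing for AND / OR composition."""

    def __init__(self, *conditions: MessageCompletionCondition) -> None:
        super().__init__()
        self._conditions = conditions

    def reset(self) -> None:
        super().reset()
        for c in self._conditions:
            c.reset()


class AndMessageCompletionCondition(_Composite):
    def __call__(self, messages: List[str]) -> None:
        for c in self._conditions:
            c(messages)
        self._triggered = all(c.triggered for c in self._conditions)


class OrMessageCompletionCondition(_Composite):
    def __call__(self, messages: List[str]) -> None:
        for c in self._conditions:
            c(messages)
        self._triggered = any(c.triggered for c in self._conditions)


cond = MaxMessageCompletion(max_messages=3) | MaxMessageCompletion(max_messages=5)
cond(["m1", "m2", "m3"])
print(cond.triggered)  # True: the 3-message threshold was reached
cond.reset()
```

The composition operators return new condition objects, mirroring how termination conditions compose elsewhere in AutoGen.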

Related issue number

#6160

Checks

ToDo

  • Tools-style serialization of user-defined summary functions (was blocked; serialization is now working!)
  • pyright / mypy checks confirmed
  • Documentation (if we need more than docstrings)
  • Fix docstrings
  • Fix function/argument names (if needed)
  • Test code / coverage

@SongChiYoung (Contributor Author)

Just to add a bit of context:

  • This implementation heavily references and reuses the structure of existing termination logic.
  • While I believe this new context could potentially replace or unify several existing ChatCompletionContext variants, any such refactoring is out of scope for this PR and will not be addressed here.

@ekzhu (Collaborator) left a comment

Perhaps we can start with a concrete implementation of summarization? As a first version, an actual implementation would provide more value than a templated version with scaffolding like summarization_func and various conditions.

For example, a simple implementation that uses a model client to convert a list of LLMMessage into a single message is already very useful. It can be triggered for every 10 messages, for example.

A highly opinionated but concrete implementation gets feedback and usage, and users can pick it up quickly and provide feedback.
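The "summarize every N messages" idea can be sketched without any model client at all; maybe_summarize and the lambda summarizer below are illustrative placeholders, not part of any AutoGen API:

```python
from typing import Callable, List


def maybe_summarize(
    messages: List[str],
    interval: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Collapse the history into a single message once it reaches `interval` entries."""
    if len(messages) < interval:
        return messages
    return [summarize(messages)]


history = [f"msg{i}" for i in range(10)]
print(maybe_summarize(history, 10, lambda ms: f"summary of {len(ms)} messages"))
# → ['summary of 10 messages']
```

In the concrete version being proposed, the summarize callable would be a model-client call that condenses a list of LLMMessage into one message.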

@SongChiYoung (Contributor Author)

@ekzhu
Got it — that makes sense!

Just to confirm: is the current structure and file placement generally okay?

I actually have more progress locally — I'm currently working on porting the termination logic over (even if it's slightly imperfect for now).

My network is a bit unstable at the moment, so I’ll push everything once I get a better connection. Thanks!

@SongChiYoung (Contributor Author) commented Apr 7, 2025

@ekzhu
It works now!

Example code (works on my Mac):

    import asyncio
    from pprint import pprint

    from autogen_agentchat.agents import AssistantAgent
    from autogen_core.model_context import SummarizedChatCompletionContext
    from autogen_core.model_context.conditions import MaxMessageCompletion
    from autogen_ext.models.openai import OpenAIChatCompletionClient
    from autogen_ext.summary import buffered_summary

    client = OpenAIChatCompletionClient(model="claude-3-haiku-20240307")
    print(client.model_info)

    # Trigger summarization once the context reaches 2 messages,
    # keeping a 2-message buffer.
    context = SummarizedChatCompletionContext(
        summarizing_func=buffered_summary(buffer_count=2),
        summarizing_condition=MaxMessageCompletion(max_messages=2),
    )
    agent = AssistantAgent(
        "helper",
        model_client=client,
        system_message="You are a helpful agent",
        model_context=context,
    )

    async def run():
        # Ask two questions and inspect the (possibly summarized) context after each.
        res = await agent.run(task="What is the capital of France?")
        pprint(res)
        pprint(await context.get_messages())
        res = await agent.run(task="What is the capital of Korea?")
        pprint(res)
        pprint(await context.get_messages())

    asyncio.run(run())
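For intuition, here is a plain-Python sketch of what a buffer-based summarizing function such as buffered_summary might do; this is an assumption about its behavior (the real one operates on autogen LLMMessage objects), with strings standing in for messages and a placeholder string standing in for the produced summary:

```python
from typing import Callable, List


def buffered_summary(buffer_count: int) -> Callable[[List[str], List[str]], List[str]]:
    """Return a summarizing function that collapses everything except the
    last `buffer_count` messages into a single placeholder summary."""

    def summarize(messages: List[str], non_summarized_messages: List[str]) -> List[str]:
        if len(messages) <= buffer_count:
            return messages  # nothing to collapse yet
        head, tail = messages[:-buffer_count], messages[-buffer_count:]
        summary = f"[summary of {len(head)} earlier messages]"
        return [summary] + tail

    return summarize


func = buffered_summary(buffer_count=2)
print(func(["m1", "m2", "m3", "m4"], []))
# → ['[summary of 2 earlier messages]', 'm3', 'm4']
```

The factory shape matters: buffered_summary(buffer_count=2) is called once at configuration time and returns the function the context invokes on each trigger.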

@SongChiYoung (Contributor Author) commented Apr 10, 2025

Serialization is now working!

Example code:

def test13():
    import asyncio
    from pprint import pprint

    from autogen_agentchat.agents import AssistantAgent
    from autogen_core.model_context import SummarizedChatCompletionContext
    from autogen_core.model_context.conditions import MaxMessageCompletion
    from autogen_ext.models.openai import OpenAIChatCompletionClient
    from autogen_ext.summary import (
        buffered_summary,
        buffered_summarized_chat_completion_context,
    )

    client = OpenAIChatCompletionClient(model="claude-3-haiku-20240307")
    print(client.model_info)

    async def run(agent, context):
        # Ask two questions and inspect the (possibly summarized) context after each.
        res = await agent.run(task="What is the capital of France?")
        pprint(res)
        pprint(await context.get_messages())
        res = await agent.run(task="What is the capital of Korea?")
        pprint(res)
        pprint(await context.get_messages())

    # 1. Explicit construction from a summarizing function and a condition.
    context = SummarizedChatCompletionContext(
        summarizing_func=buffered_summary(buffer_count=2),
        summarizing_condition=MaxMessageCompletion(max_messages=2),
    )
    agent = AssistantAgent(
        "helper",
        model_client=client,
        system_message="You are a helpful agent",
        model_context=context,
    )
    asyncio.run(run(agent, context))

    print("=====================")
    print(agent.dump_component())
    print("=====================")

    # 2. The same setup via the convenience factory.
    context = buffered_summarized_chat_completion_context(
        buffer_count=2,
        max_messages=2,
    )
    agent = AssistantAgent(
        "helper",
        model_client=client,
        system_message="You are a helpful agent",
        model_context=context,
    )
    asyncio.run(run(agent, context))

    config = agent.dump_component()
    print("=====================")
    print(config)
    print("=====================")

    # 3. Round-trip: restore the agent (and its summarized context) from the config.
    agent = AssistantAgent.load_component(config)

    async def run_restored(agent):
        pprint(await agent.run(task="What is the capital of France?"))
        pprint(await agent.run(task="What is the capital of Korea?"))

    asyncio.run(run_restored(agent))


if __name__ == "__main__":
    test13()

@SongChiYoung (Contributor Author) commented Apr 10, 2025

Comments and docstrings are now fixed!
Please let me know if other documentation is needed.

@SongChiYoung SongChiYoung marked this pull request as ready for review April 10, 2025 13:46
@SongChiYoung (Contributor Author)

@ekzhu
This PR is now fully functional, tested, and ready for review.

All core features are implemented:

  • SummarizedChatCompletionContext
  • Condition-based summarization triggers (e.g., MaxMessageCompletion)
  • Buffered summary strategy
  • Component-based serialization and config restoration
  • Complete unit test coverage with mypy / pyright strict compatibility

Looking forward to feedback!

@SongChiYoung SongChiYoung changed the title [DRAFT] summarized chat completion context [FEAT] summarized chat completion context Apr 10, 2025
@SongChiYoung (Contributor Author)

Next Step Idea (will be a separate PR):

As a next step, I’m planning to implement an LLM-based summarizer using AutoGen agents — such as CodeExecutorAgent or AssistantAgent — to generate summaries directly from recent messages.

The goal is to:

  • Enable more semantically aware summarization via LLMs
  • Support prompt-based, model-guided strategies (e.g., one-shot, chain-of-thought)
  • Leverage AutoGen’s existing agent infrastructure for summarization tasks

This will be added as an optional summarizer type, while still fully supporting the current function-based summarization approach — no breaking changes, just an additional pluggable option.

The interface will remain compatible with SummarizedChatCompletionContext, using the same summarizing_func slot and a tool-style wrapper for serialization.

Looking forward to exploring this in a follow-up PR!

@ekzhu (Collaborator) left a comment

I can see a lot of progress has been made, going well beyond the initial scope of the issue.

I still hold my last point:

Perhaps we can start with a concrete implementation of summarization? As a first version, an actual implementation would provide more value than a templated version with scaffolding like summarization_func and various conditions.

For example, a simple implementation that uses a model client to convert a list of LLMMessage into a single message is already very useful. It can be triggered for every 10 messages, for example.

A highly opinionated but concrete implementation gets feedback and usage, and users can pick it up quickly and provide feedback.

I think instead of building all the scaffolding using conditions, we can start with a much simpler user experience. For example,

    from autogen_core.model_context import SummarizerChatCompletionContext

    summarizer_context = SummarizerChatCompletionContext(
        model_client=model_client,
        summary_prompt="Summarize the conversation so far for your own memory",
        summary_format="This portion of conversation has been summarized as follow: {summary}",
        summary_interval=10,  # trigger for every 10 messages
        summary_start=2,      # the summary replaces history starting from the 3rd message
        summary_end=-2,       # ...and ending at the 2nd-to-last message
    )

    agent = AssistantAgent("assistant", ..., model_context=summarizer_context)

In this example, the model_client is used to perform the summary. This is most likely how ChatGPT performs model-context summarization, and most users just want something like this that works out of the box.
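One reading of the summary_start / summary_end parameters above is that the produced summary replaces the slice messages[summary_start:summary_end]. The helper below is a hypothetical sketch of that semantics, with a string join standing in for the model-client call; the function name and default format string are illustrative only:

```python
from typing import List


def apply_summary(
    messages: List[str],
    summary_start: int,
    summary_end: int,
    summary_format: str = "This portion of conversation has been summarized as follow: {summary}",
) -> List[str]:
    """Replace messages[summary_start:summary_end] with one formatted summary."""
    portion = messages[summary_start:summary_end]
    summary = " / ".join(portion)  # placeholder for a model-client summarization call
    return (
        messages[:summary_start]
        + [summary_format.format(summary=summary)]
        + messages[summary_end:]
    )


msgs = [f"m{i}" for i in range(10)]
out = apply_summary(msgs, summary_start=2, summary_end=-2)
print(len(out))  # 5: 2 kept + 1 summary + 2 kept
```

Under this reading, the first summary_start messages (e.g., the system prompt) and the most recent messages always survive verbatim, which matches the stated intent of the parameters.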

@SongChiYoung (Contributor Author)

@ekzhu
Thanks for the thoughtful feedback. I can definitely add a model_client-based summarization style as a high-level option.

That said, I still believe the current structure using summarizing_func and summarizing_condition is conceptually aligned with how AutoGen users already use termination_condition in GroupChat. So from that perspective, I think it could actually feel more intuitive and familiar to many users.

Also, as I understand it, summarization is a highly experimental and research-driven area. I believe this kind of extensible and general-purpose structure can enable AutoGen users - especially research-oriented users - to explore novel summarization strategies much more easily.

That said, I’ll work on bridging the two - providing a simple preset (like SummarizerChatCompletionContext) that’s powered by the current modular infrastructure underneath.

Appreciate your guidance as always!

@ekzhu (Collaborator) commented Apr 15, 2025

If we add a new user-facing concept for every new feature, the framework will quickly turn into chaos.

There is indeed value in more structured code for this component, but not now. We can always add that code later; we don't need to create too much structure. The original LangChain code base has been criticized by many people for introducing too many abstractions, and I hope we don't follow the same path.

My suggestion is to follow Keep It Simple, Stupid (KISS): just the minimal code required for the minimal viable use case. This way, it is much easier to write unit tests that create high coverage.

@ekzhu (Collaborator) commented Apr 15, 2025

More on my previous comment.

I agree with you that the conditions are similar to the termination conditions already in the framework. In the future there will be space for those, perhaps in different forms. But let's get users to use the basic feature first and gather feedback.

For this PR, let's not add the new concepts. Please only create the implementation for a new SummarizerChatCompletionContext class and unit tests for it.

@SongChiYoung (Contributor Author) commented Apr 18, 2025

@ekzhu
Got it — in that case, how about turning this into a 3rd-party extension instead, without including the more complex summary engine in core?

I can of course implement the simpler version you suggested, but in my case that approach doesn't work well, especially when dealing with long contexts and nuanced summarization (e.g., ChatGPT or Claude sometimes "forget" key parts in long chats).
If you'd like, I can still work on that simpler version in a separate PR, as you advised.

Also, if possible, please don’t close or significantly alter this PR for now — I’d like to reference it in a follow-up PR to the 3rd-party extensions page.

This PR was structured with a bit more abstraction because I was exploring ideas like:

  • summarizing only non-user messages
  • source-aware summarization (e.g., per-agent)
  • meta-summary: generating multiple summaries from the same context, then summarizing those

These are admittedly more experimental, but reflect the direction I was aiming for.
So perhaps it makes more sense to keep that direction in a community extension — happy to maintain it if that’s preferred.

Let me know what you think!

Also, just to note — if someday the AutoGen community finds this kind of structured summarization useful (and others express similar needs), I’d be happy to contribute the extension back into core and hand over maintenance. No attachment on my end — just want the idea to be available if it's ever helpful.

@ekzhu (Collaborator) commented Apr 18, 2025

Got it — in that case, how about turning this into a 3rd-party extension instead, without including the more complex summary engine in core?

I think this will be a good outcome.

Let's just have a separate PR to add the extension to the list.

@SongChiYoung (Contributor Author)

Got it — in that case, how about turning this into a 3rd-party extension instead, without including the more complex summary engine in core?

I think this will be a good outcome.

Let's just have a separate PR to add the extension to the list.

Thanks! I built it at https://github.com/SongChiYoung/autogen-contextplus
and opened a PR to update the third-party extensions doc.

ekzhu added a commit that referenced this pull request Apr 21, 2025
DOC: add extensions - autogen-oaiapi and autogen-contextplus

contextplus is a user-defined autogen model_context.
It was discussed in #6217 and #6160.

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
@ekzhu ekzhu marked this pull request as draft April 21, 2025 19:31
@ekzhu (Collaborator) commented Apr 21, 2025

Converted to draft for now
