
Conversation Summarization

Automatic summarization of long conversations to preserve context while staying within token limits.

Overview

Long conversations can exceed token limits, causing context to be lost. Conversation summarization automatically compresses older messages into a summary, preserving important context while keeping the conversation within limits.

Memory Strategies

truncate

Drops the oldest messages when the token limit is reached. Fast, but older context is lost.

summarize ⭐

Summarizes older messages into a condensed form. Preserves context intelligently.

sliding_window

Keeps only the most recent N messages. Good for short-term context.
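The `truncate` and `sliding_window` strategies can be pictured as simple list operations. This is an illustrative sketch, not the service's internal code; the token-per-character ratio is an assumption.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// truncate: drop the oldest messages until the rest fits a token budget.
// Token cost is approximated as content.length / 4 — an assumption, not
// the tokenizer the service actually uses.
function truncate(messages: Message[], tokenBudget: number): Message[] {
  const cost = (m: Message) => Math.ceil(m.content.length / 4);
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest, keeping messages while the budget allows
  for (let i = messages.length - 1; i >= 0; i--) {
    const c = cost(messages[i]);
    if (used + c > tokenBudget) break;
    kept.unshift(messages[i]);
    used += c;
  }
  return kept;
}

// sliding_window: keep only the most recent N messages, regardless of size
function slidingWindow(messages: Message[], windowSize: number): Message[] {
  return messages.slice(-windowSize);
}
```

`truncate` bounds token usage while `sliding_window` bounds message count; both discard older context outright, which is why `summarize` is the recommended choice for long conversations.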

Basic Usage

summarization.ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://www.superagentstack.com/api/v1',
  apiKey: process.env.OPENROUTER_KEY,
  defaultHeaders: { 'superAgentKey': process.env.SUPER_AGENT_KEY },
});

// Reuse the same sessionId across requests to continue one conversation
const sessionId = crypto.randomUUID();

const response = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [{ role: 'user', content: 'Continue our discussion' }],
  sessionId,
  saveToMemory: true,
  memoryStrategy: 'summarize',  // summarize older messages instead of dropping them
  summaryThreshold: 20,         // summarize once the conversation exceeds 20 messages
});

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `memoryStrategy` | string | `"truncate"` | One of `"truncate"`, `"summarize"`, `"sliding_window"` |
| `summaryThreshold` | number | `50` | Number of messages before summarization triggers. Minimum: 10. |

Minimum Threshold

The summaryThreshold must be at least 10. Lower values will be rejected.
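A client-side guard can catch invalid thresholds before the request is sent. `validateSummaryThreshold` is a hypothetical helper, not part of any SDK; the server enforces the same minimum.

```typescript
const MIN_SUMMARY_THRESHOLD = 10;

// Hypothetical helper: reject thresholds the API would refuse anyway
function validateSummaryThreshold(value: number): number {
  if (!Number.isInteger(value) || value < MIN_SUMMARY_THRESHOLD) {
    throw new RangeError(
      `summaryThreshold must be an integer >= ${MIN_SUMMARY_THRESHOLD}, got ${value}`,
    );
  }
  return value;
}
```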

How Summarization Works

  1. Conversation reaches the summaryThreshold
  2. Older messages (before the threshold) are sent to the LLM for summarization
  3. The summary is stored and replaces the older messages
  4. Recent messages are kept intact for immediate context
  5. Future requests include: summary + recent messages
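The steps above can be sketched as a single function. The LLM call is injected as `summarizeWithLLM`, and the number of recent messages kept intact (`RECENT_TO_KEEP`) is an assumption for illustration — the service decides the actual split.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

const RECENT_TO_KEEP = 6; // assumed split point; chosen server-side in practice

function compressHistory(
  history: Message[],
  summaryThreshold: number,
  summarizeWithLLM: (older: Message[]) => string,
): Message[] {
  // 1. Only act once the conversation reaches the threshold
  if (history.length < summaryThreshold) return history;

  // 2. Older messages (before the split) go to the LLM for summarization
  const older = history.slice(0, history.length - RECENT_TO_KEEP);
  const summary = summarizeWithLLM(older);

  // 3-4. The summary is stored in place of the older messages;
  //      recent messages are kept intact for immediate context
  const recent = history.slice(-RECENT_TO_KEEP);

  // 5. Future requests include: summary + recent messages
  return [
    { role: 'system', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```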

Example Flow

summarization-flow.ts
// Messages 1-20: Normal conversation
// ...

// Message 21: Threshold reached, summarization triggers
// Old messages (1-15) → Summarized
// Recent messages (16-21) → Kept intact

// Message 22+: Context includes:
// - Summary of messages 1-15
// - Full messages 16-22

// The AI maintains context without exceeding token limits!

Strategy Comparison

| Strategy | Context Preservation | Speed | Best For |
| --- | --- | --- | --- |
| `truncate` | Low | Fast | Simple chats, cost-sensitive use |
| `summarize` | High | Medium | Long conversations, complex topics |
| `sliding_window` | Medium | Fast | Cases where only recent context matters |

Best Practices

  • Use summarize for customer support and long-form conversations
  • Use sliding_window for quick Q&A where only recent context matters
  • Set summaryThreshold based on your typical conversation length
  • Higher thresholds = more context before summarization, but higher token usage
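These rules of thumb can be encoded as a small helper. The use-case names and threshold values below are assumptions for illustration, not API constants.

```typescript
type MemoryStrategy = 'truncate' | 'summarize' | 'sliding_window';

interface MemoryOptions {
  memoryStrategy: MemoryStrategy;
  summaryThreshold?: number;
}

// Hypothetical mapping from conversation type to memory options
function pickMemoryOptions(
  useCase: 'support' | 'quick-qa' | 'simple-chat',
): MemoryOptions {
  switch (useCase) {
    case 'support':
      // Long, context-heavy conversations: summarize, with headroom before triggering
      return { memoryStrategy: 'summarize', summaryThreshold: 30 };
    case 'quick-qa':
      // Only recent context matters
      return { memoryStrategy: 'sliding_window' };
    case 'simple-chat':
      // Cost-sensitive; losing old context is acceptable
      return { memoryStrategy: 'truncate' };
  }
}
```

The returned options can be spread into the `client.chat.completions.create` call shown earlier.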