
Conversation Summarization

Automatic summarization of long conversations to preserve context while staying within token limits.

Overview

Long conversations can exceed token limits, causing context to be lost. Conversation summarization automatically compresses older messages into a summary, preserving important context while keeping the conversation within limits.

Memory Strategies

truncate

Drops the oldest messages when the token limit is reached. Fast, but older context is lost.

summarize ⭐

Summarizes older messages into a condensed form. Preserves context intelligently.

sliding_window

Keeps only the most recent N messages. Good for short-term context.
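The `truncate` and `sliding_window` strategies can be pictured as simple list operations. This is an illustrative sketch, not the service's internal code; the token-per-character ratio is an assumption.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// truncate: drop the oldest messages until the rest fits a token budget.
// Token cost is approximated as content.length / 4 — an assumption, not
// the tokenizer the service actually uses.
function truncate(messages: Message[], tokenBudget: number): Message[] {
  const cost = (m: Message) => Math.ceil(m.content.length / 4);
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest, keeping messages while the budget allows
  for (let i = messages.length - 1; i >= 0; i--) {
    const c = cost(messages[i]);
    if (used + c > tokenBudget) break;
    kept.unshift(messages[i]);
    used += c;
  }
  return kept;
}

// sliding_window: keep only the most recent N messages, regardless of size
function slidingWindow(messages: Message[], windowSize: number): Message[] {
  return messages.slice(-windowSize);
}
```

`truncate` bounds token usage while `sliding_window` bounds message count; both discard older context outright, which is why `summarize` is the recommended choice for long conversations.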

Basic Usage

summarization.ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://www.superagentstack.com/api/v1',
  apiKey: process.env.OPENROUTER_KEY,
  defaultHeaders: { 'superAgentKey': process.env.SUPER_AGENT_KEY },
});

// Reuse the same sessionId across requests to continue one conversation
const sessionId = crypto.randomUUID();

const response = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [{ role: 'user', content: 'Continue our discussion' }],
  sessionId,
  saveToMemory: true,
  memoryStrategy: 'summarize',  // summarize older messages instead of dropping them
  summaryThreshold: 20,         // summarize once the conversation exceeds 20 messages
});

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `memoryStrategy` | string | `"truncate"` | One of `"truncate"`, `"summarize"`, `"sliding_window"` |
| `summaryThreshold` | number | `50` | Number of messages before summarization triggers. Minimum: 10. |

Minimum Threshold

The summaryThreshold must be at least 10. Lower values will be rejected.
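A client-side guard can catch invalid thresholds before the request is sent. `validateSummaryThreshold` is a hypothetical helper, not part of any SDK; the server enforces the same minimum.

```typescript
const MIN_SUMMARY_THRESHOLD = 10;

// Hypothetical helper: reject thresholds the API would refuse anyway
function validateSummaryThreshold(value: number): number {
  if (!Number.isInteger(value) || value < MIN_SUMMARY_THRESHOLD) {
    throw new RangeError(
      `summaryThreshold must be an integer >= ${MIN_SUMMARY_THRESHOLD}, got ${value}`,
    );
  }
  return value;
}
```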

How Summarization Works

  1. Conversation reaches the summaryThreshold
  2. Older messages (before the threshold) are sent to the LLM for summarization
  3. The summary is stored and replaces the older messages
  4. Recent messages are kept intact for immediate context
  5. Future requests include: summary + recent messages
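The steps above can be sketched as a single function. The LLM call is injected as `summarizeWithLLM`, and the number of recent messages kept intact (`RECENT_TO_KEEP`) is an assumption for illustration — the service decides the actual split.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

const RECENT_TO_KEEP = 6; // assumed split point; chosen server-side in practice

function compressHistory(
  history: Message[],
  summaryThreshold: number,
  summarizeWithLLM: (older: Message[]) => string,
): Message[] {
  // 1. Only act once the conversation reaches the threshold
  if (history.length < summaryThreshold) return history;

  // 2. Older messages (before the split) go to the LLM for summarization
  const older = history.slice(0, history.length - RECENT_TO_KEEP);
  const summary = summarizeWithLLM(older);

  // 3-4. The summary is stored in place of the older messages;
  //      recent messages are kept intact for immediate context
  const recent = history.slice(-RECENT_TO_KEEP);

  // 5. Future requests include: summary + recent messages
  return [
    { role: 'system', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```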

Example Flow

summarization-flow.ts
// Messages 1-20: Normal conversation
// ...

// Message 21: Threshold reached, summarization triggers
// Old messages (1-15) → Summarized
// Recent messages (16-21) → Kept intact

// Message 22+: Context includes:
// - Summary of messages 1-15
// - Full messages 16-22

// The AI maintains context without exceeding token limits!

Strategy Comparison

| Strategy | Context Preservation | Speed | Best For |
| --- | --- | --- | --- |
| `truncate` | Low | Fast | Simple chats, cost-sensitive use |
| `summarize` | High | Medium | Long conversations, complex topics |
| `sliding_window` | Medium | Fast | Cases where only recent context matters |

Best Practices

  • Use summarize for customer support and long-form conversations
  • Use sliding_window for quick Q&A where only recent context matters
  • Set summaryThreshold based on your typical conversation length
  • Higher thresholds = more context before summarization, but higher token usage
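These rules of thumb can be encoded as a small helper. The use-case names and threshold values below are assumptions for illustration, not API constants.

```typescript
type MemoryStrategy = 'truncate' | 'summarize' | 'sliding_window';

interface MemoryOptions {
  memoryStrategy: MemoryStrategy;
  summaryThreshold?: number;
}

// Hypothetical mapping from conversation type to memory options
function pickMemoryOptions(
  useCase: 'support' | 'quick-qa' | 'simple-chat',
): MemoryOptions {
  switch (useCase) {
    case 'support':
      // Long, context-heavy conversations: summarize, with headroom before triggering
      return { memoryStrategy: 'summarize', summaryThreshold: 30 };
    case 'quick-qa':
      // Only recent context matters
      return { memoryStrategy: 'sliding_window' };
    case 'simple-chat':
      // Cost-sensitive; losing old context is acceptable
      return { memoryStrategy: 'truncate' };
  }
}
```

The returned options can be spread into the `client.chat.completions.create` call shown earlier.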