Production Debugging at Scale¶
Scenario: Production system with 167+ accumulated memories needs debugging of an issue from 3 days ago.
Problem: Manually searching hundreds of memories to reconstruct the exact state at the time of the problem is impractical.
Solution: Time-travel instantly to any point in production history with complete memory context.
Overview¶
In production AI systems, memories accumulate over weeks and months of user interactions. When a user reports a problem from several days ago, the current store no longer shows what the agent knew at that time, and manually sifting through hundreds of memories to reconstruct that state is not feasible.
Memoir provides production-scale time-travel debugging: jump to any snapshot in the system's history and see the complete memory context as it was then, regardless of how many memories have accumulated since.
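To make the snapshot and checkout idea concrete, here is a toy in-memory model. It is only a sketch of the semantics: `ToySnapshotStore` and its methods are hypothetical names, and it copies the whole state per snapshot, which is not how Memoir's versioned store works internally.

```python
import copy


class ToySnapshotStore:
    """Toy model of snapshot/checkout semantics (hypothetical, not Memoir's API)."""

    def __init__(self):
        self._live = {}        # latest state: path -> content
        self._snapshots = {}   # snapshot name -> frozen copy of the state
        self._view = self._live

    def put(self, path, content):
        # Writes always go to the live state in this toy model.
        self._live[path] = content

    def snapshot(self, name):
        self._snapshots[name] = copy.deepcopy(self._live)

    def checkout(self, name=None):
        # None returns to the live state; a name selects a frozen copy.
        self._view = self._live if name is None else self._snapshots[name]

    def search(self):
        return sorted(self._view.items())


store = ToySnapshotStore()
store.put("preferences.ui.theme", "dark")
store.snapshot("day2")
store.put("system.errors.accessibility", "yellow on white")
store.snapshot("day3")

store.checkout("day2")
before = store.search()
store.checkout("day3")
after = store.search()
print(len(before), len(after))
```

The point of the model is that reading an old snapshot costs one lookup by name, however many memories were written afterwards.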
Large-Scale Production Timeline¶
Production Memory Timeline (167+ memories):

Week 1-4: [84 memories] ──→ Checkpoint: 145 memories
                                  │
Day 1: UI prefs (147) ──→ Day 2: Theme (147) ──→ Day 3: BUG (167)
       │                         │                      │
       └─ 145 mem baseline       └─ Last good state     └─ Problem occurs
                                                        │
Day 4: User reports ──→ Time-travel debugging ──→ FIX DEPLOYED
       │                       │                        │
       └─ 167 mem current      └─ Jump to any point     └─ Production fixed
Memory Distribution:
• User activities: 84 memories (weeks 1-4)
• System logs: 60 memories (errors, searches, analytics)
• Preferences: 23 memories (UI, settings, feedback)
• Total: 167 memories across timeline
Debugging Power:
• Traditional: Linear search through 167 memories
• Memoir: Instant jump to exact problem moment
• Context: See exact memory state when bug occurred
• Fix: Test in isolation, deploy safely
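A distribution like the one above can be tallied from memory paths. The sketch below assumes that the first segment of a dotted path (for example `preferences` in `preferences.ui.theme`) identifies the category; that mapping and the helper name `count_by_category` are assumptions for illustration.

```python
from collections import Counter


def count_by_category(paths):
    """Count memories per top-level path segment (assumed to be the category)."""
    return Counter(path.split(".", 1)[0] for path in paths)


paths = [
    "preferences.ui.theme",
    "preferences.accessibility.contrast",
    "profile.schedule.work_hours",
    "system.errors.accessibility",
    "system.logs.error_1",
]
counts = count_by_category(paths)
print(dict(counts))

# Sanity check on the figures quoted in the distribution above.
distribution_total = 84 + 60 + 23
print(distribution_total)
```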
Key Code Snippets¶
Building Production History¶
import asyncio
import os
import tempfile
import time
from datetime import datetime, timedelta

from memoir.store.prolly_adapter import ProllyTreeStore

# Note: the snippets on this page use `await`, so run them inside an
# `async def` function driven by asyncio.run().

# Initialize production memory store
temp_dir = tempfile.mkdtemp()
prolly_path = os.path.join(temp_dir, "memory_store")
prolly_store = ProllyTreeStore(
    path=prolly_path,
    enable_versioning=True,
    cache_size=10000,
)
namespace = "production_user"

# Simulate 6 months of accumulated memories
base_memories = [
    ("User prefers dark theme for all interfaces", "preferences.ui.theme"),
    ("User typically works 9-5 PST timezone", "profile.schedule.work_hours"),
    ("User has accessibility needs for high contrast", "preferences.accessibility.contrast"),
    # ... 144 total memories accumulated over 6 months
]
for content, path in base_memories:
    await prolly_store.store_memory_async(namespace, content, path)

# Create checkpoint at 145 memories
initial_checkpoint = f"checkpoint_{int(time.time())}"
prolly_store.create_time_snapshot(initial_checkpoint)
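Snapshot names built from `int(time.time())` have one-second resolution, so two snapshots with the same prefix created within the same second would get the same name. One way to avoid that is to append a per-process counter. The helper `snapshot_name` is hypothetical and not part of Memoir.

```python
import itertools
import time

_sequence = itertools.count()


def snapshot_name(prefix):
    """Return '<prefix>_<unix-seconds>_<n>', where n increases per process."""
    return f"{prefix}_{int(time.time())}_{next(_sequence)}"


first = snapshot_name("checkpoint")
second = snapshot_name("checkpoint")
print(first, second)
```

The counter only guarantees uniqueness within a single process; a multi-process deployment would need a different suffix, such as a UUID.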
Simulating Problem Timeline¶
# Day 1: Normal user activity (147 memories)
await prolly_store.store_memory_async(
    namespace,
    "User updated UI preferences to use blue accent color",
    "preferences.ui.accent_color"
)
await prolly_store.store_memory_async(
    namespace,
    "User set notifications to quiet mode during meetings",
    "preferences.notifications.meeting_mode"
)
day1_snapshot = f"day1_{int(time.time())}"
prolly_store.create_time_snapshot(day1_snapshot)

# Day 2: Theme preferences (still 147 memories)
await prolly_store.store_memory_async(
    namespace,
    "User mentioned liking purple color scheme for dashboards",
    "preferences.ui.dashboard_colors"
)
day2_snapshot = f"day2_{int(time.time())}"
prolly_store.create_time_snapshot(day2_snapshot)

# Day 3: Problem occurs! (167 memories - system adds 20 error logs)
problem_time = datetime.now()

# Simulate agent malfunction - bad color recommendation
await prolly_store.store_memory_async(
    namespace,
    "SYSTEM ERROR: Agent recommended bright yellow on white - accessibility violation!",
    "system.errors.accessibility"
)

# System logs flood in after the error
for i in range(19):
    await prolly_store.store_memory_async(
        namespace,
        f"Error log {i+1}: Color contrast failed validation checks",
        f"system.logs.error_{i+1}"
    )
problem_snapshot = f"problem_{int(time.time())}"
prolly_store.create_time_snapshot(problem_snapshot)
User Complaint and Debugging Challenge¶
# Day 4: User files complaint
current_memories = prolly_store.search((namespace,), limit=500)
print("User Complaint Received:")
print('"Agent recommended terrible colors yesterday at 11:15 AM"')
print(f"Current production state: {len(current_memories)} memories")
print("Production Debugging Challenge:")
print(f" Current state: {len(current_memories)} memories in production")
print(" Need to debug: Problem from 3 days ago")
print(" Traditional approach: Search through 167 memories manually")
print(" Memoir approach: Time-travel to exact snapshot")
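A complaint usually arrives with a wall-clock time rather than a snapshot name. If the application keeps its own index of when each snapshot was created, the state the agent saw at the reported time is the latest snapshot at or before it. The sketch below assumes such an index exists as a sorted list of `(created_at, snapshot_id)` pairs; the index, the dates, and the helper name are illustrative only.

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_at_or_before(index, when):
    """index: list of (created_at, snapshot_id) sorted by created_at."""
    times = [created for created, _ in index]
    pos = bisect_right(times, when)
    if pos == 0:
        return None  # the report predates every snapshot
    return index[pos - 1][1]


# Hypothetical creation times for the snapshots in this walkthrough.
index = [
    (datetime(2024, 1, 1, 9, 0), "day1"),
    (datetime(2024, 1, 2, 9, 0), "day2"),
    (datetime(2024, 1, 3, 11, 20), "problem"),
]
reported = datetime(2024, 1, 3, 11, 15)
chosen = snapshot_at_or_before(index, reported)
print(chosen)
```

Here the reported time falls just before the `problem` snapshot was taken, so the lookup returns `day2`, the last state recorded before the incident.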
Time-Travel Debugging¶
print("Time-traveling to problem moment...")

# Instantly jump to exact moment of problem
prolly_store.tree.checkout(problem_snapshot)
problem_memories = prolly_store.search((namespace,), limit=200)
problem_count = len([m for m in problem_memories if m[2] is not None])
print("Memory state at problem time:")
print(f" Total memories then: {problem_count}")

# Check for the specific error
error_memory = prolly_store.get((namespace,), "system.errors.accessibility")
if error_memory:
    # Aggregated memories are stored as dicts with a `memories` list
    entries = error_memory.get("memories") or [{}]  # guard against a missing or empty list
    first_entry = entries[0]
    print(f"Found error: {str(first_entry.get('content', ''))[:50]}...")
Root Cause Analysis¶
print("Root Cause Analysis:")

# Jump to different points in timeline
checkpoints = [
    (initial_checkpoint, "Initial checkpoint"),
    (day1_snapshot, "Before problem"),
    (day2_snapshot, "Day before problem"),
    (problem_snapshot, "At problem time")
]
timeline_analysis = []
for checkpoint_id, description in checkpoints:
    prolly_store.tree.checkout(checkpoint_id)
    memories = prolly_store.search((namespace,), limit=200)
    count = len([m for m in memories if m[2] is not None])
    timeline_analysis.append((description, count))
    print(f"{description}: {count} memories")

print("Timeline progression:")
for desc, count in timeline_analysis:
    print(f" {desc}: {count} memories")
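Once the per-checkpoint counts are collected, the interval with the largest growth is usually the first place to look. The helper below is a sketch (the name `largest_jump` is hypothetical) that works on a list shaped like `timeline_analysis`; the counts are the ones shown in the sample output on this page.

```python
def largest_jump(timeline):
    """timeline: chronological list of (description, count) with at least two entries."""
    deltas = [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(timeline, timeline[1:])
    ]
    # Returns the later checkpoint of the interval that grew the most.
    return max(deltas, key=lambda item: item[1])


timeline = [
    ("Initial checkpoint", 145),
    ("Before problem", 147),
    ("At problem time", 167),
]
label, added = largest_jump(timeline)
print(f"{label}: +{added} memories")
```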
Historical Context Analysis¶
# Analyze what agent knew before the problem
prolly_store.tree.checkout(day2_snapshot)  # Last good state
ui_preferences = []
memories = prolly_store.search((namespace,), limit=200)
for _, path, data in memories:
    if data and ("ui" in path or "accessibility" in path):
        ui_preferences.append(f"[{path}] {data}")

print("Agent's knowledge before problem:")
for pref in ui_preferences[:3]:  # Show first 3
    print(f" {pref}")

print("Root cause identified:")
print(" Agent had correct preferences but logic bug ignored them")
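Comparing the last good state with the problem state shows exactly which paths appeared in between. The sketch below assumes search results are 3-tuples whose second and third elements are the path and the data, as in the loops above; `to_mapping` and `diff_states` are hypothetical helpers, and the sample tuples are made up for illustration.

```python
def to_mapping(results):
    """Convert (key, path, data) tuples into {path: data}, skipping empty entries."""
    return {path: data for _, path, data in results if data is not None}


def diff_states(before, after):
    """Return the paths added, removed, and changed between two {path: data} states."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in shared if before[k] != after[k]),
    }


# Illustrative data standing in for search results at two snapshots.
last_good = to_mapping([
    (("production_user",), "preferences.ui.theme", "dark"),
])
at_problem = to_mapping([
    (("production_user",), "preferences.ui.theme", "dark"),
    (("production_user",), "system.errors.accessibility", "SYSTEM ERROR"),
])
diff = diff_states(last_good, at_problem)
print(diff)
```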
Running the Example¶
Sample Output¶
# Production Debugging Demo
Time-travel to debug production issues from user reports
Building production history (6 months of user interactions)...
- Built initial production history: 144 memories
Simulating production timeline...
Day 1: UI preference saved
Day 2: Theme preference saved
Day 3: Agent malfunction - bad recommendation
Simulating continued production usage...
- Total production memories: 167
Day 4: User complaint received
"Agent recommended terrible colors yesterday at 11:15 AM"
Current production state: 167 memories accumulated
Production Debugging Challenge:
Current state: 167 memories in production
Need to debug: Problem from 3 days ago
Traditional approach: Search through 167 memories manually
Memoir approach: Time-travel to exact snapshot
Time-traveling to problem moment...
Memory state at problem time:
Total memories then: 167
Current memories now: 167
Time-traveled back through 167 memories instantly!
Root Cause Analysis:
Time-traveled to initial checkpoint...
Memory state at checkpoint: 145 memories
Time-traveling to just before problem...
Memory state before problem: 147 memories
Timeline progression:
Initial checkpoint: 145 memories
Before problem: 147 memories
At problem time: 167 memories
Current production: 167 memories
Root cause identified:
Agent had correct preferences but logic bug ignored them
Debugged by time-traveling through 167 memories in seconds!
Key Benefits¶
- Large Scale: Handle hundreds to thousands of memories without performance loss
- Time-Travel: Jump to any point in production history instantly
- Historical Context: See exact memory state when bug occurred
- Safe Fixes: Test fixes in isolation before production deployment
- Complete Audit Trail: Track all changes with timestamps and snapshots
- Traditional Limitation: Manual search impossible at scale, no historical context
Use Cases¶
- Production Incidents: "Why did the agent fail 3 days ago?"
- User Complaints: "Agent gave bad advice last week"
- Regression Analysis: "When did this behavior start?"
- Compliance Audits: "Show agent state at specific time"
- Performance Issues: "What caused slowdown yesterday?"
- A/B Test Analysis: "Compare agent behavior before/after change"
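For the regression question ("When did this behavior start?"), a chronological list of snapshots can be bisected instead of inspected one by one. The sketch below assumes the behavior stays bad once it has turned bad; `first_bad` and the `is_bad` predicate are hypothetical. In practice the predicate would check out the snapshot and inspect its memories.

```python
def first_bad(snapshots, is_bad):
    """Binary-search a chronological list for the first snapshot where is_bad is True."""
    lo, hi = 0, len(snapshots)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid          # first bad snapshot is at mid or earlier
        else:
            lo = mid + 1      # first bad snapshot is after mid
    return snapshots[lo] if lo < len(snapshots) else None


snapshots = ["initial", "day1", "day2", "problem", "day4"]
known_bad = {"problem", "day4"}  # stand-in for a real inspection
checked = []


def is_bad(name):
    checked.append(name)
    return name in known_bad


result = first_bad(snapshots, is_bad)
print(result, checked)
```

With five snapshots the search inspects three of them; the saving grows with the length of the history.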
Advanced Production Debugging¶
Multi-User Timeline Analysis¶
# Debug across multiple user namespaces
production_users = ["user123", "user456", "user789"]

# The snapshot is the same for every user, so check it out once
prolly_store.tree.checkout(problem_snapshot)
for user_id in production_users:
    user_memories = prolly_store.search((user_id,), limit=100)
    # Check if problem affected this user
    for _, path, data in user_memories:
        if data and "error" in path.lower():
            print(f"User {user_id} affected: {path}")
Performance Impact Analysis¶
# Measure time-travel performance with large datasets
start_time = time.time()
prolly_store.tree.checkout(problem_snapshot)
memories = prolly_store.search((namespace,), limit=1000)
end_time = time.time()
print(f"Time-travel through {len(memories)} memories: {end_time - start_time:.3f}s")
print("Traditional search would take: 30-120+ seconds")
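A single `time.time()` measurement can be noisy and is affected by system clock adjustments. A slightly more robust sketch uses `time.perf_counter()` and takes the median of several runs. The helper `time_call` is hypothetical, and the lambda is a stand-in workload; in practice it would wrap the checkout and search calls.

```python
import time
from statistics import median


def time_call(fn, repeats=5):
    """Run fn several times and return the median wall-clock duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


# Stand-in workload for the operation being measured.
elapsed = time_call(lambda: sum(range(10_000)))
print(f"median over 5 runs: {elapsed:.6f}s")
```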
Production Fix Workflow¶
# Create fix branch from clean state
prolly_store.tree.checkout(day2_snapshot)  # Last known good
fix_branch = f"hotfix_{int(time.time())}"
prolly_store.tree.create_branch(fix_branch)
prolly_store.tree.checkout(fix_branch)

# Apply corrected logic
await prolly_store.store_memory_async(
    namespace,
    "Enhanced accessibility validation: Always check contrast ratios",
    "system.fixes.accessibility_validation"
)

# Test fix in isolation
# (run_color_recommendation_test is an application-defined helper, not shown here)
test_results = await run_color_recommendation_test()
if test_results.passed:
    # Deploy to production
    prolly_store.tree.checkout("main")
    prolly_store.tree.merge(fix_branch)
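The fix above says to always check contrast ratios. One possible check, shown here as a sketch and not as the validation used in the original example, follows the WCAG definition of contrast ratio with the 4.5:1 AA threshold for normal text. The function names are hypothetical.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two (R, G, B) colors, from 1 to 21."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, threshold=4.5):
    return contrast_ratio(fg, bg) >= threshold


yellow, white, black = (255, 255, 0), (255, 255, 255), (0, 0, 0)
print(f"yellow on white: {contrast_ratio(yellow, white):.2f}")
print(f"black on white: {contrast_ratio(black, white):.2f}")
```

Bright yellow on white comes out just above 1:1, far below the 4.5:1 threshold, which is why the recommendation in this scenario counts as an accessibility violation.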
Next Steps¶
- Try Memory State Debugging: memory_debugging
- Learn about Conversational Context Branching: context_branching
- See Reproducible Testing: reproducible_testing
- View the complete API Reference: ../api/memoir