# What Episodic Memory Means for AI Agents

> Humans remember experiences, not just facts. Your agent should too.

Category: Philosophy

---

When you remember a meeting, you don't recall a JSON object. You remember the moment — the room, the voice, the pause before someone made a key point.

Humans have episodic memory. Most AI agents don't. That's about to change.

## Two Kinds of Memory

Cognitive science distinguishes between:

**Semantic Memory** — Facts and concepts
* "The capital of France is Paris"
* "Water boils at 100°C"
* Timeless, context-free, declarative

**Episodic Memory** — Experienced events
* "I remember the meeting where we discussed the budget"
* "That call where the client mentioned timeline concerns"
* Time-stamped, contextual, experiential

Most AI memory systems are semantic. Vector databases store embeddings of facts. RAG retrieves documents.

But agents that perceive need episodic memory. They need to remember what they saw and heard, when it happened, and what the context was.
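
To make the distinction concrete, here is a minimal sketch of how the two kinds of records might differ in shape. The class names, field names, and sample values are illustrative assumptions, not part of any SDK.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFact:
    """A timeless, context-free statement."""
    statement: str


@dataclass
class Episode:
    """An experienced event: what was perceived, when, and in what context."""
    description: str
    start: float      # seconds from the start of the recording
    end: float        # seconds from the start of the recording
    source_id: str    # hypothetical identifier of the recording
    context: dict = field(default_factory=dict)


# Made-up sample values, for illustration only
fact = SemanticFact("The capital of France is Paris")
episode = Episode(
    description="Client mentioned timeline concerns",
    start=872.0,
    end=905.5,
    source_id="call-001",
    context={"modality": "audio"},
)

print(fact)
print(episode)
```

The fact carries no time or source. The episode is meaningless without them.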

## Why Episodic Matters

Consider these queries:

| Query                                                  | Memory Type | What's Needed               |
| :----------------------------------------------------- | :---------- | :-------------------------- |
| "What is our pricing model?"                           | Semantic    | Retrieved from docs         |
| "What did the client say about pricing last Tuesday?"  | Episodic    | Retrieved from recordings   |
| "How many people attended the meeting?"                | Episodic    | Visual memory of the event  |
| "What was on screen when they mentioned the deadline?" | Episodic    | Multimodal temporal context |

Semantic memory can't answer episodic questions. You need memory of experiences — not just facts.
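
A query like "What did the client say about pricing last Tuesday?" combines a topic with a time window. The sketch below shows that combination on made-up in-memory records. Keyword matching stands in for real semantic search, and the record layout and `recall` helper are assumptions for illustration.

```python
from datetime import datetime

# Made-up episodes; 7 May 2024 and 14 May 2024 are both Tuesdays
episodes = [
    {"when": datetime(2024, 5, 7, 10, 15), "speaker": "client",
     "text": "The pricing feels high for our team size."},
    {"when": datetime(2024, 5, 9, 14, 0), "speaker": "client",
     "text": "Can we move the deadline?"},
    {"when": datetime(2024, 5, 14, 9, 30), "speaker": "client",
     "text": "Pricing works for us now."},
]


def recall(episodes, keyword, start, end):
    """Return episodes that mention `keyword` and fall within [start, end)."""
    keyword = keyword.lower()
    return [
        e for e in episodes
        if start <= e["when"] < end and keyword in e["text"].lower()
    ]


# "Last Tuesday" resolved to a concrete one-day window
hits = recall(episodes, "pricing", datetime(2024, 5, 7), datetime(2024, 5, 8))
for hit in hits:
    print(f'{hit["when"]:%Y-%m-%d %H:%M} {hit["speaker"]}: {hit["text"]}')
```

Without the timestamp on each record, the second pricing remark would be indistinguishable from the first.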

## Video as Natural Episodic Memory

Video is inherently episodic:

* **Time-indexed** — Every frame has a timestamp
* **Multi-sensory** — Visual + audio together
* **Contextual** — Shows the environment, not just content
* **Continuous** — Captures the flow of events

When you record a meeting, you're creating episodic memory. The challenge is making it retrievable.

## The Memory Problem

Raw recordings aren't queryable. You can't ask an MP4 file "what happened?"

Traditional approaches:
1. **Full transcription** — Converts audio to text, loses visual context
2. **Frame extraction** — Expensive, loses temporal flow
3. **Manual notes** — Doesn't scale, subjective
4. **Just store it** — Recording exists but no one can find anything

None of these create true episodic memory. They create archives.

## Indexed Episodic Memory

The solution: indexes that understand what happened and when.

```python
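import videodb

# Setup (placeholders: supply your own API key and video ID)
conn = videodb.connect(api_key="YOUR_API_KEY")
coll = conn.get_collection()
video = coll.get_video("VIDEO_ID")

# Create episodic memory from a video
video.index_spoken_words()  # What was said
video.index_scenes(prompt="Describe activities and events")  # What happened

# Query episodic memory
results = video.search("budget discussion")

for shot in results.shots:
    print(f"At {shot.start}s: {shot.text}")
    shot.play()  # Relive the moment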
```

The index is the memory. It captures:
* What happened (semantic content)
* When it happened (timestamps)
* Evidence (playable links)
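
As a rough intuition for what such an index holds, here is a toy version built from timestamped transcript segments. It is a sketch under simplifying assumptions: exact word lookup replaces any learned understanding of content, the segments are made up, and a `(start, end)` pair stands in for a playable link.

```python
from collections import defaultdict

# Made-up transcript segments: (start seconds, end seconds, text)
segments = [
    (0.0, 12.5, "Welcome everyone, let's start with the budget"),
    (12.5, 40.0, "The budget for Q3 needs another review"),
    (40.0, 65.0, "Next topic is hiring"),
]


def build_index(segments):
    """Map each lowercase word to the segments in which it occurs."""
    index = defaultdict(list)
    for start, end, text in segments:
        words = set(text.lower().replace(",", " ").split())
        for word in words:
            index[word].append((start, end, text))
    return index


index = build_index(segments)

# Each hit carries what was said, when, and the span to replay
for start, end, text in index["budget"]:
    print(f"{start:.1f}s to {end:.1f}s: {text}")
```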

## Ephemeral vs Persistent

Not all perception needs permanent memory.

**Ephemeral** — Process but don't store
* Real-time event detection
* Privacy-sensitive contexts
* Temporary sessions

```python
rtstream.index_visuals(
    prompt="Detect safety issues",
    ephemeral=True  # Don't persist
)
```

**Persistent** — Store for later recall
* Meeting recordings
* Training content
* Compliance archives

```python
video.index_spoken_words()  # Stored by default
```

You control what your agent remembers.
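
The difference between the two modes can be sketched without any SDK. In the toy handler below, every event is processed, but events are kept only when a store is passed in. The function name, the "spill" trigger, and the sample events are all hypothetical.

```python
def process(events, on_alert, store=None):
    """React to every event; keep a record only if `store` is provided."""
    for timestamp, description in events:
        if "spill" in description.lower():
            on_alert(timestamp, description)  # real-time detection
        if store is not None:
            store.append((timestamp, description))  # persistent memory


# Made-up events: (timestamp in seconds, description)
events = [
    (1.0, "Forklift moving"),
    (2.5, "Liquid spill near dock"),
    (4.0, "Aisle clear"),
]

# Ephemeral: alerts fire, nothing is retained
alerts = []
process(events, on_alert=lambda t, d: alerts.append((t, d)))

# Persistent: the same events are also stored for later recall
memory = []
process(events, on_alert=lambda t, d: None, store=memory)

print(alerts)
print(len(memory))
```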

## Desktop as Continuous Input

Desktop capture creates continuous episodic input:

```python
cap = conn.create_capture_session(end_user_id="user_123")

# What the agent "experiences":
# - Screen content (visual)
# - Microphone (spoken)
# - System audio (ambient)
```

The agent perceives the user's experience in real time. With indexing, it builds memory.

Later:
```python
# Agent recall. The user asks:
# "Remember when I was debugging that error? What file was I looking at?"

results = cap.search("debugging error")
for shot in results.shots:
    shot.play()  # Show the moment
```

## Multi-Session Memory

Episodic memory spans sessions:

```python
# Search across all recordings
results = coll.search("product roadmap discussions")

# Results from any video in the collection
for shot in results.shots:
    print(f"Video: {shot.video_id}, Time: {shot.start}s")
    print(f"Content: {shot.text}")
```

The agent doesn't just remember one meeting. It remembers all meetings.
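
Once hits come back from many recordings, it often helps to organize them per recording and in time order. The sketch below does that on plain dictionaries that mirror the `video_id`, `start`, and `text` fields used above. The sample hits and the `group_by_video` helper are assumptions for illustration.

```python
from collections import defaultdict

# Made-up search hits from two recordings
hits = [
    {"video_id": "meeting-02", "start": 310.0, "text": "Roadmap for Q4 is still open"},
    {"video_id": "meeting-01", "start": 95.0, "text": "Let's review the product roadmap"},
    {"video_id": "meeting-02", "start": 42.0, "text": "Roadmap priorities changed"},
]


def group_by_video(hits):
    """Group hits by recording and sort each group chronologically."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["video_id"]].append(hit)
    for video_hits in grouped.values():
        video_hits.sort(key=lambda h: h["start"])
    return dict(grouped)


grouped = group_by_video(hits)
for video_id, video_hits in sorted(grouped.items()):
    print(video_id)
    for hit in video_hits:
        print(f'  {hit["start"]:.0f}s: {hit["text"]}')
```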

## Grounded Answers

Episodic memory enables grounded responses:

**Without episodic memory:**
> "I believe the pricing discussion happened last week..."

**With episodic memory:**
> "At 14:32 in yesterday's meeting, Sarah said 'We need to revisit the enterprise tier pricing.' Here's the clip: [play]"

The difference is trust. Episodic memory provides verifiable evidence.
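
Turning a retrieved moment into a grounded sentence is mostly formatting: a timestamp, a speaker, and a verbatim quote. Here is a minimal sketch. The helper names are hypothetical, and the speaker and source labels are assumed to be supplied by whatever produced the hit.

```python
def format_timestamp(seconds):
    """Render a second offset as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def grounded_answer(speaker, quote, start_seconds, source):
    """Build an answer that cites when and where the quote was said."""
    stamp = format_timestamp(start_seconds)
    return f"At {stamp} in {source}, {speaker} said '{quote}'"


answer = grounded_answer(
    speaker="Sarah",
    quote="We need to revisit the enterprise tier pricing.",
    start_seconds=872,
    source="yesterday's meeting",
)
print(answer)
```

Every part of the sentence traces back to a field in the retrieved hit, so a reader can check it against the clip.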

## The Future

The agents we're building will:
* Perceive continuously (screens, mics, cameras)
* Index what they perceive (spoken, visual, events)
* Remember across sessions (episodic recall)
* Answer with evidence (playable proof)

This isn't science fiction. The architecture exists today.

---

## Agents that remember experiences. Not just facts, but moments they can prove.

[Read: Infrastructure that "Sees" and "Edits"](/blogs/infrastructure-that-sees-and-edits)