Agent sandboxes

Sandboxes with unlimited memory and perception.

Every browser-use, computer-use, and code-sandbox agent has the same blind spot: when the container shuts down, the memory is gone. We're plugging that hole with eyes, ears, and persistent recall for every box your agent spins up.

An agent sandbox with eyes, ears, and persistent recall for every container.

The most underrated trend of 2026 is the rise of the agent sandbox: browser-use, computer-use, and code sandboxes. The agent's workspace is no longer your laptop. It's a fresh container, spun up on demand, that lives for the duration of one task and then disappears. This is a fundamentally better model than running agents on your machine. It's also fundamentally amnesiac.

Every sandbox today has the same problem: the agent shuts down at the end of the task and everything it learned shuts down with it. The next run starts from zero. The screen events it saw, the moments it noticed, the patterns it picked up — all gone.

Plug VideoDB into the runtime and that stops being true. The container gets eyes, ears, and a memory layer that outlives it. Run agent_42 today, run it again next month, and the agent remembers what it saw the last time.

What "perception in the sandbox" actually means

A sidecar that turns screen, stdout, and file changes into recall.

Most agent sandboxes today work on a simple primitive: they expose a screen, a keyboard, a mouse, and maybe a shell. The agent does its thing inside the box and reports back. That works for narrow tasks. It falls apart the moment the agent has to understand what just happened on the screen.

VideoDB sits as a sidecar to the sandbox. It taps whatever the runtime exposes (the screen capture stream, stdout, file changes) and runs the same perception layer it runs on a desktop or a live video feed. Instead of having to "remember" by stuffing everything into the context window, the agent gets a recall API.
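To make the idea of a recall API concrete, here is a minimal sketch, assuming the sidecar emits timestamped text observations from each source. `Observation`, `PerceptionLog`, and `recall` are hypothetical names used for illustration, not VideoDB's API, and the keyword match is a stand-in for a real perception index.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    t: float      # seconds since the session started
    source: str   # "screen", "stdout", or "file"
    text: str     # what the sidecar noticed, as plain text


@dataclass
class PerceptionLog:
    """Hypothetical in-memory stand-in for a sidecar's index."""
    observations: list = field(default_factory=list)

    def record(self, t, source, text):
        self.observations.append(Observation(t, source, text))

    def recall(self, query, source=None):
        """Return observations whose text contains every word in the query."""
        words = query.lower().split()
        return [
            o for o in self.observations
            if (source is None or o.source == source)
            and all(w in o.text.lower() for w in words)
        ]


log = PerceptionLog()
log.record(1.2, "stdout", "npm install finished with 3 warnings")
log.record(4.8, "screen", "Login dialog shows error: invalid token")
log.record(5.1, "file", "config.json modified")

# The agent asks a question instead of carrying the whole session in context.
hits = log.recall("error token")
```

The point of the design is that the agent queries for the moment it needs rather than replaying every event back through its prompt.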

"The container is ephemeral. The memory isn't."

The Try-my-repo agent

Hand it a GitHub URL; get back a watchable demo.

The cleanest demo of this whole pattern is something we've been calling Try-my-repo. You hand it a GitHub URL. A fresh sandbox spins up. It clones the repo. It runs the install. It runs the tests. It reads the README. A Pi-class agent narrates the whole session as a voiceover. The output is a demo video, under a minute, with every meaningful moment indexed and clippable.

What you get back is not a transcript or a PR comment. It's a watchable artifact. You see the agent actually trying the thing. You hear it explain what went wrong. You can scrub to the moment a test failed and see the screen at that exact frame.

This is what a useful agent looks like. Not a wall of text. A clip you can play.
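One way to picture "indexed and clippable" is a chapter list laid over the video timeline. The sketch below assumes each step of the session reports a duration and a pass/fail result; `Step`, `build_chapters`, and `first_failure` are hypothetical helpers for illustration, and the durations are made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    duration: float   # seconds the step took on screen
    ok: bool          # whether the step succeeded


def build_chapters(steps):
    """Turn an ordered list of steps into (name, start, end, ok) chapters."""
    chapters, cursor = [], 0.0
    for step in steps:
        chapters.append((step.name, cursor, cursor + step.duration, step.ok))
        cursor += step.duration
    return chapters


def first_failure(chapters):
    """Return the start offset of the first failed chapter, or None."""
    for _name, start, _end, ok in chapters:
        if not ok:
            return start
    return None


# Sample session mirroring the steps described above (values are illustrative).
session = [
    Step("clone", 4.0, True),
    Step("install", 20.0, True),
    Step("test", 15.0, False),
    Step("read README", 6.0, True),
]
chapters = build_chapters(session)
seek_to = first_failure(chapters)   # offset to scrub the video to
```

With an index like this, "scrub to the moment a test failed" becomes a lookup followed by a seek.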

Three classes of sandbox we're powering today

Browser-use, computer-use, and code sandboxes — all of them gain sight.

1. Browser-use agents

Browser-use is the most popular sandbox category in 2026. The agent gets a browser. It navigates, fills forms, clicks buttons. The state of the art is still DOM-driven, but the DOM is a lossy projection of what's actually on the screen. Perception fixes that. The agent sees the page the way a human sees it, including the parts that don't appear in the DOM at all, such as pixels drawn into a canvas element.

2. Computer-use agents

Computer-use is the broader bet: give the agent a full operating system in a container. Anthropic and others have published reference implementations. The blocker for long-horizon tasks has been memory. The agent's context window can't hold a multi-hour session. With VideoDB plugged into the runtime, memory is no longer bounded by the model's context window. The agent can run for hours and recall any moment from the session as a clip.
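As a rough sketch of memory that outlives the container, the example below keeps moments in a SQLite file stored outside the sandbox and turns a match into a clip window. The schema and the `open_memory`, `remember`, and `recall_clip` helpers are assumptions made for illustration, not how VideoDB stores sessions.

```python
import os
import sqlite3
import tempfile


def open_memory(path):
    """Open (or create) a memory store that lives outside the container."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS moments ("
        "agent TEXT, session TEXT, t REAL, note TEXT)"
    )
    return db


def remember(db, agent, session, t, note):
    """Store one moment, t seconds into the given session."""
    db.execute("INSERT INTO moments VALUES (?, ?, ?, ?)", (agent, session, t, note))
    db.commit()


def recall_clip(db, agent, query, pad=5.0):
    """Find matching moments and return (session, clip_start, clip_end) windows."""
    rows = db.execute(
        "SELECT session, t FROM moments "
        "WHERE agent = ? AND note LIKE ? ORDER BY session, t",
        (agent, f"%{query}%"),
    ).fetchall()
    return [(s, max(0.0, t - pad), t + pad) for s, t in rows]


path = os.path.join(tempfile.mkdtemp(), "memory.db")

# First session: two hours in, the agent sees something worth keeping.
db = open_memory(path)
remember(db, "agent_42", "run-1", 7200.0, "checkout page returned HTTP 500")
db.close()

# A later session reopens the same store and recalls the moment as a clip window.
db = open_memory(path)
clips = recall_clip(db, "agent_42", "HTTP 500")
db.close()
```

The model only ever sees the few rows it asked for, so session length stops being a context-window problem.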

3. Code sandboxes

The classic code-execution sandbox (E2B, Modal, Daytona) gets a perception layer on top. The agent runs code, watches output, takes screenshots, builds an indexed history of what it tried. Failed runs become searchable. Successful runs become reproducible artifacts.
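The "failed runs become searchable" idea can be sketched with nothing but the standard library. This example assumes a plain child process as a stand-in for a real sandbox (it provides no isolation), and `RunRecord`, `run_and_record`, and `search_failures` are hypothetical names for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunRecord:
    code: str        # the exact snippet that was run, kept for reproduction
    exit_code: int
    stdout: str
    stderr: str


history = []


def run_and_record(code):
    """Run a Python snippet in a child process and keep the full result."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    record = RunRecord(code, proc.returncode, proc.stdout, proc.stderr)
    history.append(record)
    return record


def search_failures(text):
    """Return failed runs whose stderr mentions the given text."""
    return [r for r in history if r.exit_code != 0 and text in r.stderr]


run_and_record("print(1 + 1)")
run_and_record("import json; json.loads('{bad')")
failed = search_failures("JSONDecodeError")
```

Because each record keeps the exact code alongside its output, a successful run can be replayed later and a failed one can be found by its error text.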

Sandbox partnerships

VideoDB ships native integrations with the leading sandbox providers. One config flag in your runtime gives the box a perception sidecar. No glue code, no infra to manage.

What the integration looks like

One config flag attaches the perception sidecar.

Attach VideoDB to a session and let it watch while the agent works:

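# Attach VideoDB to any sandbox session.
# `sandbox` is your runtime's client and `vdb` is a VideoDB client, both set up elsewhere.
async def record_demo(sandbox, vdb):
    # `async with` is only valid inside a coroutine, hence the wrapper function
    async with sandbox.session() as box:
        vdb.attach(box)             # perception sidecar

        box.run("npm install")
        box.run("npm test")
        box.read("README.md")

        vdb.narrate(voice="pi")     # voiceover the whole session
        return vdb.publish()        # → demo.mp4 with searchable index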

Why this matters more than it sounds like it does

The agent economy will run on sandboxes — and every one needs this.

The agent economy of the next five years is going to run on sandboxes. Not on your laptop, not in a chat window. In disposable containers that spin up by the millions. Every one of them is going to need three things: a runtime, a memory layer, and a way to show humans what happened inside.

VideoDB is all three. The container gets eyes and ears. The memory persists past the lifetime of the box. The human gets a watchable artifact at the end. That's the missing piece in every "agents will do X" story today.
