# Agentic Perception — VideoDB

> Build AI agents with visual perception. One SDK gives your agent eyes, ears, and memory across screen, mic, video, and live sessions. Native runtimes for Mac, Windows, Linux, and the web.

**Status:** Agentic Perception

---

## Hero

AI is moving out of the chatbox. Agents are creating content, running marketing, recording meetings, taking calls, and using the computer. The world they operate in is live, continuous, and perceived through vision and voice — not turns of text. VideoDB gives your agents realtime real-world context and memory: one SDK across screen, mic, files, and live streams, so your agent can see what just happened, recall what it watched, and act on what it heard.

Surfaces: Screen · Mic · Camera · Files · Live streams

CTAs: [Try the SDK](https://docs.videodb.io/pages/getting-started/welcome) · [View OSS agents](https://github.com/video-db)

---

## What builders ship

The next generation of software won't live in a chat window. It will watch your screen, work the web for you, and run inside containers that never sleep. Builders on VideoDB are already shipping all three.

| Surface | What it is | Reference |
|---|---|---|
| Desktop | Agents that watch the screen with you. Pair programmers, meeting copilots, and second brains that share your screen, never your data. | [/agentic-perception/desktop](/agentic-perception/desktop) |
| Web | Agents that work the open web for you. Long-running pipelines that research, create, and publish: faceless YouTube channels, daily marketing, video research briefs. | [/agentic-perception/web](/agentic-perception/web) |
| Sandbox | Agent runtimes with unlimited memory. Every container, every browser-use and computer-use agent gets eyes, ears, and persistent recall. Hand it a repo; get back a demo. | [/agentic-perception/sandboxes](/agentic-perception/sandboxes) |

---

## Capabilities — one SDK, all of media

Files, live streams, screen captures — all enter the same system.

- **One command.** `npx skills add video-db/skills` bootstraps every primitive into your agent runtime.
- **Files, RTSP, screen, mic.** One API across every source.
- **Compose understanding.** Build custom indexes the way you compose endpoints.
- **Search returns a playable clip.** Not metadata. Not timestamps. A clip the agent can play.
- **Stream in, stream out.** Alert, act, and respond in under a second.
- **Claude Code · OpenAI · Cursor · n8n · Zapier.** Drop into any agent that speaks tools.
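
To make "search returns a playable clip" concrete, here is a minimal Python sketch. It is not the VideoDB SDK: `Clip`, `IndexedMoment`, `search`, and the `example.invalid` URL scheme are hypothetical names made up for illustration, and a naive keyword match stands in for real search. The point is only the shape of the result: a time range tied to something a player can open, rather than bare metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical search result: a time range plus a URL a player can open."""
    source_id: str
    start_s: float
    end_s: float

    @property
    def play_url(self) -> str:
        # Placeholder URL scheme, invented for this sketch.
        return (
            f"https://example.invalid/play/{self.source_id}"
            f"?start={self.start_s}&end={self.end_s}"
        )


@dataclass(frozen=True)
class IndexedMoment:
    """Hypothetical index entry describing what happened in a time range."""
    source_id: str
    start_s: float
    end_s: float
    description: str


def search(index: list[IndexedMoment], query: str) -> list[Clip]:
    """Naive keyword match standing in for real semantic search."""
    terms = query.lower().split()
    return [
        Clip(m.source_id, m.start_s, m.end_s)
        for m in index
        if all(term in m.description.lower() for term in terms)
    ]


index = [
    IndexedMoment("screen-1", 12.0, 19.5, "terminal shows failing unit test"),
    IndexedMoment("mic-1", 40.0, 52.0, "speaker agrees to ship on Friday"),
]
clips = search(index, "failing test")
```

An agent consuming `clips` would hand `clips[0].play_url` to a player or embed it in a reply, instead of reconstructing the moment from timestamps.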

---

## Two modes for agents

Realtime by default. Memory when you ask. Stream in, context out. Nothing is stored unless you say so. Flip one flag when a moment is worth keeping.

- **Mode 1 · Ephemeral (Default).** Realtime, no storage. Frames flow in, structured events flow out. Nothing touches disk. Best for live copilots, alerting, and anything sub-second.
- **Mode 2 · Memory (Optional).** Remember and search. Flip one flag and the moment becomes a searchable clip. Memory and search are opt-in: on for the moments you care about, off everywhere else.
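
The two modes can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the VideoDB API: `PerceptionSession`, `Event`, the `remember` flag, and the `saw:` labels are all hypothetical. It shows the behavior described above: every frame produces an event, and nothing is retained until the flag is flipped.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical structured event emitted for a frame."""
    t: float
    label: str


@dataclass
class PerceptionSession:
    """Hypothetical session; `remember` is the single opt-in flag."""
    remember: bool = False
    memory: list[Event] = field(default_factory=list)

    def handle(self, t: float, content: str) -> Event:
        # Stand-in for perception: turn a frame into a structured event.
        event = Event(t, f"saw:{content}")
        if self.remember:
            # Memory mode: keep the event so it can be searched later.
            self.memory.append(event)
        return event


session = PerceptionSession()
first = session.handle(0.0, "login page")     # ephemeral: event out, nothing kept
session.remember = True                       # flip the flag for a moment worth keeping
second = session.handle(0.5, "error dialog")  # event out, and retained in memory
```

After this runs, `session.memory` holds only the second event. The first frame still produced an event for the agent to act on, but left no stored trace.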

---

## Build a perception box

A dedicated perception runtime for teams that need realtime throughput, predictable cost and load, and zero outbound calls to a model API. Sized to your fleet. Every frame, every inference, every retrieval runs inside the box. Use the bundled models, or bring your own open-weight model.

Highlights: Realtime processing · Zero outbound · One flat number · Bring your own model

- **Realtime, sub-second pipeline.** Ingest, perception, and event-out sized to your throughput from day one.
- **Built-in network monitor.** Verify isolation in one glance. A live view of every connection the runtime makes.
- **Bundled perception models.** Vision, speech, and embedding models pre-loaded and ready to use.
- **One capacity envelope.** Flat monthly cost. No per-token surprises, no traffic-driven spikes.
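
As a rough illustration of what "verify isolation" could mean in practice, the sketch below flags any observed remote address outside loopback and private ranges, using only Python's standard `ipaddress` module. The `outbound_connections` helper and the sample addresses are hypothetical, and treating private-range peers as "inside the box" is an assumption of this sketch, not a description of the built-in monitor.

```python
import ipaddress


def outbound_connections(remote_addrs: list[str]) -> list[str]:
    """Return remote addresses that are neither loopback nor private-range."""
    flagged = []
    for addr in remote_addrs:
        ip = ipaddress.ip_address(addr)
        if not (ip.is_loopback or ip.is_private):
            flagged.append(addr)
    return flagged


# Sample connection list, made up for this example.
observed = ["127.0.0.1", "10.0.4.17", "192.168.1.20"]
leaks = outbound_connections(observed)
```

An empty `leaks` list means every observed connection stayed on loopback or a private network. Any public address would show up in the list for review.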

CTA: [Talk to us](/company#contact)

---

## No-code workflows

Every VideoDB primitive is exposed as a node on n8n and Zapier — same primitives, drag-and-drop. Index a feed, search for a moment, clip and deliver, all without writing code.

- **n8n.** Drag-and-drop video memory: capture a stream, index it, retrieve clips, post to Slack or your CMS, all in a visual flow with the VideoDB nodes. [View n8n nodes](https://github.com/video-db/n8n-workflows)
- **Zapier.** Triggers and actions: trigger on a new clip in memory, cut a highlight, push to Drive, message the team. Wire VideoDB into the 6,000+ apps Zapier already supports. [View Zapier app](https://zapier.com/apps/videodb/integrations)

---

## Try it yourself

Every agent on this page ships as an open-source repo or a runnable notebook.

- **call.md** — Meetings captured as markdown with playable clips for every decision. [GitHub repo](https://github.com/video-db/call.md)
- **Pair programmer** — An agent that watches your screen and YouTube tabs and brainstorms with full context. [GitHub repo](https://github.com/video-db/pair-programmer)
- **Research agents** — A report you can watch. An agent crawls the web and assembles a video brief. [Live notebook](https://github.com/video-db/agentic-streams)
- **Try my repo** — Hand it a repo. A Pi agent runs it, narrates it, and ships back a demo video. [Website](https://www.trymyrepo.com/)
- **Build desktop agents** — Install the native SDK on Mac, Windows, or Linux. Start streaming screen + mic in minutes. [Quickstart notebook](/developers#quickstart)

---

## Closing

Give your agents eyes and ears.

```bash
npx skills add video-db/skills
```

CTAs: [Start building](https://console.videodb.io/auth) · [View on GitHub](https://github.com/video-db)

---

© 2026 VideoDB, Inc. · videodb.io · hello@videodb.io
