# VideoDB × LlamaIndex: Plug Video Into Your RAG Pipeline

> The official VideoDB connector for LlamaIndex lets you treat video as a first-class source in any retrieval-augmented pipeline.

Category: Partnerships

---

Most RAG systems are fluent in text. They parse docs, chunk PDFs, embed web pages, and retrieve the paragraph that grounds an answer. But video is where a lot of the real world lives: product demos, lectures, security footage, training calls, sports clips, meetings, tutorials, news, livestreams. If your RAG pipeline cannot retrieve video moments, it is blind to one of the highest-bandwidth knowledge sources you have.

The VideoDB retriever for LlamaIndex lets a LlamaIndex app retrieve from VideoDB collections and videos, return timestamped nodes, synthesize text answers, and then turn the same retrieved moments into playable clips.

> "Video RAG should not end at a generated sentence. It should take you to the exact moment the answer came from."

## Why video breaks normal RAG

A video is not a long document. It is a timeline of speech, visuals, motion, and context.

- **Speech matters:** a training video may explain the answer verbally while the slide on screen stays static.
- **Visuals matter:** a product demo, sports play, or surveillance moment may be obvious on screen but never mentioned out loud.
- **Time matters:** a retrieved answer is only useful if you can jump to the moment and verify it.

LlamaIndex orchestrates retrieval and response synthesis. VideoDB stores, indexes, searches, and streams the video evidence.

## The architecture

VideoDB makes video look like a retrieval source without flattening it into text-only context. Spoken words and visual scenes become searchable nodes, each with metadata such as `video_id`, `start`, and `end`. LlamaIndex uses those nodes as retrieved context, while VideoDB uses the timestamps to generate clips.

## Start with the retriever

```bash
pip install videodb llama-index llama-index-retrievers-videodb
```

```python
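import os
import videodb

# Placeholder only: in real projects, set VIDEO_DB_API_KEY in your shell or a secrets manager.
os.environ["VIDEO_DB_API_KEY"] = "YOUR_VIDEO_DB_API_KEY"

# connect() reads VIDEO_DB_API_KEY from the environment.
conn = videodb.connect()
coll = conn.get_collection()  # default collection

# Upload a video by URL, then build the spoken-word (transcript) index.
video = coll.upload(url="https://www.youtube.com/watch?v=LPZh9BOjkQs")
video.index_spoken_words()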
```

Now the video can participate in a LlamaIndex retrieval flow through `VideoDBRetriever`.

```python
from llama_index.retrievers.videodb import VideoDBRetriever
from videodb import SearchType, IndexType

# Search one video's spoken-word index; score_threshold drops low-relevance matches.
spoken_retriever = VideoDBRetriever(
    collection=coll.id,
    video=video.id,
    search_type=SearchType.semantic,
    index_type=IndexType.spoken_word,
    score_threshold=0.1,
)

nodes = spoken_retriever.retrieve("Where does the speaker explain transformers?")
```

## Turn retrieved nodes into an answer

```python
from llama_index.core import get_response_synthesizer

query = "Where does the speaker explain transformers?"

response_synthesizer = get_response_synthesizer()
response = response_synthesizer.synthesize(
    query,
    nodes=nodes,
)

print(response)
```
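
Since the point of video RAG is to lead back to the exact moment, it can help to print a timestamped source line next to the answer. The following is a minimal sketch, not part of either library: `format_timestamp` and `cite` are hypothetical helpers, and the metadata dictionary is a hand-written stand-in that assumes the `video_id`, `start`, and `end` keys described above, with times in seconds.

```python
def format_timestamp(seconds):
    """Render a time in seconds as M:SS, or H:MM:SS past the first hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def cite(metadata):
    """Build a one-line source reference from a node's metadata (hypothetical helper)."""
    start = format_timestamp(metadata["start"])
    end = format_timestamp(metadata["end"])
    return f'{metadata["video_id"]} [{start} - {end}]'


# Stand-in metadata; the ID and times are made up for illustration.
sample = {"video_id": "m-123", "start": 75.4, "end": 102.0}
print(cite(sample))
```

With real results, the same helper could be called as `cite(result.node.metadata)` for each item in `nodes`.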

## Then generate the clip

Every retrieved node includes a start and end time. For a single video, pass those intervals into `video.generate_stream()` and VideoDB creates a playable stream from the relevant moments.

```python
from videodb import play_stream

# Each result is a NodeWithScore; the timestamps live in the wrapped node's metadata.
intervals = [
    (result.node.metadata["start"], result.node.metadata["end"])
    for result in nodes
]

stream_url = video.generate_stream(timeline=intervals)
play_stream(stream_url)
```
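
Retrieved nodes are typically ordered by relevance rather than by time, so the intervals may arrive out of order or overlap. If you want the clip to play chronologically without repeated footage, one option is to sort and merge the intervals first. The sketch below is plain Python; `merge_intervals` and its `gap` parameter are hypothetical names, and the sample timestamps are made up.

```python
def merge_intervals(intervals, gap=0.0):
    """Sort (start, end) pairs and merge any that overlap or sit within `gap` seconds."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + gap:
            # Overlaps (or nearly touches) the previous clip: extend that clip.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical timestamps, in seconds, standing in for retrieved node metadata.
example = [(120.0, 135.0), (10.0, 20.0), (18.0, 30.0), (135.5, 150.0)]
print(merge_intervals(example, gap=1.0))
```

The merged list has the same shape as `intervals` above, so it could be passed to `video.generate_stream(timeline=...)` in its place.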

## Bring in visual understanding

Speech alone is not enough for video intelligence. VideoDB scene indexing turns visual moments into searchable scene descriptions.

```python
scene_index_id = video.index_scenes(
    prompt=(
        "Describe each scene with objects, actions, text on screen, "
        "and any visual context needed for retrieval."
    )
)

# Optional: fetch the generated scene descriptions to inspect what was indexed.
scenes = video.get_scene_index(scene_index_id)
```

Retrieve from that scene index with the same retriever interface:

```python
scene_retriever = VideoDBRetriever(
    collection=coll.id,
    video=video.id,
    search_type=SearchType.semantic,
    index_type=IndexType.scene,
    scene_index_id=scene_index_id,
    score_threshold=0.1,
)

scene_nodes = scene_retriever.retrieve("Show the part with a matrix or formula on screen")
```

## Multimodal RAG in practice

A practical multimodal flow retrieves from both indexes, combines the nodes, and lets LlamaIndex synthesize over the union.

```python
query = "Explain the section where the speaker discusses attention and shows a matrix."

spoken_nodes = spoken_retriever.retrieve(query)
scene_nodes = scene_retriever.retrieve(query)

response = response_synthesizer.synthesize(
    query,
    nodes=spoken_nodes + scene_nodes,
)

print(response)
```

For custom pipelines, fetch transcript and scene records from VideoDB, convert them into LlamaIndex `TextNode` objects, and build a standard `VectorStoreIndex`.
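
As a rough sketch of the conversion step in such a custom pipeline, the helper below groups transcript records into timestamped chunks. It assumes each record is a dictionary with `start`, `end`, and `text` keys; that shape, the `chunk_transcript` name, and the 30-second window are illustrative assumptions, so check the actual records your VideoDB calls return.

```python
def chunk_transcript(records, video_id, max_seconds=30.0):
    """Group consecutive records into chunks spanning roughly max_seconds at most.

    A single record longer than max_seconds is kept whole as its own chunk.
    """
    groups, current = [], []
    for record in records:
        # Close the current chunk if adding this record would exceed the window.
        if current and record["end"] - current[0]["start"] > max_seconds:
            groups.append(current)
            current = []
        current.append(record)
    if current:
        groups.append(current)
    return [
        {
            "text": " ".join(r["text"] for r in group),
            "metadata": {
                "video_id": video_id,
                "start": group[0]["start"],
                "end": group[-1]["end"],
            },
        }
        for group in groups
    ]


# Hand-written records standing in for a real transcript.
records = [
    {"start": 0.0, "end": 10.0, "text": "Attention"},
    {"start": 10.0, "end": 25.0, "text": "is all"},
    {"start": 25.0, "end": 40.0, "text": "you need"},
]
chunks = chunk_transcript(records, video_id="m-123")
print(chunks)
```

Each chunk maps directly onto `TextNode(text=chunk["text"], metadata=chunk["metadata"])`, and the resulting list of nodes can be handed to `VectorStoreIndex`. Keeping `start` and `end` in the metadata preserves the ability to generate clips from whatever the index retrieves.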

## Collection-level retrieval

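VideoDB is not limited to one file. The retriever can target a whole collection, and when retrieval spans multiple videos, each node still carries `video_id`, `start`, and `end` metadata. That is enough for a VideoDB `Timeline` to compile clips from several source videos into a single stream. The example below reuses `spoken_nodes` and `scene_nodes` from the previous section; the same loop applies to nodes from a collection-level search, because each clip is addressed by its own `video_id`.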

```python
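from videodb import play_stream
from videodb.timeline import Timeline
from videodb.asset import VideoAsset

# A Timeline is bound to the connection, so it can mix clips from different videos.
timeline = Timeline(conn)

for node_with_score in spoken_nodes + scene_nodes:
    node = node_with_score.node
    # asset_id is the source video's ID; start and end trim the clip to the retrieved moment.
    timeline.add_inline(
        VideoAsset(
            asset_id=node.metadata["video_id"],
            start=node.metadata["start"],
            end=node.metadata["end"],
        )
    )

stream_url = timeline.generate_stream()
play_stream(stream_url)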
```
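
The loop above adds clips in retrieval order. If you would rather keep each source video's moments together and in chronological order, one option is to group the metadata by `video_id` before building the timeline. This is a plain-Python sketch; `group_by_video` is a hypothetical helper and the sample IDs and times are made up.

```python
from collections import defaultdict


def group_by_video(metadatas):
    """Map each video_id to its (start, end) spans, sorted by start time."""
    grouped = defaultdict(list)
    for metadata in metadatas:
        grouped[metadata["video_id"]].append((metadata["start"], metadata["end"]))
    return {video_id: sorted(spans) for video_id, spans in grouped.items()}


# Stand-in metadata for nodes retrieved across two videos.
sample_metadata = [
    {"video_id": "m-a", "start": 50.0, "end": 60.0},
    {"video_id": "m-b", "start": 5.0, "end": 9.0},
    {"video_id": "m-a", "start": 10.0, "end": 20.0},
]
print(group_by_video(sample_metadata))
```

With real results, the input could be built as `[r.node.metadata for r in spoken_nodes + scene_nodes]`, and each grouped span then becomes one `VideoAsset` on the timeline.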

## What builders can ship

- Customer support answers backed by video proof.
- Training libraries that answer questions.
- Meeting and lecture memory.
- Visual search for agent workflows.

[Open the Simple Video RAG notebook in Colab.](https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/integrations/llama-index/simple_video_rag.ipynb)

---

## Ground your agents in video. Answers, timestamps, and clips in one RAG loop.

Add VideoDB retrieval to LlamaIndex and make video a first-class knowledge source.

[Open the notebook](https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/integrations/llama-index/simple_video_rag.ipynb) · [Read LlamaIndex docs](https://developers.llamaindex.ai/python/framework/integrations/retrievers/videodb_retriever/)
