LLMs gave us reasoning. RAG gave us retrieval. Tool calling gave us action. What’s missing in the modern agent stack is perception: the ability to see, hear, and remember the world as it happens.
This workshop is a practical walkthrough of building a perception layer for agents using VideoDB. You’ll learn how to convert continuous media (screen, mic, camera, RTSP, files) into a structured context your agent can use:
- Indexes (searchable understanding)
- Events (real-time triggers)
- Memory (episodic recall with playable evidence)
We’ll implement the core loop:
Continuous Media → Perception Layer (VideoDB) → Agent (reasoning + action) → Output grounded in evidence
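As an illustrative sketch (hypothetical names, not VideoDB's actual API), the loop above reduces to: perception turns raw media into timestamped records, the agent queries them, and every answer points back to a playable time range.

```python
from dataclasses import dataclass

@dataclass
class Moment:
    start: float  # seconds into the stream
    end: float
    text: str     # what was perceived (transcript, caption, label)

class PerceptionLayer:
    """Hypothetical stand-in for the perception layer: stores timestamped moments."""
    def __init__(self):
        self.moments: list[Moment] = []

    def ingest(self, start: float, end: float, text: str) -> None:
        self.moments.append(Moment(start, end, text))

    def search(self, query: str) -> list[Moment]:
        # Toy keyword match; a real index would use semantic search.
        q = query.lower()
        return [m for m in self.moments if q in m.text.lower()]

def agent_answer(layer: PerceptionLayer, question: str) -> str:
    hits = layer.search(question)
    if not hits:
        return "No evidence found."
    m = hits[0]
    # Output grounded in evidence: a claim plus a playable time range.
    return f"Found: '{m.text}' (evidence: {m.start:.0f}s-{m.end:.0f}s)"

layer = PerceptionLayer()
layer.ingest(12, 18, "presenter shares the quarterly dashboard")
layer.ingest(95, 102, "error dialog appears on screen")
print(agent_answer(layer, "error"))
```

The point of the sketch is the shape of the contract, not the implementation: the agent never sees raw pixels, only searchable moments it can cite.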
Who should attend:
- Engineers building agents that need continuous and temporal awareness (not one-shot screenshots).
- Research teams building in physical AI, desktop robots, and wearables.
- Product teams building meeting bots, desktop copilots, monitoring/ops, and QA/compliance tools.
- Founders building multimodal apps where “show me the moment” matters.

What you’ll discover:
- What “perception” actually means for agents: continuous, temporal, multi-source, searchable, actionable.
- How to support three input modes with one mental model: files, live streams, desktop capture.
- How to build searchable memory so your agent can retrieve results with playable evidence, not vibes.
- How to move from batch video AI to real-time event streams your agent can react to immediately.
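The last point, batch to real-time, can be pictured with a minimal sketch (hypothetical callback names, not VideoDB's actual event API): instead of running a job over a finished file, you register a predicate, and a handler fires the moment an incoming description matches.

```python
from typing import Callable

class EventStream:
    """Toy real-time event layer: handlers fire as media arrives,
    rather than after a batch job over the whole file."""
    def __init__(self):
        self.rules: list[tuple[Callable[[str], bool], Callable[[str], None]]] = []

    def on(self, predicate: Callable[[str], bool],
           handler: Callable[[str], None]) -> None:
        self.rules.append((predicate, handler))

    def push(self, description: str) -> None:
        # Called once per perceived moment, as it happens.
        for predicate, handler in self.rules:
            if predicate(description):
                handler(description)

alerts: list[str] = []
stream = EventStream()
stream.on(lambda d: "error" in d, alerts.append)

stream.push("user opens settings")
stream.push("error dialog: payment failed")
print(alerts)  # ['error dialog: payment failed']
```

The design choice worth noticing: the agent subscribes to events rather than polling, so reaction latency is bounded by ingestion, not by batch-job scheduling.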
Plus:
- A starter template you can reuse: “Index + Events + Memory” as the default perception stack
- Networking with builders working on agents + multimodal infra
Learn more and register here.