Imagine software that can watch hundreds of live video streams at once, understand what’s happening, and summarize it all in seconds. That’s now a reality with NVIDIA’s new AI Blueprint for Video Search and Summarization (VSS), a powerful addition to its growing arsenal of developer tools. Capable of digesting an hour of footage into a one-minute summary, this tool is poised to transform industries from manufacturing to employee training, where every second and every detail counts.
AI agents can sit alongside video meetings or in factories
AI Blueprint for VSS gives developers a framework for analyzing both archived and live videos. With the tool, they can build AI agents to search, summarize, transcribe audio, or extract specific information from related videos. An early customer, electronics manufacturing firm Pegatron, said AI agents built with AI Blueprint for VSS reduced labor costs by 7% and defect rates by 67%.
For manufacturing, VSS enables object detection and tracking. It could also monitor traffic in smart cities. The speech-to-text capabilities could be deployed during team meetings, keynote speeches, or new employee training. For example, the AI can analyze one employee performing a task and then break that task down into steps to show a new trainee.
How AI Blueprint for VSS works
Behind AI Blueprint for VSS is NVIDIA Metropolis, a developer platform for automating physical processes. The platform uses NVIDIA’s AI models, such as the vision language model VILA and the Llama Nemotron language models, to quickly summarize vast quantities of video data. It is connected to the customer’s enterprise data through NeMo Retriever microservices, first made available in 2023.
The enterprise data digested by AI Blueprint for VSS is processed through retrieval-augmented generation (RAG), which retrieves relevant records and grounds the AI output in them to reduce hallucinations.
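To make the retrieval-augmented generation idea concrete, here is a minimal sketch in Python. It is not NVIDIA's implementation and does not call NeMo Retriever or any NVIDIA API: the function names, the sample captions, and the bag-of-words similarity are all assumptions chosen for illustration. A production system would typically use learned embeddings and a vector database instead. The sketch shows only the core pattern: find the stored records most relevant to a question, then place them in the prompt so the model answers from that data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents that share vocabulary with the query,
    ordered from most to least similar."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, context):
    """Assemble a prompt that tells the model to answer only from the context."""
    lines = ["Answer using only the context below.", "", "Context:"]
    lines += ["- " + c for c in context]
    lines += ["", "Question: " + query]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical video captions standing in for indexed enterprise data.
    captions = [
        "Camera 3 at 10:42: a worker skipped the torque check on the assembly line.",
        "Camera 1 at 09:15: forklift entered the loading dock.",
        "Camera 3 at 11:05: the conveyor belt stopped for two minutes.",
    ]
    question = "Which worker skipped the torque check?"
    print(build_prompt(question, retrieve(question, captions, top_k=1)))
```

The resulting prompt would then be sent to a language model. Because the answer is constrained to the retrieved captions, the model has less room to invent details that are not in the footage.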
The current release offers expanded hardware support for:
- NVIDIA A100
- NVIDIA H100
- NVIDIA RTX PRO 6000
- NVIDIA DGX Spark
It can be deployed via the NVIDIA API Catalog, Launchables, Docker, Helm charts, or the cloud.