China’s leading short-video company, Kuaishou Technology, on Tuesday introduced Kling O1, a unified multimodal model that combines video generation, editing, style transformation, inpainting, and post-production into a single system.
Kling O1 accepts text, images, videos, and specific subjects as inputs, allowing a creator to use a simple combination of words and visual elements to guide the AI.
This is powered by a new Multimodal Visual Language (MVL) framework that fuses a broad range of capabilities, including turning text into video, generating start and end frames, inserting or removing content, changing styles, and extending shots, into one workflow.
Instead of editing by hand, creators can enter prompts such as “remove passersby” or “swap the protagonist’s attire” to execute pixel-level semantic reconstruction, with no need for masking or keyframing. The company views this as transforming complex post-production editing into a simple, conversational experience.
Solving the consistency problem
A critical pain point in AI video to date has been consistency: keeping a character’s features, an outfit, or a scene stable across different shots. Kuaishou claims Kling O1 solves this with what it describes as “director-like memory.”
The model retains the identity of main characters and props, keeping their features stable even with dynamic camera movements or in complex, multi-subject scenes. Creators can upload reference photos or clips, and the model will lock onto those elements, letting creators mix and match multiple subjects or blend them seamlessly with other images. The feature is designed to deliver “industrial-grade consistency across all shots.”
Challenging the global giants
The release of Kling O1 is a significant move amid intensifying competition in AI video, positioning Kuaishou, which competes domestically with ByteDance’s Douyin, against international heavyweights such as OpenAI’s Sora, Google’s Veo, and Runway.
The model’s precise editing capabilities have earned it a noteworthy comparison. Alvaro Cintas-Canto, an assistant professor of AI and cybersecurity at Marymount University, lauded the tool’s versatility on his X account, hailing Kling O1 as the “Nano Banana for AI video.”
Kuaishou’s co-founder and chief executive, Cheng Yixiao, has previously stated that the company’s AI strategy focuses on using the technology to support TV and film content creation. This focus has translated into strong commercial results: Kuaishou’s Kling AI business, which provides premium video tools, reported sales of 300 million yuan (US$42 million) in the third quarter of 2025, according to the South China Morning Post.
Sam Altman’s recent “code red” over keeping ChatGPT ahead of Google and Anthropic underscores how fierce the AI race has become.


