Are you ready to meet the video assistants of the future?

Until now, AI “video calls” have meant clunky, cascaded systems: the audio was first transcribed to text, a response was generated from that text, and only then was a video animation rendered. Each stage had to wait for the one before it, so the conversation lagged. Wan-Streamer aims to make this delayed architecture history.
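
To see why a cascaded design feels slow, note that each stage starts only after the previous one finishes, so the delays add up. The sketch below is purely illustrative: the stage names follow the steps described above, and the millisecond values are made-up placeholders, not measurements of any real system.

```python
# Illustrative only: the latencies below are hypothetical placeholders,
# not measurements of Wan-Streamer or of any other system.

def cascaded_latency_ms(stage_latencies_ms):
    """Total delay of a pipeline in which each stage waits for the previous one."""
    return sum(stage_latencies_ms.values())

# Hypothetical per-stage delays for the cascaded steps named in the text.
stages = {
    "transcription": 300,
    "response_generation": 500,
    "video_rendering": 400,
}

total = cascaded_latency_ms(stages)
print(f"Cascaded pipeline delay: {total} ms")
```

A single streaming model avoids this sum because it does not hand results from one separate engine to the next.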

Wan-Streamer is presented as the first native-streaming, end-to-end AI model for video calls. By processing language, audio, and video simultaneously within a single model, it offers a truly full-duplex experience: it can listen and respond at the same time.

Real-Time AI Assistant: How Does It Work?
As a real-time AI assistant, Wan-Streamer listens the way a person would and reacts with facial expressions as you speak. When you cut in, it notices and yields the turn instead of talking over you.
[Figure: Wan-Streamer framework. Audio, video, and text streams are processed by a single Transformer. (source)]

Key Features

  • Lightning-Fast Response: It runs at 25 frames per second (FPS) and responds in under 1 second, including network latency.
  • Flawless Synchronization: Lip movements, facial expressions, and voice tone are generated simultaneously.
  • A Single Infrastructure: Separate speech engines (ASR for recognition, TTS for synthesis) and animation engines are eliminated. Audio, text, and video are processed together by a single Transformer model.
  • Active Listening: Your assistant doesn’t freeze up; it maintains eye contact while listening, reacts with micro-expressions, and stops speaking when you interrupt.
  • Limitless Diversity: This single system can generate digital humans with entirely different faces, voices, and environments.
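
To put the numbers from the first bullet in perspective, here is a small back-of-the-envelope calculation. It uses only the two figures stated above (25 FPS and a response time under 1 second); the function names are just for illustration.

```python
FPS = 25                 # frame rate stated in the post
RESPONSE_BUDGET_S = 1.0  # upper bound on response time stated in the post

def frame_interval_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps

def frames_in(seconds, fps):
    """Number of whole frames that fit in the given time span."""
    return int(seconds * fps)

print(frame_interval_ms(FPS))             # 40.0
print(frames_in(RESPONSE_BUDGET_S, FPS))  # 25
```

In other words, at this frame rate a new frame is due every 40 ms, and a one-second response window is only 25 frames long.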

You can watch a recording of the model in a real-time conversation in the video below:

Real-time networked conversation recording, source

How Can I Use It?

Currently, Wan-Streamer (v0.1) is a research model and proof of concept published by the Alibaba Wan team. It is not yet available as an open-source application or as a paid subscription service that end users can download directly. However, the published research paper and the demos suggest that this technology could find its way into everyday applications.

In short, the era of the real-time digital human is beginning, with potential uses in fields from customer service to education. How would you use this technology in your own business? Let’s discuss it in the comments! 👇

AI-Generated Content Notice
This blog post is entirely generated by artificial intelligence. While AI enables content creation, it may still contain errors or biases. Please verify any critical information before relying on it.