New York, NY, March 16, 2026 -- D-ID, a leader in enterprise-grade AI avatar solutions, today announced the launch of V4 Expressive Visual Agents, a new generation of ultra-high-fidelity digital humans designed for real-time, LLM-connected conversations and long-form scripted enterprise video content.
Built on a new diffusion-based model and trained on performances captured from real actors, V4 Expressive Visual Agents deliver faster generation, conversation turns with under 0.5 seconds of latency, and highly accurate lip sync at resolutions up to 4K. The result is expressive, natural interactions that scale reliably across enterprise use cases.
Currently available to 1,500 enterprise customers and millions of subscribers, V4 Avatar is designed specifically for low-latency delivery. It suits both real-time conversational experiences and long-form content such as training modules, explainers, and multilingual educational videos. To date, more than 800,000 visual agents and 300 million non-interactive avatars have been created with the previous D-ID model. At launch, V4 Expressive Visual Agents are available on all D-ID plans, starting at just $5.90 per month, reflecting the breakthrough cost efficiency of the V4 AI models.
Research shows that human-like facial cues improve knowledge transfer, retention, and understanding. As a result, companies are increasingly adopting high-fidelity avatars for onboarding, training, customer engagement, and internal communications, especially when clarity, authenticity, and consistency are important.
V4 Expressive Visual Agents are the first high-quality, expressive avatars that dynamically adapt to a selected emotion, ensuring that tone and intent match the underlying message. Spoken content lands clearly and confidently, with natural pacing and emphasis. The agents are designed to act as a visual interface layer for AI systems, enabling real-time, two-way interaction rather than one-way video playback. When the LLM responds, the avatar automatically adjusts its facial expressions and delivery based on context and emotion, so empathy reads as empathetic, urgency reads as urgent, and confidence reads as confident. This makes both customer-facing and employee-facing agents more natural, reliable, and effective.
V4 Expressive Visual Agents also add an optional camera layer that enables real-time emotion recognition, feeding non-verbal cues into both the LLM's responses and the avatar's expressive delivery, such as tone and facial expressions. In addition, V4 Expressive Visual Agents can display interactive UI elements inline during conversations, including contextual visuals such as images, charts, and videos, as well as structured interactions such as forms and quizzes, enabled through D-ID’s MCP app.
Unlike short-form video generation tools optimized for clips that last only a few seconds, V4 Avatar is designed for continuous, consistent output. Businesses can generate minutes or hours of video using stable avatar IDs and run real-time conversations at scale, all at a fraction of the cost (70x cheaper than Google Veo 3 Fast), making courses, explainers, multilingual training, and repeatable content series far more cost-effective. The savings are even greater for real-time interactions, where using D-ID costs pennies per chat.
“We’ve come a long way since our first model that delighted the world by turning still images into talking portraits,” said Gil Perry, co-founder and CEO of D-ID. “Today, with V4, we are setting a new benchmark for avatar fidelity and performance while remaining fast enough for real-time conversations and consistent, efficient, and secure enough for enterprise scale. This advancement in avatar technology positions D-ID at the forefront of providing the visual interface layer for the next wave of AI deployments as enterprises seek more natural, human-like interactions.”
After acquiring simpleshow in September 2025, D-ID expanded its enterprise distribution footprint and integrated its AI avatar capabilities into simpleshow’s corporate training and instructional video ecosystem. Since then, D-ID’s annual recurring revenue (ARR) has grown by 250%, driven by stronger cross-selling and rising enterprise demand for interactive, AI-driven video.
About D-ID
D-ID is a world leader in generative AI for video and digital humans, enabling frictionless real-time interactions through its streaming APIs. Its technology powers lifelike digital presenters, learning companions, and virtual assistants for Fortune 500 companies and mission-driven organizations. In September 2025, D-ID acquired simpleshow, a global pioneer in AI-based explainer video creation. Based in Berlin, simpleshow helps organizations in more than 70 countries simplify complex messages through smart, scalable, human-centric video communications.
