One Shot Audio to Animated Video Generation

Feb 1, 2021 · N Kumar, S Goel, A Narang, B Lall, M Hasan, P Agarwal, Dipankar Sarkar
Publication: arXiv preprint arXiv:2102.09737

We present a novel approach for generating animated video from a single image, using audio as the driving signal. Given one source image and an audio input, the method produces a realistic talking-head animation. The work connects audio processing with computer animation, with applications in virtual avatars, content creation, and human-computer interaction.

The system uses deep learning to analyze speech patterns and facial movements and to translate audio features into natural-looking animation. It is one-shot: it requires only a single image of the target subject, which makes it practical in real-world settings where multiple images or video of the subject are not available.
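
To make the audio-to-frame mapping concrete, here is a minimal sketch of one preprocessing step that audio-driven animation pipelines commonly need: slicing the waveform into one audio window per output video frame, so that each generated frame can be conditioned on the speech around it. This is an illustration under stated assumptions, not the method from the paper. The function name `audio_windows_per_frame`, the 16 kHz sample rate, the 25 fps frame rate, and the two frames of context on each side are all hypothetical choices.

```python
import numpy as np


def audio_windows_per_frame(waveform, sample_rate=16000, fps=25, context_frames=2):
    """Split a mono waveform into one audio window per video frame.

    Each window holds the audio for its target frame plus `context_frames`
    frames of audio on either side, zero-padded at the clip boundaries.
    All default values are illustrative assumptions.
    """
    samples_per_frame = sample_rate // fps
    window_len = (2 * context_frames + 1) * samples_per_frame
    n_frames = len(waveform) // samples_per_frame
    if n_frames == 0:
        # Clip is shorter than one video frame: nothing to animate.
        return np.empty((0, window_len), dtype=np.asarray(waveform).dtype)

    # Drop the trailing partial frame, then pad so edge frames get full windows.
    pad = context_frames * samples_per_frame
    trimmed = np.asarray(waveform)[: n_frames * samples_per_frame]
    padded = np.pad(trimmed, (pad, pad))

    # Window i starts `pad` samples before frame i, i.e. at i * samples_per_frame
    # in the padded signal, so the target frame sits in the middle of the window.
    return np.stack([
        padded[i * samples_per_frame : i * samples_per_frame + window_len]
        for i in range(n_frames)
    ])


# One second of placeholder audio yields 25 windows of 3200 samples each.
windows = audio_windows_per_frame(np.zeros(16000, dtype=np.float32))
print(windows.shape)
```

In a one-shot setting, each of these windows would be paired with the same single source image, and a learned model would produce the corresponding output frame. The model itself is outside the scope of this sketch.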

Key contributions:

  • Single-image animation synthesis driven by audio input
  • End-to-end deep learning framework for audio-visual mapping
  • Real-time capable animation generation
  • Preservation of identity and facial features from source image