One-Shot Audio-to-Animated-Video Generation

We present a novel approach for generating animated videos from a single image, using audio as the driving signal. Given one source image and an audio clip, our method produces realistic talking-head animations. This work bridges the gap between audio processing and computer animation, with applications in virtual avatars, content creation, and human-computer interaction.
The system employs deep learning techniques to analyze speech patterns and facial movements, translating audio features into natural-looking animations. Our approach requires only one shot (a single image) of the target subject, making it highly practical for real-world applications where multiple images or video footage are unavailable.
Key contributions:
- Single-image animation synthesis driven by audio input
- End-to-end deep learning framework for audio-visual mapping
- Real-time capable animation generation
- Preservation of identity and facial features from source image
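The audio-visual mapping described above can be sketched at a high level: encode the audio into per-frame features, encode the source image into an identity embedding, and decode both into per-frame facial motion. The sketch below is a toy NumPy illustration of this data flow only; all function names, feature choices (log-spectra in place of learned audio features, a random projection in place of a learned face encoder), and dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Toy sketch of the one-shot audio-to-animation pipeline.
# All components are illustrative stand-ins, not the paper's model.
rng = np.random.default_rng(0)

def encode_audio(waveform: np.ndarray, frame_len: int = 400) -> np.ndarray:
    """Split the waveform into frames and compute a toy log-magnitude
    spectrum per frame (stand-in for learned/MFCC audio features)."""
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(spectra)  # shape: (n_frames, frame_len // 2 + 1)

def encode_identity(image: np.ndarray, dim: int = 64) -> np.ndarray:
    """Toy identity embedding: a fixed random projection of the flattened
    source image (stand-in for a learned face/identity encoder)."""
    proj = rng.standard_normal((image.size, dim)) / np.sqrt(image.size)
    return image.ravel() @ proj  # shape: (dim,)

def decode_motion(audio_feats: np.ndarray, identity: np.ndarray,
                  n_landmarks: int = 68) -> np.ndarray:
    """Map per-frame audio features plus the identity embedding to 2D
    facial-landmark offsets, one set per audio frame (stand-in for the
    animation decoder that would drive the rendered video frames)."""
    w_a = rng.standard_normal((audio_feats.shape[1], n_landmarks * 2)) * 0.01
    w_i = rng.standard_normal((identity.shape[0], n_landmarks * 2)) * 0.01
    offsets = audio_feats @ w_a + identity @ w_i  # broadcast identity term
    return offsets.reshape(-1, n_landmarks, 2)

# Usage: one source image + one second of 16 kHz audio -> per-frame motion.
image = rng.standard_normal((64, 64))
audio = rng.standard_normal(16000)
feats = encode_audio(audio)
motion = decode_motion(feats, encode_identity(image))
print(motion.shape)  # (40, 68, 2): 40 audio frames, 68 landmarks, (x, y)
```

In a real system each stand-in would be a trained network, and the landmark offsets (or an equivalent motion representation) would warp the source image to render video frames while preserving identity.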