Microsoft's new AI tool creates lifelike talking faces from photos

20/04/2024

Arab Times

Josephine

Microsoft's breakthrough AI tool generates realistic talking faces in real time.

NEW YORK, April 20: Microsoft Research Asia has introduced an experimental AI tool named VASA-1 that can generate lifelike talking faces in real time. The technology combines a still image of a person, or a drawing, with an existing audio file to produce a realistic talking face, complete with facial expressions, head movements, and synchronized lip movements.

The researchers behind VASA-1 have uploaded numerous examples showcasing its capabilities, demonstrating its potential to deceive viewers into believing the generated faces are real. While some nuances in lip and head motions may appear slightly robotic upon close examination, the technology's potential for creating convincing deepfake videos of real individuals is evident.

Acknowledging the potential misuse of the technology, the researchers have opted not to release an online demo, API, or any related offerings until they are confident that it will be used responsibly and in compliance with regulations. However, they have not disclosed whether specific safeguards will be implemented to prevent malicious actors from utilizing VASA-1 for nefarious purposes, such as creating deepfake porn or spreading misinformation.

Despite concerns about misuse, the researchers emphasize the numerous benefits of VASA-1. They believe it can contribute to educational equity and enhance accessibility for individuals with communication challenges by providing them with avatars capable of conveying messages on their behalf. Additionally, the technology has the potential to offer companionship and therapeutic support, suggesting its integration into programs featuring AI characters for interpersonal interaction.

The model behind VASA-1 was trained on the VoxCeleb2 dataset, which comprises over 1 million utterances from 6,112 celebrities extracted from YouTube videos. Although it was trained on real faces, VASA-1 also works with artistic images, as the researchers showed by pairing the Mona Lisa with audio of Anne Hathaway's Lil Wayne-style rap "Paparazzi."

The introduction of VASA-1 underscores how quickly AI technology is advancing and how widely it could be applied, while also highlighting the need to weigh the ethical implications and regulatory frameworks surrounding its deployment.