Jay Rose

Will ADR turn into automatic pix replacement?


Fascinating article in today's NYT about neural networks generating still images of faces with no 'uncanny valley'. 

 

But buried in that article is a reference to work at the University of Washington last summer... that automatically edits lips to match a different track! Literally puts new words in someone's mouth. On a computer screen, sync looks absolutely realistic. Resolution might not be enough for a big screen... but these things tend to leap forward quickly.

Here's a link just to the UW demo. They took some real Obama speeches, and put them into multiple other Obama faces. Same speech, many different visual deliveries.

 

The article doesn't mention what could happen when you edit the source speech to say something new. But heck, good dialog editors have always been able to change what someone says, on the track. Now a computer can make the target individual appear on-camera, saying the edited version!

 

NYTimes full article link.


Yikes! There's also this automated object/person removal demo from Adobe MAX... i.e., it's somewhere between a SIGGRAPH paper and a product feature.

 

Brief marketing article (i.e., not the tech paper):

Cloak: Remove Unwanted Objects in Video

https://research.adobe.com/cloak-remove-unwanted-objects-in-video/

 

Six-minute demo that's pretty interesting and a bit disconcerting. #fakeviews

 

 

6 hours ago, Syoung said:

I'm sure you guys have seen this, too. Adobe's Project Voco
http://www.bbc.com/news/technology-37899902

 

From the article:

Quote

At a live demo in San Diego on Thursday, Adobe took a digitised recording of a man saying "and I kissed my dogs and my wife" and changed it to say "and I kissed Jordan three times".

 

I find that almost impossible to believe. There's no /zh/ in the source speech, so how could it create a /d zh/ affricate for "Jordan"? Where did it get the /th/ for "three"? Or the /ee/?

 

I'll accept that it could fake the /t/ for "times" by putting a stop at the front of the word... but that's an ancient trick.

 

 

 

(Forgive my attempts at phonetic transcription without IPA...)
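
The coverage question above can be checked mechanically. Here is a minimal sketch, assuming hand-written ARPAbet-style transcriptions for both sentences; the transcriptions and the names `SOURCE`, `NEW_WORDS`, and `missing_phonemes` are illustrative assumptions, not anything taken from the BBC article or from Adobe.

```python
# Hand-written ARPAbet-style approximations (an assumption for illustration;
# a real check would use a pronouncing dictionary or a forced aligner).
SOURCE = {
    "and": ["AE", "N", "D"],
    "I": ["AY"],
    "kissed": ["K", "IH", "S", "T"],
    "my": ["M", "AY"],
    "dogs": ["D", "AO", "G", "Z"],
    "wife": ["W", "AY", "F"],
}

# Only the words that the edited line adds.
NEW_WORDS = {
    "Jordan": ["JH", "AO", "R", "D", "AH", "N"],
    "three": ["TH", "R", "IY"],
    "times": ["T", "AY", "M", "Z"],
}


def missing_phonemes(source, target):
    """For each target word, list the phonemes that never occur in the source."""
    available = {p for phones in source.values() for p in phones}
    return {
        word: [p for p in phones if p not in available]
        for word, phones in target.items()
    }


gaps = missing_phonemes(SOURCE, NEW_WORDS)
for word, missing in gaps.items():
    print(word, missing if missing else "fully covered")
```

This only compares phoneme labels. It ignores allophones and coarticulation, so a word that shows as "fully covered" could still be hard to splice convincingly from the source recording alone.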

 


Jay, that Adobe project isn't perfect but is kinda cool. Perhaps your question is answered in the author's paper, which came out several months after the Adobe demo that freaked out the BBC:

 

VoCo: Text-based Insertion and Replacement in Audio Narration

[snip]

While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase such that it blends seamlessly in the context of the existing narration. Our approach is to use a text to speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor's own voice.

 

The paper, demo video, etc. can be found here, though I gather from Adobe friends that things have advanced a bit this year (still not productized, IIRC):

http://gfx.cs.princeton.edu/pubs/Jin_2017_VTI/

 

 


Jim,

Thanks. The idea of using a speech generator and then adding prosody with NN makes sense. I figured the BBC quote was just sloppy reporting. 

Once that system is fully cooked, however... why bother with ADR at all? A dialog editor can sit with the director and rebuild any line they want.


These are starting to remind me of the "Edit Button" software:
 

 

 

 

On 1/4/2018 at 8:42 AM, Jim Feeley said:

Six-minute demo that's pretty interesting and a bit disconcerting. #fakeviews

 


At least now we never need to care whether the boom is in shot! ;-) We can go as close in as we like for the best sound.

