Jay Rose Posted January 3, 2018

Fascinating article in today's NYT about neural networks generating still images of faces with no 'uncanny valley'. But buried in that article is a reference to work at the University of Washington last summer... that automatically edits lips to match a different track! Literally puts new words in someone's mouth. On a computer screen, sync looks absolutely realistic. Resolution might not be enough for a big screen... but these things tend to leap forward quickly.

Here's a link just to the UW demo. They took some real Obama speeches and put them into multiple other Obama faces: same speech, many different visual deliveries. The article doesn't mention what could happen when you edit the source speech to say something new. But heck, good dialog editors have always been able to change what someone says, on the track. Now a computer can make the target individual appear on-camera, saying the edited version!

NYTimes full article link.
Constantin Posted January 3, 2018

Oh, that lends a whole new meaning to fake news. Now they can even provide the picture to the fake soundbite. Scary.
Syoung Posted January 3, 2018

I'm sure you guys have seen this, too. Adobe's Project Voco: http://www.bbc.com/news/technology-37899902
Jim Feeley Posted January 3, 2018

Yikes! There's also this automated object/person removal demo from Adobe MAX... i.e., it's somewhere between a SIGGRAPH paper and a product feature.

Brief marketing article (i.e., not the tech paper): Cloak: Remove Unwanted Objects in Video
https://research.adobe.com/cloak-remove-unwanted-objects-in-video/

Six-minute demo that's pretty interesting and a bit disconcerting. #fakeviews
Jay Rose Posted January 3, 2018 (Author)

6 hours ago, Syoung said:
I'm sure you guys have seen this, too. Adobe's Project Voco: http://www.bbc.com/news/technology-37899902

From the article:

Quote: At a live demo in San Diego on Thursday, Adobe took a digitised recording of a man saying "and I kissed my dogs and my wife" and changed it to say "and I kissed Jordan three times".

I find that almost impossible to believe. There's no /zh/ in the source speech, so how could it create the /d zh/ affricate for "Jordan"? Where did it get the /th/ for "three"? Or the /ee/? I'll accept that it could fake the /t/ for "times" by putting a stop at the front of the word... but that's an ancient trick. (Forgive my attempts at phonetic transcription without IPA...)
Jim Feeley Posted January 4, 2018

Jay, that Adobe project isn't perfect but is kinda cool. Perhaps your question is answered in the paper, which came out several months after the Adobe demo that freaked out the BBC:

VoCo: Text-based Insertion and Replacement in Audio Narration

[snip] While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase such that it blends seamlessly in the context of the existing narration. Our approach is to use a text to speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor's own voice.

The paper, demo video, etc. can be found here, though I gather from Adobe friends things have advanced a bit this year (still not productized, IIRC):
http://gfx.cs.princeton.edu/pubs/Jin_2017_VTI/
Jay Rose Posted January 4, 2018 (Author)

Jim, thanks. The idea of using a speech generator and then adding prosody with a neural network makes sense. I figured the BBC quote was just sloppy reporting.

Once that system is fully cooked, however... why bother with ADR at all? A dialog editor can sit with the director and rebuild any line they want.
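[Editor's note: the two-stage idea the paper describes (synthesize the new word in a generic voice, then convert it toward the narrator) can be sketched in a few lines. Everything below is made up for illustration, not Adobe's code: the function names, the 2-D "spectral frames," and the nearest-exemplar conversion are a crude stand-in for real voice conversion.]

```python
# Toy sketch of a VoCo-style two-stage pipeline. All names and numbers
# are hypothetical; real systems work on actual spectral features.
from math import dist  # Euclidean distance (Python 3.8+)

def generic_tts(word):
    """Stand-in for stage 1, a generic TTS voice: emit one fake
    2-D 'spectral frame' per character of the new word."""
    return [(float(ord(c) % 7), float(ord(c) % 5)) for c in word]

def convert_voice(frames, narrator_frames):
    """Stand-in for stage 2, exemplar-based voice conversion: replace
    each synthetic frame with the closest frame harvested from the
    narrator's own recording."""
    return [min(narrator_frames, key=lambda n: dist(f, n)) for f in frames]

# Frames "harvested" from the existing narration (made-up values).
narrator = [(0.0, 1.0), (2.0, 2.0), (5.0, 3.0), (6.0, 4.0)]

synthetic = generic_tts("Jordan")                       # stage 1
in_narrator_voice = convert_voice(synthetic, narrator)  # stage 2

# Every output frame comes from the narrator's own material, which is
# the intuition for why the insert can blend with the narration.
assert all(f in narrator for f in in_narrator_voice)
```

The point of the sketch is Jay's observation above: the new word's phonetic content comes from the synthesizer, so the narrator never had to utter a /zh/ or /th/; the conversion stage only has to make it sound like them.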
IronFilm Posted January 14, 2018

These are starting to remind me of the "Edit Button" software:

On 1/4/2018 at 8:42 AM, Jim Feeley said:
Six-minute demo that's pretty interesting and a bit disconcerting. #fakeviews

At least now we never need to worry about whether the boom is in the shot! ;-) We can go as close in as we like for the best sound.