
Will ADR turn into automatic pix replacement?


Jay Rose


Fascinating article in today's NYT about neural networks generating still images of faces with no 'uncanny valley'. 

 

But buried in that article is a reference to work at the University of Washington last summer... that automatically edits lips to match a different track! Literally puts new words in someone's mouth. On a computer screen, sync looks absolutely realistic. Resolution might not be enough for a big screen... but these things tend to leap forward quickly.

Here's a link just to the UW demo. They took some real Obama speeches, and put them into multiple other Obama faces. Same speech, many different visual deliveries.

 

The article doesn't mention what could happen when you edit the source speech to say something new. But heck, good dialog editors have always been able to change what someone says, on the track. Now a computer can make the target individual appear on-camera, saying the edited version!

 

NYTimes full article link.


Yikes! There's also this automated object/person removal demo from Adobe MAX... i.e., it's somewhere between a SIGGRAPH paper and a product feature.

 

Brief marketing article (i.e., not the tech paper):

Cloak: Remove Unwanted Objects in Video

https://research.adobe.com/cloak-remove-unwanted-objects-in-video/

 

Six-minute demo that's pretty interesting and a bit disconcerting. #fakeviews

 

 


6 hours ago, Syoung said:

I'm sure you guys have seen this, too. Adobe's Project VoCo
http://www.bbc.com/news/technology-37899902

 

From the article:

Quote

At a live demo in San Diego on Thursday, Adobe took a digitised recording of a man saying "and I kissed my dogs and my wife" and changed it to say "and I kissed Jordan three times".

 

I find that almost impossible to believe. There's no /zh/ in the source speech, so how could it create a /d zh/ affricate for "Jordan"? Where did it get the /th/ for "three"? Or the /ee/?

 

I'll accept that it could fake the /t/ for "times" by putting a stop at the front of the word... but that's an ancient trick.

 

 

 

(Forgive my attempts at phonetic transcription without IPA...)
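One way to make the phoneme-inventory objection above concrete is to list which sounds the edited sentence needs that the quoted source sentence never contains. The sketch below is only an illustration: the ARPAbet-style transcriptions are typed in by hand (an assumption, not taken from the BBC article or from Adobe), and the names `PHONES`, `inventory`, and `missing_phonemes` are made up for this example.

```python
# Hand-entered ARPAbet-style transcriptions (an assumption for illustration;
# a real pronouncing dictionary may list variants such as a reduced "and").
PHONES = {
    "and": ["AE", "N", "D"],
    "i": ["AY"],
    "kissed": ["K", "IH", "S", "T"],
    "my": ["M", "AY"],
    "dogs": ["D", "AO", "G", "Z"],
    "wife": ["W", "AY", "F"],
    "jordan": ["JH", "AO", "R", "D", "AH", "N"],
    "three": ["TH", "R", "IY"],
    "times": ["T", "AY", "M", "Z"],
}


def inventory(sentence):
    """Return the set of phonemes used anywhere in the sentence."""
    return {p for word in sentence.lower().split() for p in PHONES[word]}


def missing_phonemes(source, target):
    """Phonemes the target needs that never occur in the source."""
    return inventory(target) - inventory(source)


SOURCE = "and I kissed my dogs and my wife"
TARGET = "and I kissed Jordan three times"

if __name__ == "__main__":
    # Sounds that would have to come from somewhere other than this sentence.
    print(sorted(missing_phonemes(SOURCE, TARGET)))
```

With these transcriptions, the edited line needs several sounds (including the ones written above as /d zh/, /th/, and /ee/) that do not occur in the quoted sentence, while the /t/ of "times" already occurs at the end of "kissed". This only covers the one sentence quoted by the BBC; a longer source recording would widen the inventory.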

 


Jay, that Adobe project isn't perfect but is kinda cool. Perhaps your question is answered in the guy's paper, which came out several months after the Adobe demo that freaked out the BBC:

 

VoCo: Text-based Insertion and Replacement in Audio Narration

[snip]

While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase such that it blends seamlessly in the context of the existing narration. Our approach is to use a text to speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor's own voice.

 

The paper, demo video, etc. can be found here, though I gather from Adobe friends that things have advanced a bit this year (still not productized, IIRC):

http://gfx.cs.princeton.edu/pubs/Jin_2017_VTI/

 

 


Jim,

Thanks. The idea of using a speech generator and then adding prosody with NN makes sense. I figured the BBC quote was just sloppy reporting. 

Once that system is fully cooked, however... why bother with ADR at all? A dialog editor can sit with the director and rebuild any line they want.


  • 2 weeks later...

These are starting to remind me of the "Edit Button" software:
 

 

 

 

On 1/4/2018 at 8:42 AM, Jim Feeley said:

Six-minute demo that's pretty interesting and a bit disconcerting. #fakeviews

 


At least now we never need to care whether the boom is in shot! ;-) We can go as close in as we like for the best sound.

