Posted

I'm curious to hear from anyone with experience with this.  Last time I worked (1.6 years ago) there were no AI plugins.  Let's imagine a ratio between the best of what was painstakingly done by humans before the strikes, and these same humans now, with AI help.  Obvs the AI will save human hours, but can it also do things better than a human, in minutes?  Can AI help to make crap sound become good sound?  

 

If anyone has experience, I'm curious if the AI is more or less effective with different noises.  Things like:

 

Genny rumble

Riverboat engines

Air handling

Machine noise

People noise (talking off-mic, laughter bursts, etc.)

Distant or passing music

Traffic drone 

Street traffic (or one vehicle passing)

Clothing noise

Weird room resonance or reflections

Footfalls

Dish and chair noise

Plastic bag noise

Floorboards creaking

 

Thanks for listening.

 

Dan Izen

  • Izen Ears changed the title to Can AI plugins magically save dialog?
Posted

Great question Dan. I hope the experts here chime in. 

 

There's what seems to me to be a pretty good and ongoing discussion about noise reduction for dialog on the Gearspace post-production subforum. And it's not super long, at least by GS standards. https://gearspace.com/board/post-production-forum/1424740-2024-dialogue-noise-removal.html

 

Also, @Ian Sampson, who created Hush Pro, is sometimes on JWS. He might have some thoughts.

 

 

Posted

You may be getting a bit misled by the generative AI hype these days. iZotope introduced machine learning algorithms (aka A.I.) in RX 6 (2017?) in the form of Dialogue Isolate and De-rustle. Maybe other modules too, I can’t remember. They are constantly getting better, and newer utilities are being introduced regularly, but they are still not magic. More like incremental improvements.
That being said, you can do a ton more with difficult audio today than you could 10 years ago.
One space where I think we are only starting to see adoption, and where there is a lot of room for growth, is the ability to replace dialog without ADR. Generative algorithms are able to recreate voices with less and less training data. 15 min of speech often can be enough to synthesize a voice with high accuracy. That means we may start seeing post be able to replace troubled production audio without having to bring an actor in for ADR, and the software can match the tone of the performance. The SAG strikes and deal touched on this capability, so I will stay focused on the technical, and skip the ethical debate for now.
This ability also predates the last 1.5 years, but it has become part of the public discussion, and many more tools are publicly available. I have done some consulting with companies in this space and it's pretty cool. Like everything else, it will become another tool in the toolbelt, allowing post to choose when to clean up difficult audio, when to auto-replace pieces, and when to re-perform (ADR).
 

Posted

Yes, it has advanced quite a bit in the last handful of years!

I never liked RX Dialogue Isolate.

Acon Digital's Extract: Dialogue can do some amazing things that couldn't have been done 5-10 years ago with a common plugin. Used gently, it's amazing. Used maximally, it can be a life-saver. But not everything can be saved.

I also have WAVES Clarity Pro. Slightly different. I don't like the end result as much, but for nasty stuff, it can clean things more than Acon. The result will be very usable if it is gnarly to begin with, but I reach for Acon first.

A couple of months back, I picked up dxRevive Pro. Wow! That can salvage things in an amazing way. It doesn't make me give up on Acon, but it can clean and revive like no other. I was handed a poorly recorded podcast -- with a guest who didn't use headphones, left the speaker on, and had Zoom's dialogue isolate. Ugh. Acon couldn't do much, but dxRevive made it much better (but not "studio quality"). I used it on another project that was not so troublesome and it sounded fantastic.

I demo'd CEDAR's VoiceEx. REALLY great. Does things the others can't, faster, but, it is pricey.

I also demo'd Hush. It got rid of Room + noise really well.

I also think that Acon's DeVerberate is the best in class.

I use all of these as Audiosuite (or, rendered, if you will), can't use them real-time.

All that said, I still use CEDAR DNS One. It still does a fantastic job quickly and sounds good. I do a more gentle pass with Acon or dxRevive, then use DNS One.

Posted
On 7/30/2024 at 12:22 PM, minister said:

Yes, it has advanced 

Thanks for excellent info!!

 

I would love to hear specifics. When you say gnarly, what exactly does that mean? What about things like cicadas or a massive night symphony of insects?  Or the roar of a genny directly outside?

Posted

Hi. Actually, cicadas and insects are easy prey (Har!) for these tools. By gnarly, I mean standing on a barge or ship, or near a genny, or near a loud industrial machine that is as loud as or louder than the voice. Or super windy. Or distorted. RX DeClip is good, but, I use RX less and less and less and less... These new tools are better.

Other difficult situations are dialog recorded from across the room or in a really reverberant room. dxRevive Pro, Hush and Acon DeVerberate are doing a really great job with these situations now.

Of course, not everything can be salvaged. 

Posted

I've used DaVinci Resolve's voice isolation. It works well. I used it on version 18.5 and now in release 19 they have added AI. I haven't done a comparison, but now it will isolate other things. For example, it will let you pick the drums out of a song - amazing. It wasn't as good as the dialog isolation and I wouldn't expect it to be.

 

I used it to isolate dialog from street traffic. We were shooting right next to a busy road. I'll post an example.

 

I've also used it for an interior scene where the room tone was inconsistent. It gave me clean dialog and then I added back the room tone.
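
For anyone curious what "adding back the room tone" amounts to outside a DAW, here is a minimal sketch in plain Python. It assumes both signals are already lists of float samples at the same sample rate; the function names and the -30 dB default are hypothetical, picked only for illustration, and in practice the level is set by ear.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def add_room_tone(dialog, room_tone, tone_db=-30.0):
    """Mix looped room tone under isolated dialog at a fixed trim.

    dialog, room_tone: lists of float samples in the range -1.0..1.0
    tone_db: hypothetical room-tone trim in dB (an assumption, set by ear)
    """
    if not room_tone:
        return list(dialog)
    gain = db_to_gain(tone_db)
    mixed = []
    for i, sample in enumerate(dialog):
        # Loop the room tone so it covers the full length of the dialog.
        tone = room_tone[i % len(room_tone)] * gain
        # Clamp to the legal sample range as a simple safety guard.
        mixed.append(max(-1.0, min(1.0, sample + tone)))
    return mixed
```

A real session would crossfade the loop point and ride the level, but the core of it is just a sum of the cleaned dialog and a consistent tone bed.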

 

What it doesn't do is remove reverb. For that, I bought Waves Clarity Pro (also mentioned by Minister above). It worked ok. I had an indoor boom that got some reverb. The lavs were muffled during a hugging scene. It helped a lot and I considered it a save. But it did not match the other dialog.

Posted

That sounds really good.  The main thing I notice is the accent gets a lot harder to understand ... I'm missing some of the sibilants I think.  But a regular listener wouldn't notice, and would just turn on subtitles and wonder why they are having trouble understanding the dialogue.

Posted

The one person is a Russian that just came to the States a year ago with a heavy accent; she was hard to understand.  The other person is an American doing a Russian accent.

 

It saved me. I thought I would have to ADR the whole scene. When the cars got right next to them, and I mean within 6' of them and the levels were up above -5 dB, I could still recover the lines, but there are artifacts.

 

 

Posted

I love RX, I confirm that it does a very good job! I listened to this example and, in my opinion, it's a very simple example.
Much worse things can be smoothed out very beautifully!
My rule is that I try many different tools and always find something suitable. And rule number two is that what works for one dialogue definitely won't work for another dialogue/situation.
You have to try different tools. My favorite is still RX, but Waves tools are also good, as are tools from other creators. The best is a recording that requires minimal processing.
Unfortunately, I can't try Hush Pro because I have a Windows machine.
And I do this for fun, so I probably don't have as much experience as many others here. But don't underestimate me for that 😉

Posted

I don’t do a ton of post but based on my experience with these tools I think Supertone Clear takes the cake as my favorite straightforward dialogue cleanup plugin. It can do some pretty impressive stuff that would otherwise take several RX plugins to pull off.
 

@Izen Ears Clear is available as a free trial, you might enjoy running some production sound recordings from noisy/challenging environments through it just to test the limits of the current technology. 
 

I will have to check out the Acon stuff for sure.

Posted
On 8/2/2024 at 12:57 AM, Paul F said:

I've used DaVinci Resolve's voice isolation. It works well. I used it on version 18.5 and now in release 19 they have added AI. I haven't done a comparison, but now it will isolate other things. For example, it will let you pick the drums out of a song - amazing. It wasn't as good as the dialog isolation and I wouldn't expect it to be.

 

I used it to isolate dialog from street traffic. We were shooting right next to a busy road. I'll post an example.

 

I've also used it for an interior scene where the room tone was inconsistent. It gave me clean dialog and then I added back the room tone.

 

What it doesn't do is remove reverb. For that, I bought Waves Clarity Pro (also mentioned by Minister above). It worked ok. I had an indoor boom that got some reverb. The lavs were muffled during a hugging scene. It helped a lot and I considered it a save. But it did not match the other dialog.

 

Just two weeks ago a friend of mine who's been editing a scene showed me this new tool in Resolve and I was completely blown away. He had to remove some really nasty wind noises on a lav mic that were more than just rumble; some hits went up to 1500 Hz, like you would expect from phone recordings in windy situations (don't have an audio example rn). The voice isolate tool erased all of it and recovered the voice with almost zero artifacts. I had to listen real hard to hear some limiter-like effect where the wind was hitting before, but nothing you would notice in the context of the scene. I tried to re-create the same restoration in RX 9 with De-wind/Dialogue Isolate/... but the results were not even close.

 

So yeah, fancy stuff going on with AI and I believe there will be more and more solutions to recordings that wouldn't have seemed fixable two years ago. Not looking forward to editors just wiping out any ambience and life from the location sound because they can...

Posted

It changed things for me.  Instead of booking a room to record a line, I dropped by the actor's apartment and we stood outside with all the street noise and recorded it. I needed to be outside to avoid the reverb. I stripped off the ambiance and had a clean line. Easy peasy.

Posted
On 7/27/2024 at 2:33 AM, Wandering Ear said:

Generative algorithms are able to recreate voices with less and less training data. 15 min of speech often can be enough to synthesize a voice with high accuracy. That means we may start seeing post be able to replace troubled production audio without having to bring an actor in for ADR, and the software can match the tone of the performance


For me this is the really crucial bit and the part I find most interesting in this discussion (for me as a non-post person). There was another brief discussion about generative AI here, and I believe that if it all continues the way it is already unfolding, then this may well be our downfall within the next few (10) years.
Over here in Germany, voice-over artists (who also dub German versions of international language movies) already see a big downturn in work. So far, this is happening exactly as I predicted, and that is not good. For once I really want to be wrong.

Posted

Yes, I think the worst case scenario is that generative AI dialog tools that enable “synthetic ADR” in post become so good and easy that they dissolve the importance of, or even the need for, our craft. If in 5-10 years a zoom in the room getting a scratch track is enough to guide an AI model to recreate studio quality vocals then we’re in big trouble. We should probably all be voicing this concern to our union reps. I would also hope that the actors’ guilds would be concerned about this. I’ll admit I don’t know the details of what AI protections the actors received after the strike but I suspect the language is mostly around visual likeness and sound is still a grey area.

Posted

Personally I don’t think our craft is going away. While I’m sure plenty of the corporate communications work will be replaced, and there may be less work in some sectors of the industry, I can’t see a world where we stop recording actors original performances. 
There is a very large gap between creating a realistic sounding voice, and being able to control inflection, craft emotion, or “perform”. Current generative models can’t do this. You can train different “styles” to pull from different training data, but the generative output is still trying to recreate the input under the hood. So a crappy zoom recording won’t magically make a studio caliber ADR (garbage in, garbage out). But a good on set recording could easily allow replacing a troubled take or location without the extra expense and time of ADR. To me this is the real value of this tech, and also why I think we’ll still have a job to do.

Like we see with self driving cars, getting the first 80% of functionality is very doable. Getting the next 10% is insanely hard, and getting the last 10% may not even be possible. 
That’s just my take on it. 

Posted

@Wandering Ear That's my take as well.  I can't actually see it being cheaper and more effective to recreate it in post.  You don't get actor chemistry in ADR.  You are limited to matching the lip-flap of the on-set performance (changeable with video editing, but not simple).  The actors and director have to come back for additional days.  Given the way actors are paid, a few days of ADR could cost as much as having the sound department on set for the shoot.  Yes, you can synthesize a voice, but actors are paid for their performance, and I can't see directors and actors wanting to delegate the performance to a "cheap actor" as a matter of course.  Some directors might prefer synthetic ADR for workflow, but they'll be in the minority, and those directors will get pushback from actors.  Some producers might insist on ADR workflow for perceived cost-savings — and will learn that whatever they gain in cost, they lose in performance.

The value we provide on set is far more than making a technical recording of "what is there".  Our role is to be listening and making sure that there is raw audio material to work with in post:  We are paid to listen as much as we are to record.  And we are paid to anticipate problems that might come up that could prevent a good recording from being made.  It still needs to be someone's responsibility to set up a "zoom-in-a-room" recorder and make sure it is recording to whatever specs generative AI needs.  As soon as that person is necessary, it makes sense for that person to be a constant set of ears that is doing what we do — i.e. it makes sense to have a sound dept.
 

Posted

Yeah, I hope so. I just think we need to be vigilant in the coming years. I could also see it becoming harder to deal with set and to create the environment and crew discipline for good sound recording if people in every other department are under the impression that it’s simple to fix the sound in post.
 

And just to be clear I think when we say generative AI sound processing we’re not talking about noise reduction or traditional ADR but the kind of thing that can input a low quality recording, maybe a distant mic or unusably noisy and actually generate sound that wasn’t in the recording to begin with such as adding all the missing vocal frequencies and fullness to the dialog on a distant mic making it sound like it was recorded well. The performance would be good because it’s the original performance being processed in this scenario. 

Posted
6 hours ago, Derek H said:

 

And just to be clear I think when we say generative AI sound processing we’re not talking about noise reduction or traditional ADR but the kind of thing that can input a low quality recording, maybe a distant mic or unusably noisy and actually generate sound that wasn’t in the recording to begin with such as adding all the missing vocal frequencies and fullness to the dialog on a distant mic making it sound like it was recorded well. The performance would be good because it’s the original performance being processed in this scenario. 


Does this exist?  Can you share any references or software that can do this?  Would be a pretty cool software. 
 

The generative voice models I am most familiar with are text-to-speech, and synthesize a new waveform that represents the text input. These can be trained with specific voices, allowing the synthetic ADR.
The ML based noise reduction algorithms like Dialogue Isolate extract the voice from the background noise using essentially pattern recognition.
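
To make the "extraction" idea a bit more concrete: many published speech-separation models work by predicting a per-frequency gain (a mask) that is applied to the spectrum of the noisy signal. None of the vendors mentioned in this thread has confirmed that this is what their products do, so treat the following as a toy sketch of the general masking idea only. The mask here is set by hand (`toy_mask` is a hypothetical stand-in for what a trained network would predict), and it processes one short frame with a naive DFT rather than a real STFT.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (fine for tiny frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def toy_mask(n, noise_bins):
    """Hand-made mask: gain 0.0 on the given bins (and their mirror bins), 1.0 elsewhere.

    Hypothetical stand-in: in an ML tool a trained model would predict these gains.
    """
    return [
        0.0 if (k in noise_bins or (n - k) % n in noise_bins) else 1.0
        for k in range(n)
    ]


def apply_mask(frame, mask):
    """Scale each frequency bin of the frame by its mask gain and resynthesize."""
    spectrum = dft(frame)
    return idft([s * m for s, m in zip(spectrum, mask)])
```

If the "voice" is a sine in bin 2 and the "noise" is a sine in bin 10 of a 32-sample frame, `apply_mask(mixed, toy_mask(32, {10}))` gives back the voice sine. Real dialog and real noise overlap heavily in frequency and change over time, which is where the pattern recognition has to do the hard part.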
 

One more thing I think could be a benefit of this technology is synthetic ADR that matches the production dialog better than some of the ADR we hear today.

Posted

I would think this would be an application of AI applied to a vocoder, aka deep fake tech.  I'm not intimately familiar with it, but I believe the idea is you imprint someone's voice print onto an existing performance so it sounds like someone else speaking.  You can already use this to imprint the sound of a particular space or a particular microphone; it's how reverb matching works.

I would imagine the main obstacle right now is the ability to remove reverb, which does seem to be gradually improving with AI help.  Once you do that, you can transform the performance in all sorts of ways.

And, yes, all of this is me imagining based on things I've heard about ... like you I'm hoping someone can step in with practical experience with the state of the art in this field.

Posted
11 hours ago, Wandering Ear said:

Does this exist?  Can you share any references or software that can do this?  Would be a pretty cool software. 


Well, currently there’s “enhance speech” from Adobe. It claims it doesn’t synthesize new audio, but that doesn’t seem like that big of a leap. Mostly for amateur video and podcasters. For now.
 

https://helpx.adobe.com/premiere-pro/using/enhance-speech-faq.html
