Amazing Nvidia RTX Voice AI Noise reduction


Has anybody seen these demos of the NVidia RTX Voice AI noise reduction App?

It seems to be a game changer for eliminating all types of background noise from a Microphone input.

Sorry all you Mac fanatics, it currently needs an NVidia RTX video card and Windows 10 to work.

It is in beta and free right now but check out these video demos.

Once the app supports a DAW rather than just a computer mic and virtual speakers, this could be amazing. It might make a voice too sterile. Right now it looks like it is dealing with live computer mic input only.

 

I upgraded to a GeForce RTX 2080 last year for my main edit suite (ASUS main board). It gave my four-year-old self-built computer new life! Much faster renders. No more Premiere Pro crashes when exporting complex timelines with multiple layers of BCC, green screen and speed changes.

 

Nvidia, PLEASE make this work for eliminating clothing and beard rustle from recorded tracks.

3 hours ago, PMC said:

Nvidia, PLEASE make this work for eliminating clothing and beard rustle from recorded tracks.


Yes, that would be nice. But it will also be another nail in the coffin of the experienced sound mixer - on set and in post. If I can get rid of all background noise, I can also place the lav fairly haphazardly somewhere, and I don’t need to eliminate problematic noise on set or in post anymore... We’re not there yet, but well on the road to removing the need for skilled and experienced sound workers.
And in another two years max, ADR with an actor present will be a thing of the past. AI will take care of it. And then in two years after that...?

11 minutes ago, Constantin said:


Yes, that would be nice. But it will also be another nail in the coffin of the experienced sound mixer - on set and in post. If I can get rid of all background noise, I can also place the lav fairly haphazardly somewhere, and I don’t need to eliminate problematic noise on set or in post anymore... We’re not there yet, but well on the road to removing the need for skilled and experienced sound workers.
And in another two years max, ADR with an actor present will be a thing of the past. AI will take care of it. And then in two years after that...?

innovation, what a b*tch...


Chances are this will never make it to set or even post, due to several limitations, one being that we often require much higher sound quality than these gaming services do. I mean, the bitrate of these streaming services is under 64 kbit/s, if even that. Imagine being in post with a line that sounds fine, and this is applied: it still sounds good and clean, but there are artifacts that make it just that little bit unusable.

I still think it's faster to ADR or find another line and edit that in.

I really wish though that this will be great and can reproduce high quality sound. That'd be awesome.. But I'm not so sure. 
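
To put the bitrate point in numbers, here is a rough back-of-the-envelope sketch. The 64 kbit/s figure is the one quoted above; the 48 kHz / 24-bit mono production format is an assumption added for illustration, not something stated in the thread. Comparing a compressed voice-chat stream with uncompressed PCM is not strictly apples to apples, so treat the ratio as an order-of-magnitude hint only.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed PCM bitrate in kbit/s: samples/s * bits/sample * channels."""
    return sample_rate_hz * bit_depth * channels / 1000


# Assumed production format (hypothetical example): 48 kHz, 24-bit, one channel.
production_kbps = pcm_bitrate_kbps(48_000, 24)

# Voice-chat figure taken from the post above.
voice_chat_kbps = 64

ratio = production_kbps / voice_chat_kbps
print(f"production mono track: {production_kbps:.0f} kbit/s")
print(f"voice chat stream:     {voice_chat_kbps} kbit/s")
print(f"ratio:                 {ratio:.0f}x")
```

Under those assumptions a single production track carries roughly 18 times the data rate of the voice-chat stream.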

2 hours ago, Olle Sjostrom said:

Chances are this will never make it to set or even post, due to several limitations, one being that we often require much higher sound quality than these gaming services do. I mean, the bitrate of these streaming services is under 64 kbit/s, if even that. Imagine being in post with a line that sounds fine, and this is applied: it still sounds good and clean, but there are artifacts that make it just that little bit unusable.

I still think it's faster to ADR or find another line and edit that in.

I really wish though that this will be great and can reproduce high quality sound. That'd be awesome.. But I'm not so sure. 

 

Well, I think what it is doing is amazing, and the quality of the recorded demos sounds quite acceptable when you consider the INTENSE background noise like sirens and vacuum cleaners and even shouting children within a few feet of the mic. And it is doing it in REAL TIME with only a few milliseconds of delay. Imagine what a post workstation plug-in could do with 2 AI passes in non-real time.


Yeah, I love to be proven wrong! I tend to take these demos with a grain of salt. But again, even if it does sound acceptable, the bitrates are low. In a theater, you would probably hear all kinds of things. Those mics are very close to the mouth and are probably designed for a specific application and position, whereas our mics are made to sound flat and reproduce natural sounds.

 

And I could hear that vacuum in his speech. And there were audible artifacts from the RTX. That Krisp thing sounded better to my ears, but it still wasn't good enough for the cinema. So yes, impressive, but it works because the sound quality is so low. It's made to make voices over gaming sessions clearer, which means that you as a player are listening to the game's audio and up to maybe 32 people talking at the same time, so I reckon removing noise from those voices would change a whole lot in that context.

 

But I still think we're far off integrating this into our profession. But again, I love to be proven wrong!

23 minutes ago, Olle Sjostrom said:

But I still think we're far off integrating this into our profession. But again, I love to be proven wrong!


I wouldn’t... 

But anyway, don’t worry it’ll be here soon enough. The technology is already much further advanced than what NVidia is showing here. It won’t be long before AI is going to take over all of our dialogue cleaning and ADR needs. Because it can and because people will want it. 
 

 

9 hours ago, Sound Development said:

I don't understand why this would mean sound professionals on set are going to slack.


Not this, but this is also just the beginning. I didn’t imply (or didn’t intend to) that the sound professionals are going to slack. It means productions will. Wait for a plane or try to eliminate a vacuum cleaner? No sorry, no time and we can do it easily in post with no ADR.

The person haphazardly placing the lav won’t be a sound mixer. It’ll be a PA. Or thanks to coronavirus it’ll be the talent themselves. The sound mixer won’t have a job by then.
Or maybe I’ve been watching too many Terminator movies...

9 hours ago, Vincent R. said:

innovation, what a b*tch...


yup. But we just have to be quicker and adapt


I'm having a hard time seeing innovation radically changing our workflow.. If you think about it, technology in post and recording has been developing rapidly, all these new RF systems and their remote capabilities and so on. High dynamic range.. 

Noise reduction is nothing new; this is just new algorithms, algorithms that are built for this particular application. AI is great, but it still has to be applied specifically to an application. There's no AI that can hear that this is a DPA 4060 sitting too far away with some bad rustling and know exactly what to do about it. The software would then have to recreate the voice of that actor, and then you'd have another discussion about whether we really need actors.

 

I wouldn't worry. Computers still can't update themselves and make sure that all the other programs are compatible. And don't get me started on external hard drives and if you change letters or catalogs... Nah. Computers are still stupid, because we are (thankfully) still stupid.

1 hour ago, Olle Sjostrom said:

I wouldn't worry. 


I don’t worry. But I don’t want to be unprepared. 
 

1 hour ago, Olle Sjostrom said:

Computers are still stupid, because we are (thankfully) still stupid.


Still, yes. But we’ve successfully taught the machines how to learn, how to analyze and compare thousands of recordings in a matter of minutes. We’ve taught them how to compose music and write books. And how to emulate and recreate voices perfectly. Deep fakes are a thing already and they’re already very good. Imagine what they can do in two years, let alone four.


I have been using it for the past week. It's... okay. It does not like omni-directional microphones. I have tested it with lavs and headset mics. Because of the nature of their pickup, someone standing next to you comes in too clearly to be "recognized" as noise, and you will still hear them on the mic, especially if the miked person is not talking. So the app could use an SNR threshold slider so you can dial in that setting better.
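
For anyone wondering what an SNR threshold slider would do in principle, here is a minimal sketch of a frame-based gate. It is not how RTX Voice works internally (that is not documented in this thread); the function names, the fixed noise-floor estimate and the sample values are all hypothetical and only illustrate the idea of a user-adjustable threshold.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_gate(samples, frame_len, noise_rms, threshold_db):
    """Mute every frame whose level is less than threshold_db above noise_rms.

    noise_rms is an assumed, pre-measured noise floor (hypothetical input);
    threshold_db plays the role of the slider.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = frame_rms(frame)
        # Guard against log10(0) on digital silence.
        snr_db = 20 * math.log10(max(rms, 1e-12) / max(noise_rms, 1e-12))
        if snr_db >= threshold_db:
            out.extend(frame)                 # loud enough: pass through
        else:
            out.extend([0.0] * len(frame))    # too close to the floor: mute
    return out


# Toy signal: one frame of room noise, then one frame of a nearby bystander.
room_noise = [0.01] * 4
bystander = [0.5] * 4
signal = room_noise + bystander

low_slider = snr_gate(signal, frame_len=4, noise_rms=0.01, threshold_db=12)
high_slider = snr_gate(signal, frame_len=4, noise_rms=0.01, threshold_db=40)
print(low_slider)
print(high_slider)
```

With the slider low, the bystander frame passes while the room noise is muted; raising it mutes the bystander as well. A plain level gate like this cannot tell the bystander from the intended talker at the same level, so a real tool would need more than a threshold.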

 

It still requires good SNR in regards to your subject and background noises. It has a VERY unnatural silence to it, which is fine if you are doing Twitch streaming and you have your gameplay as an audio bed underneath your voice (which is what it was designed for), but it sounds weird as a plugin on a podcast. It reminds me of the bad post processing they do on Bachelor in Paradise when the couples walk along the sand and you hear the ocean waves go mute when no one is talking, but magically a loud ocean appears out of nowhere when someone talks. It's that bad.

 

RTX Voice is a fine tool... but not ready for Primetime TV or feature films... maybe decent for sports reporters using a directional dynamic mic or dynamic headset mics in a loud sports arena. I'd only "trust it" to fix my audio live or in post in those cases.
 

9 hours ago, Constantin said:

 Deep fakes are a thing already and they’re already very good. Imagine what they can do in two years let alone four. 

Yes well, then again I think we should be more concerned about the actors. 

20 hours ago, Olle Sjostrom said:

I wouldn't worry. Computers still can't update themselves and make sure that all the other programs are compatible. And don't get me started on external hard drives and if you change letters or catalogs... Nah. Computers are still stupid, because we are (thankfully) still stupid.

Have you been living in a cave?  Computers have been updating themselves and even checking programs installed before doing so for the last 5 years.

All Macs and Windows 10 computers connected to the internet automatically apply updates every week or month.  As do most of your connected appliances.

18 hours ago, Andrew From Deity said:

It still requires good SNR in regards to your subject and background noises. It has a VERY unnatural silence to it, which is fine if you are doing Twitch streaming and you have your gameplay as an audio bed underneath your voice (which is what it was designed for), but it sounds weird as a plugin on a podcast. It reminds me of the bad post processing they do on Bachelor in Paradise when the couples walk along the sand and you hear the ocean waves go mute when no one is talking, but magically a loud ocean appears out of nowhere when someone talks. It's that bad.

 

RTX Voice is a fine tool... but not ready for Primetime TV or feature films... maybe decent for sports reporters using a directional dynamic mic or dynamic headset mics in a loud sports arena. I'd only "trust it" to fix my audio live or in post in those cases.
 

Andrew, you seem to be assuming that Primetime TV and feature films do not have talented sound effects and dialogue editors working on them. Almost every commercial TV or feature production has dialog editors who clip out or noise-gate all the inappropriate background noise between lines and replace it with a consistent BG track or ambiance track that covers the edits in the scene. This tool will be useful for doing what they previously had to do with ADR, and that is removing the annoying background noise from behind the actors' lines while they are speaking. This should allow the laid-in consistent background effects to be less obvious.

43 minutes ago, cmgoodin said:

This tool will be useful for doing what they previously had to do with ADR

 

This exactly.

Me and most directors I work with almost always prefer the performance of the talent on set to the recorded ADR. Most care more about performance than technical audio quality (some should care more about the quality :S ).

That's why in mixes the technically faulty lines tend to get used anyway, washing it with music or car doors or whatever.

I applaud any new innovation that helps reduce ADR and makes set performances usable, if not technically perfect.

Then again, I'll never reveal this tool to producers... To them I'm still editing analogue tape :D


This all seems really cool. Turns out, we can mess with the RTX Voice plug-in on some older Nvidia cards. Well, at least on a GTX 1060. From the reliable (i.e., not spammy) site Ars Technica:

 

You can get Nvidia’s “RTX Voice” noise filtering without a pricey RTX card 

Ars testing (and install hack) shows noise-cancellation functioning on GTX 1060.

 

Also, I'm not sure what Nvidia is doing, but since they want to sell GPUs, perhaps Nvidia will license/partner with higher-end developers for our little niche. Like they do with Redshift, Pixar, Foundry, etc for visual noise-reduction stuff. 

 

This article from a couple of years ago on Nvidia's developer blog focuses more on noise issues for mobile phones, but is still pretty interesting:

Real-Time Noise Suppression Using Deep Learning

 

As is this higher-level article from iZotope a few years ago:

What the Machine Learning in RX 6 Advanced Means for the Future of Audio Repair Technology

This line sticks out: "One area that could be exciting is if an algorithm could decide when a piece of audio is just too corrupted to repair without objectionable artifacts and in those cases actually synthesize replacement speech and/or music."

3 hours ago, cmgoodin said:

Have you been living in a cave?  Computers have been updating themselves and even checking programs installed before doing so for the last 5 years.

All Macs and Windows 10 computers connected to the internet automatically apply updates every week or month.  As do most of your connected appliances.

Andrew, you seem to be assuming that Primetime TV and feature films do not have talented sound effects and dialogue editors working on them. Almost every commercial TV or feature production has dialog editors who clip out or noise-gate all the inappropriate background noise between lines and replace it with a consistent BG track or ambiance track that covers the edits in the scene. This tool will be useful for doing what they previously had to do with ADR, and that is removing the annoying background noise from behind the actors' lines while they are speaking. This should allow the laid-in consistent background effects to be less obvious.

Oh, I consider myself very tech-aware and quite savvy. I work with computers every day, and I still think I'm right when I say that computers, especially Windows, can't update themselves and ensure stability with all other software, licenses and peripherals. For instance, security settings may change from one OS update to the next, so that mission-critical apps or web pages stop functioning correctly, which makes the whole system moot, and it has to be manually downgraded, to much dismay and lost work hours. This is happening more than twice each year in my workplace. We use Windows 10, albeit a corporate version where we have people working exclusively with MS updates. That said, yes, computers can update themselves, but they are still not self-sufficient in all aspects and are therefore not yet a threat to us technicians. Luckily, it's a kind of built-in failsafe that we still need to supply power and the right code for these machines. As long as the code itself can't code itself, we should be OK. But again, I can be wrong. And I'm happy to say I am!

 

I am also aware that people in post use tools all the time, but this particular tool, to my eyes, is not a new tool in the sense that we as mixers or technicians will be replaced by the tool anytime soon. 

 

And I'm saying again that I think for this specific application, the sound and result are amazing. But for high-end production sound, I think this developer has a long way to go. Nvidia will probably not venture into the world of pro audio with this. They make a lot more money selling graphics cards to kids than to a few studios.

57 minutes ago, Olle Sjostrom said:

As long as the code itself can't code itself, we should be OK.

It's here already (deep learning programming): Bayou, SketchAdapt and AutoML, for example. The latter, by Google, is a somewhat older kid on the block.

 

1 hour ago, Olle Sjostrom said:

And I'm saying again that I think for this specific application, the sound and result are amazing. But for high-end production sound, I think this developer has a long way to go. Nvidia will probably not venture into the world of pro audio with this. They make a lot more money selling graphics cards to kids than to a few studios.

 

This particular software tool from Nvidia is just a reference example Beta to show what it can do.  The API will be available for other software developers to use the technology for their specific software tools.   This is only the bleeding edge of this technology.  I'm sure we will see more sophisticated tools in the near future.

 

19 hours ago, Olle Sjostrom said:

Let's hope! New tools are fun! But I still don't think it's gonna replace technicians.

I don't think it is going to replace technicians. They still need a sound mixer on set to handle the recording and management of files and mics and making sure the dialog is recorded. But look at it as a great boon.

It may finally be the thing that keeps the Production Mixer from being the least respected person on the set.

Just think, you will no longer be reviled for having to say "Let's hold for the plane" or "I'm sorry, those stiletto heels are ruining the track, you will have to take them off". Or "Sorry, there was some clothing noise on that line, we have to do another one" or "I'm sorry, we have to move craft service off the stage, I can hear the refrigerator and coffee machine" or "I'm sorry, the off-camera lines overlapped the star's close-up" or "Is there anything we can wrap around the camera? I can still hear the fan".

So we can continue on to the next take with the confidence that all these things can be easily fixed in post or even in processing for dailies, making us sound like the new heroes on the set.

 


Just waiting here for the new Sound Devices recorder with a PCIe slot. All water-cooled with some adjustable RGB lights. Upgradable RAM and all...

 

3 hours ago, cmgoodin said:

I don't think it is going to replace technicians. They still need a sound mixer on set to handle the recording and management of files and mics and making sure the dialog is recorded. But look at it as a great boon.

It may finally be the thing that keeps the Production Mixer from being the least respected person on the set.

Just think, you will no longer be reviled for having to say "Let's hold for the plane" or "I'm sorry, those stiletto heels are ruining the track, you will have to take them off". Or "Sorry, there was some clothing noise on that line, we have to do another one" or "I'm sorry, we have to move craft service off the stage, I can hear the refrigerator and coffee machine" or "I'm sorry, the off-camera lines overlapped the star's close-up" or "Is there anything we can wrap around the camera? I can still hear the fan".

So we can continue on to the next take with the confidence that all these things can be easily fixed in post or even in processing for dailies, making us sound like the new heroes on the set.

 


In other words, we won’t need to bother the rest of the crew anymore or be the "only" department that would hold up an otherwise speedy shoot day, and we can finally stay out of the way, just like "they" always dreamed we would.
Here in Germany we have already lost the third sound position on many productions. And I know and have heard of colleagues who are working alone on regular scripted productions. This is going to spread as well. This tech is certainly not going to stop this trend. 

  • 4 months later...
9 hours ago, KenLac said:

I'm so old I can remember when bands weren't going to have drummers anymore. 😏

And synths would take the place of all instruments! I'm always amused by the glee with which apps or tools like this are announced, the techies saying: "you arrogant movie sound types are DONE! Ha!". OK, cool, fine. So you, operator of whatever new-super-hotshot-technology is being discussed, will now also take on all responsibility for getting a workable soundtrack for a scene, wherever and however it is shot, under any conditions, in full viral safety, fully annotated and made ready for the harassed and time-crunched posties, for weeks at a time... right? You thought that NR was the whole job, maybe?

