
How will AI affect production sound recording?


Gregory Albert


I just completed revisions on 21 five-minute interactive multimedia training modules for one of my clients. Three years ago I hired four voice artists for the original project.

 

Re-voicing all the changes plus recording time would have cost around $3k. I did it with a high-quality real-time AI voicing website for $50, plus my editing time. I can't tell you how much of it was actually AI vs HQ voice synthesis. The term AI is often misused.

 

I could change gender, ethnicity, timbre, speed, emphasis, pause length, etc., all in the website interface, then tweak the .wav export in Adobe Audition for timing to video and gfx.

 

There was a learning curve to get the voice synthesis to pronounce some technical terms correctly, but it was not as bad as when I used the Votrax SC-01 voice synthesizer while designing games for the Tandy TRS-80 and CoCo computers in the '80s.


I'd say post is going to use it more than production (at least at first). I do fear a future where even large shoots will have one-person sound teams, where you lav up each speaking actor, record everything even if there is clothing rustle etc. (just make it "good enough"), and then they take all the recordings and do an AI version of ADR, automating the process and discarding the actual audio used on set.


I remember a Frank Zappa interview from many years ago in which he was asked about using the new Mellotron and synthesizers to emulate an orchestral sound in his work. This was during his fight with local government and studios who were pushing restrictions on home studios with zoning laws.

 

He said the new tech was creatively freeing and didn't require union breaks.

 

Production and Post workflows will definitely change and actors will strike to restrict the cloning/reproduction of their performances. Oh wait, they are.


22 hours ago, codyman said:

I do fear a future where even large shoots will have one-person sound teams, where you lav up each speaking actor, record everything even if there is clothing rustle etc. (just make it "good enough"), and then they take all the recordings and do an AI version of ADR, automating the process and discarding the actual audio used on set.


I have the exact same fear. In the long run, a mic on cam will suffice and no sound crew will be needed anymore at all. That’s just how things are progressing. It’s also what happened to thousands of other workers in other industries. Sad but true. 
The one thing that might save us is the actors striking to prevent AI from altering any of their performances. 


6 hours ago, Matthias Richter said:

As long as AI isn’t working in real time right on set, there is hope that the Director and other on-set crew need a proper mix to hear the dialog, which requires a dedicated sound crew as of now.


I think that will happen. The next SuperCMIT (or the one after that) will have AI built in to ignore everything but dialog. Maybe it’ll have a choice of settings like “Dialog”, “Music”, “Ambiance” or whatever. With several cameras, there will be one mic on each camera and the auto-mixer in the IFB receiver will create a nice mix from that. But, as sound crews are cheap compared to everything else, I think there will be a stronger drive to get rid of the set altogether. Just let AI create the entire movie. It’ll start with movies that are mostly created by computers already, like large chunks of Marvel or animated movies.
It’s possible that audiences will be bored by this fairly soon and we’ll return to more regular film-making, and some directors will insist on realism, but the temptation will be huge.


Right now, execs are trying to convince post-production to replace striking actors' voices and ADR time with easily sourced (they have all the files) AI-created audio. I recently found an AI-powered noise reduction Pro Tools plugin that does nearly everything 833 NoiseAssist (or some time with RX) can do, in the turn of a digital dial...

 

But the neural computing isn't yet at a point where it knows how to mic an actor on an ever-changing film set, nor does it know how to hold a boom, nor does it know how to have an affable conversation with a Director (...or does it? ha)!

 

I don't know if Production Sound has to worry about AI just yet — until Moore's Law brings the world invisible and idiot-proof lavs with mics that can be placed by a chimpanzee. But producers are always going to try to cut corners (after all, it's their job)!

 

[edited for clarity]


I think there will be a change in what "acceptable" production sound is to allow this technology (and the attendant savings) to happen. Many older sound people mourn the advent of the "all-lavs-all-the-time" approach to production sound, vs. what could be done with skilled boom ops, location treatment, set discipline and rehearsals; the "all-lav" sound is now accepted and probably preferred by many. Whatever "AI production sound" ends up sounding like will probably be another step away from naturalness and real-location audio perspective. Many of us won't like it, but it will gradually become the accepted norm just as "all-lav" did. Sorry.

 


I'm on a documentary right now and I have been having issues with doing interviews in loud environments (I'm on a boat). I like ambience, but sometimes it's been overwhelmingly loud, depending on where the director/editor wants to shoot. He says not to worry, he uses noise reduction software that makes everything sound like a podcast (SM7B), and I died inside.


Philip, 

Yes, too many "lav everyone" jobs in recent years. I get that mindset for reality shows but not narratives. I have seen a couple of dramatic TV shows in the past year in which the actor's exterior location performance sound (whispering) is so pristine, and varies so little with camera distance, that it is totally distracting. Completely unnatural. I don't think it was ADR'ed. I am sure the post sound guy/gal was very pleased with his/her accomplishment, but I think it was 100% captured with lavs and very sterile sounding.

 

A good boom person and proper mix, IMO, make for a far more pleasing and natural dialog track.

 

I say this as the UPM says, "Hey, I can save $600/day if I cut the boom op."


This is why *we* need to step up with our unions and demand a three person team beyond a Tier 1!

 

The area standards contract is about to be up for negotiation, and that's the one that we get the most of out here. It's also the one that most of you all use outside of Los Angeles, so you guys could help with this! I have joined my local's negotiation team and I gotta say it's very disheartening. The producers could just decide to say "we won't make a decision on that" and bang! the issue is gone. There are only a few areas that they must respond to, so they usually throw everything else away because they can. The three person sound team is one of those issues. I'm going to propose it anyhow, but I fully expect it to be shot down.

 

My recommendation is to have as many locals as possible put this in their negotiations; that might make them answer. So contact your negotiations team and request a proposal for a three person team for anything over Tier 1!
 

This should actually be its own thread, because I would love to bring other proposals that the national sound community wants in our contracts.

 

Dan Izen


13 minutes ago, PMC said:

I say this as the UPM says, "Hey, I can save $600/day if I cut the boom op."

Studios would love to just have bag jockeys with 16 channels of wireless while paying for labor + kit at less than it would cost to rent the kit alone at a rental house. That's their end goal.


4 hours ago, codyman said:

Studios would love to just have bag jockeys with 16 channels of wireless while paying for labor + kit at less than it would cost to rent the kit alone at a rental house. That's their end goal.


It might be their end goal, but they will be disappointed if they think they will realize any labour savings from it.  Managing 16 channels efficiently still requires multiple people to do well (my rule of thumb is 4 tx per person, can stretch to 6, but not a good idea to attempt that long-term).
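As a rough worked example of that rule of thumb (a sketch only: the function name `operators_needed` and the simple round-up division are illustrative assumptions, not anything from a contract or an industry standard), the headcount for 16 channels works out like this in Python:

```python
import math


def operators_needed(tx_count: int, tx_per_person: int = 4) -> int:
    """Estimate how many sound crew are needed to manage tx_count transmitters.

    tx_per_person defaults to 4, the comfortable load from the rule of thumb
    above; 6 is the short-term stretch figure. Both are judgment calls.
    """
    if tx_count < 0 or tx_per_person <= 0:
        raise ValueError("tx_count must be >= 0 and tx_per_person must be > 0")
    # Round up: a partially loaded person is still a whole person on set.
    return math.ceil(tx_count / tx_per_person)


if __name__ == "__main__":
    for load in (4, 6):
        crew = operators_needed(16, load)
        print(f"16 tx at {load} per person: {crew} people")
```

Even at the stretch figure of 6 tx per person, 16 channels still comes out to more than a single bag operator.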

Aside from that, someone from sound still needs to run the set ... this is an important part of the boom op's role that PMs will miss if they try to shrink the sound dept any further (which isn't to say they appreciate how important that is right now). They will lose far more money in botched takes and "waiting on sound" than they will gain in saving a person on set.

From a mixer's / Head of Dept. point of view, it is our job to make clear to the producers that a boom op does much more on set than just swing the boom.  They will learn that the hard way if we don't help teach them.


  • 2 months later...
how will AI affect the capture of dialogue for films and television?

AI is already having an impact on the film and television industry, including the capture of dialogue. Here are several ways in which AI is influencing this aspect of content creation:

  1. Automated Transcription: AI-powered transcription services can quickly and accurately transcribe spoken words in audio or video files. This technology significantly speeds up the process of turning spoken dialogue into written text, making it easier for filmmakers and editors to work with the script.

  2. Voice Recognition for Subtitling: AI-driven voice recognition systems are improving the accuracy of subtitling. This is particularly useful for translating dialogue into different languages or making content more accessible to individuals with hearing impairments.

  3. Scriptwriting Assistance: AI tools can analyze successful scripts and provide insights into writing styles, dialogue structures, and genre-specific patterns. This assistance can be valuable for writers and filmmakers looking to enhance the quality and authenticity of their dialogue.

  4. Voice Synthesis: Text-to-speech (TTS) technology, powered by AI, is becoming more advanced. This can be useful for creating placeholder voiceovers or even generating entirely synthetic voices for characters in situations where a human voice actor may not be available or feasible.

  5. Dialogue Enhancement: AI tools can be used to analyze and enhance dialogue in post-production. This includes adjusting audio levels, removing background noise, and even modifying the tone or pitch of voices to achieve a desired effect.

  6. Character Development: AI can analyze patterns in successful films and TV shows to provide insights into character development and dialogue that resonates with audiences. This information can be used by writers to create more engaging and relatable characters.

  7. Real-time Translation: AI-powered translation services are becoming more advanced, allowing for real-time translation of dialogue during filming. This can be particularly useful for international productions or when working with a diverse cast and crew.

  8. Emotion Analysis: AI can be used to analyze the emotional content of dialogue, helping filmmakers understand how the audience might respond to specific scenes. This information can be valuable in shaping the overall emotional tone of a film or TV show.

While AI brings many benefits to the industry, it's important to note that human creativity, intuition, and emotional understanding remain crucial in crafting compelling dialogue that resonates with audiences. AI tools are most effective when used as aids to human creativity rather than replacements for it.

- ChatGPT


All varying degrees of useful IMO, and nicely lacking in the FUD that it's all going to replace us.

And the paranoid voice in me says "of course ChatGPT would say that".

I have to wonder who is using AI for "emotion analysis".  Is that for directors who don't understand the emotional content of what they are creating?  Why does anyone need a tool for this?


I am trying to remain positive, but I'm pretty sure some things will change, as they have changed over the years since the first take with audible spoken dialogue.

 

My main concern is not only production sound, but cinema and television as an art overall.

 

On the other hand, my main reason for optimism is this: humans created art in every form (e.g. cinema, music, dance) to cope with the knowledge of death and to celebrate. It was created by humans, and only humans can understand and feel any form of art; somehow we will protect it, no matter what economic model we live under now (e.g. capitalism) or in the future.

