
"What? Huh? Why your favorite shows and films sound worse than ever"



And subtitles are not the answer.  The whole point of a movie or show is to watch it.  To see the environment, the actors, everything.  If you are busy reading the subtitles, you've just missed everything you were supposed to see.  Especially with the break-neck pace that everything is edited at, today.  As they said in the article, you might as well read the book, if you're going to have to rely on subtitles.

Link to comment
Share on other sites

I extracted the text and removed the advertising. Below is the complete text of the article for those who may have been having some difficulty.

 

Everything sounds bad, and there’s nothing we can do about it

https://www.avclub.com/television-film-sound-audio-quality-subtitles-why-1849664873

 

Television today is better read than watched—and frankly, we don’t have much of a choice in the matter. Over the last decade, the rise in streaming technology has led to a boon in subtitle usage. And before we start blaming aging millennials with wax in their ears, a study conducted earlier this year revealed that 50 percent of TV viewers use subtitles, and 55 percent of those surveyed find dialogue on TV hard to hear. The demographic most likely to use them: Gen Z.

 

Mounting audio issues on Hollywood productions have been exacerbated in the streaming era and made worse by the endless variety of consumer audio products. Huge scores and explosive sound effects overpower dialogue, with mixers having their hands tied by streamer specs and artist demands. There is very little viewers can do to solve the problem except turn on the subtitles. And who can blame them?

 

“It’s awful,” said Jackie Jones, Senior Vice President at the Formosa Group, an industry leader in post-production audio. “There’s been so much time and client money spent on making it sound right. It’s not great to hear.”

 

Formosa is one of the many post-production houses struggling to keep dialogue coherent amid constant media fracturing. “Every network has different audio levels and specs,” Jones told The A.V. Club over Zoom. “Whether it’s Hulu or HBO or CBS. You have to hit those certain levels for it to be in spec. But it really is how it airs and how it airs is out of our control.”

 

After it leaves a place like Formosa, the mix might go through an additional mix at the streamer and another mix, so to speak, by the viewer’s device. Of course, this is the last thing they want in the audio industry. “Dialogue is king,” sound editor Anthony Vanchure told us. “I want all the dialogue to be clean as clear as possible, so when you hear that people are struggling to hear that stuff, you’re frustrated.” And yet, we still end up with the subtitles on. If we’re just going to read an adaptation of The Sandman on Netflix, why even bother making it?

 

“Everybody’s very unhappy about it,” said David Bondelevitch, associate professor of Music and Entertainment Studies at the University of Denver. “We work very hard in the industry to make every piece of dialogue intelligible. If the audience doesn’t understand the dialogue, they’re not going to follow anything else.”

 

Streamers and devices make terrible music together

 

With all this technology at our fingertips, dialogue has never been more incoherent, and the proliferation of streaming services has made the landscape impossible to navigate. Aside from the variety of products people watch media on, no two streamers are alike. Each one may have a different set of requirements for the post-production house.

 

As far as streamers go, editors say Netflix is the best for sound and has even published its audio specs, but the service is an outlier. “They have put an awful lot of money into setting up their own standards, whereas some of the other streamers seem to have pulled them out of their asses,” Bondelevitch said. “With some of these streamers, editors get like 200 pages of specifications that [they] have to sit there and read to make sure that they’re not violating anything.”

 

All streamers aren’t so forgiving. “I was at lunch with a couple of friends recently off of a mix, and they were at lunch answering emails because they did the mix, completed the mix, and everybody’s happy,” said Vanchure. “And then the director got like a screener or was able to watch it at home, you know, being whatever streaming service he was using. And he was like, ‘Hey, this sounds completely different.’”

 

Today, sound designers typically create two mixes for a film. The first is for theatrical, assuming that the film is getting a theatrical release. The other is called a “near-field mix,” which has less dynamic range (the difference between loud and quiet parts of a mix), making it more suitable for home speakers. But just because the mixes are getting better doesn’t mean we’ll be able to hear them.
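As a rough illustration of what "less dynamic range" means in numbers, here is a minimal Python sketch. The test tones, amplitudes, and function names are all made up for the example and do not come from the article or from any delivery spec; the sketch only shows that dynamic range is the level difference, in decibels, between the loud and quiet parts.

```python
import math

def rms_db(samples):
    """RMS level of the samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def dynamic_range_db(quiet, loud):
    """Level difference between a loud passage and a quiet passage."""
    return rms_db(loud) - rms_db(quiet)

def tone(amplitude, n=4800, freq=440.0, rate=48000):
    """Hypothetical stand-in for a passage of audio: a plain sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Assumed amplitudes: whisper vs. explosion in a wide theatrical-style mix,
# and the same two passages squeezed closer together for the living room.
theatrical = dynamic_range_db(tone(0.01), tone(1.0))
near_field = dynamic_range_db(tone(0.05), tone(0.5))

print(f"theatrical-style range: {theatrical:.1f} dB")
print(f"near-field-style range: {near_field:.1f} dB")
```

With these assumed amplitudes, the wide mix spans about 40 dB and the narrower one about 20 dB, so the quiet passage sits much closer to the loud one at any given volume setting.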

 

“‘Near field’ means that you’re close up on the speakers, like you would be in your living room,” said Brian Vessa, the Executive Director of Digital Audio Mastering at Sony Pictures. “It’s just having a speaker near you so that what you’re perceiving is pretty much what comes out of the speakers themselves and not what is being contributed by the room. And you listen at a quieter level than you would listen to in the cinema.”

 

“What the near-field mix is really about is bringing your container in a place where you can comfortably listen in a living room and get all of the information that you’re supposed to get, the stuff that was actually put into the program that might just kind of disappear otherwise.”

 

Vessa wrote the white paper on near-field mixes, creating the industry standard. He believes a big part of the problem is “psycho-acoustic,” meaning we simply don’t perceive sound the same way at home and at the theater, so if a good near-field mix isn’t the baseline, audiences are left to fend for themselves.

 

Complicating matters, where things end up has never been more fluid. “In TV we anchor the dialogue so it is always even and clear and build everything else around that,” said Andy Hay, who delivered the first Dolby Atmos project to Netflix and helped develop standards for the service. “In features we let the story drive our decisions. A particularly dynamic theatrical mix can be quite a challenge to wrestle into a near-field mix.” With so many productions being dumped on streaming after a movie’s complete, audio engineers might not even know what format they’re mixing for.

 

And there is the home to deal with. Consumer electronics give users a number of proprietary options that “reduce loud sounds” or “boost dialogue.” Sometimes they simply have stupid marketing names like “VRX” or “TruVol,” but they are “motion smoothing” for sound. Those options, which may or may not be on by default from the manufacturer, attempt to respond to noise spikes in real-time, usually trying to grab and “reduce” loud noises, like explosions or a music cue, as they happen. Unfortunately, they’re usually delayed and end up reducing whatever’s following the noise.
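The "delayed" behavior described above can be sketched with a toy level follower. This is not how any named product works; the function, its parameters (threshold, attack, release), and the frame levels are assumptions chosen only to show why a late-reacting, slow-recovering gain control lets the bang through and then ducks the dialogue that follows.

```python
def auto_leveler(levels, threshold=0.3, attack=0.5, release=0.1):
    """Toy 'reduce loud sounds' processor working on per-frame levels.

    The gain for each frame is derived from the envelope of the frames
    already heard, so the processor always reacts one step late, and the
    envelope falls back slowly (small release) after a loud passage.
    """
    envelope = 0.0
    out = []
    for level in levels:
        # Gain is based on what was heard so far, not on the current frame.
        gain = 1.0 if envelope <= threshold else threshold / envelope
        out.append(level * gain)
        # Envelope rises quickly toward loud input and decays slowly.
        coeff = attack if level > envelope else release
        envelope += coeff * (level - envelope)
    return out

# Hypothetical scene: dialogue, an explosion, then dialogue again.
levels = [0.2] * 5 + [1.0] * 5 + [0.2] * 5
processed = auto_leveler(levels)

for before, after in zip(levels, processed):
    print(f"in {before:.2f} -> out {after:.2f}")
```

In this sketch the first explosion frame passes at full level because the envelope has not caught up yet, while the dialogue right after the explosion comes out well below the level of the dialogue before it.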

 

It’s not just the speakers that are the problem. Rooms, device placement, and white noise created by fans and air conditioners can all make dialogue harder to hear. A near-field mix is supposed to account for that, too. “I listen very intently and very quietly, because that way all of these other factors, the air conditioner, the noise next door, all the other stuff that could be clattering around and stuff starts to matter. And if I lose something, we got to bring that up.”

 

The long road to bad sound

 

The sound issues we’re experiencing today are the result of decades of devaluing the importance of clear audio in productions. Bondelevitch cites the move from shooting on sound stages with theatrical actors as the first nail in the coffin. Sound stages provide an isolated place to pick up clear dialogue, usually with the standard boom mic “eight feet above the actors.” The popularity of location shooting made this impossible, leading to the standardizing of radio mics in the ’90s and 2000s, which present their own problems. Cloth rustle, for example, is tricky to edit and leads to more ADR, which actors and directors alike hate because it diminishes the performance given on set.

 

In the early days of cinema, when most actors were theatrically trained for the stage, performers would project toward the microphone. Method acting, however, allowed for more whispering and mumbling in the name of realism. This could be managed if more time were put into rehearsal, where actors could practice the volume and clarity of their lines, but very few productions have that luxury.

 

One name that keeps being brought up by sound editors for this shift is Christopher Nolan, who popularized a growly acting style through his Batman movies. The problem remained consistent throughout his Dark Knight trilogy, with Batman’s and Bane’s voices being two consistent complaints even among fans of the movies. When Bane’s voice was totally ADR’d following the film’s disastrous IMAX preview, it overpowered the rest of the movie. “The worst mix was The Dark Knight Rises,” he said. “The studio realized that nobody could understand him so at the last minute they remixed it and they made him literally painfully loud. But the volume wasn’t the problem. [Tom Hardy’s] talking through the mask, and he’s got a British accent. Making it louder didn’t fix anything. It just made the movie less enjoyable to sit through.”

 

Volume is an ongoing war not just among sound editors but inside the government. In 2010, Congress passed the Commercial Advertisement Loudness Mitigation (CALM) Act, which directs the Federal Communications Commission to limit the volume of commercials. Instead, networks simply raised the volume of the television shows and compressed the dynamic range, making dialogue harder to hear. “They’re trying to compress things so much that they can keep getting louder,” said Clint Smith, an assistant professor of sound design at the University of North Carolina School of the Arts School of Filmmaking, who previously worked as a sound editor at Skywalker Ranch.

 

Smith has been teaching audio engineering for five years and encourages his students to embrace subtitles and to work them into the narrative of a film in more creative ways. “What does it look like? Ten years down the road, 20 years down the road where subtitles become more prevalent because I don’t see them as going away,” Smith asked his students. “I was kind of just curious about…how can we actually have the subtitles be part of the filmmaking process. Don’t try to run away from them.”

 

As unintelligible dialogue becomes more common, we’ll have no choice but to embrace the subtitle. But at what point are studios and streamers not even bothering to mix sound properly and assuming viewers will just read the dialogue? With subtitles being an option for every streamer, soon, “we’ll fix it in post” could become “they’ll fix it at home.”

 

Sound you can feel

 

There are some things that we can do. For instance, there’s always buying a nice sound system. Even more important is setting it up properly. Most of the sound mixers interviewed recommended having professional help but also mentioned that many soundbars today come with microphones for home optimization. None sounded too convinced by soundbars, though.

 

“If you’re using a soundbar,” Bondelevitch said, “get the best soundbar you can afford. And if you’re listening on your earbuds or headphones, get good headphones. If it’s a noisy environment, get over-the-ear headphones. They do really isolate sound much better and do not use noise canceling headphones because those really screw up the audio quality.”

 

But more than anything, they emphasized how this is a selling factor for movie theaters. If you want good sound, there’s a place that has “sound you can feel.”

 

“It’s a bummer because you want the theater experience,” said Vanchure. “People aren’t going out to theaters as much nowadays because everything’s just streaming. And that’s how you want people to hear these things. You’re doing this work so you can hear this loud and big.”

 

Link to comment
Share on other sites

Thank you for taking the time to repost!  Such a joy to read, like reading a real essay!  
 

Man, this is a lot worse than I thought.  I had assumed it was mumbling actors + overzealous sfx / ambience / music + ears that know the dialog, but all that compression and double mixing is horrifying.  Where is the FCC?!  I personally only use subtitles with foreign stuff (I hate dubbed anything), and for the occasional indecipherable phrase.

 

Something else that may be a factor is that there is widespread hearing loss happening in the world.  Loud machines and music are everywhere.  Machines built with zero consideration for damaging sound levels abound, and it makes me so sad.  Too much noise.

 

A boom eight feet overhead?!!!  It's almost inconceivable except I've seen it.

Link to comment
Share on other sites

I feel left out.
How does the CALM Act make mixes worse?
The whole idea was to make the overall loudness equal, so TV commercials are 'as loud' as normal programs.
Thus, the whole compression / harmonic tricks are useless to 'stand out from the rest'.
This gave back the ability for mixers to use high dynamics, overcoming the 'peak limits'.
This is, IMHO, a GOOD thing.
What am I missing?
I do understand that 'smart' devices might interfere. (Like the 'auto scaling' on images: changing aspect ratios with or without letterboxing / scaling, 'smart' scaling, whatever, did horrible things to the original framing.)
About Netflix:
Netflix is (afaik) the only major player that has loudness requirements with 'outdated' specs. I don't care if the specs are public, the implementation of the specs is not, there is NO open source software available for their specs. That is plain arrogant IMHO.
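To make the loudness-matching argument from earlier in this post concrete, here is a minimal Python sketch. It uses a plain average level as a stand-in for a real loudness measurement (the actual broadcast measure applies frequency weighting and gating, which this sketch skips), and the sample values, the -24 dB target, and the function names are assumptions for illustration only.

```python
import math

def mean_level_db(samples):
    """Average level in dB re full scale (simplified stand-in for loudness)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def normalize(samples, target_db=-24.0):
    """Scale the whole programme so its average level hits the target."""
    gain_db = target_db - mean_level_db(samples)
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples]

# Hypothetical material: a heavily compressed spot (always loud) and a
# dynamic programme (mostly quiet, with a short loud section).
compressed = [0.8, -0.8] * 100
dynamic = [0.05, -0.05] * 90 + [0.9, -0.9] * 10

norm_compressed = normalize(compressed)
norm_dynamic = normalize(dynamic)

print(f"compressed: mean {mean_level_db(norm_compressed):.1f} dB, peak {max(norm_compressed):.3f}")
print(f"dynamic:    mean {mean_level_db(norm_dynamic):.1f} dB, peak {max(norm_dynamic):.3f}")
```

Once both are brought to the same average level, the compressed material no longer sounds louder overall, and the dynamic material keeps the higher peaks. That is the sense in which matching overall loudness removes the incentive to squash everything.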
 

 

Link to comment
Share on other sites

One thing that would make a difference would be test screening with fresh ears. Almost everyone involved in the production has at least read the sides, and therefore knows what the actors are supposed to say before they say it. Being primed like that can cause a listener to overlook poorly delivered lines, either through low projection or poor articulation. Defamiliarization would allow people to catch these kinds of things.

 

As "creative choices": you can de-emphasize dialogue in a scenario that doesn't require any exposition, that's fine. But if it's a situation that many people are unfamiliar with, you need to help the audience along.

 

For example: "Pat" walks into a room, up to a counter. On the other side of the counter stands "Chris". Pat mumbles something unintelligible to Chris. Chris then places a glass full of liquid on the counter. Pat picks up the glass and walks away. We can reasonably assume that Pat walked into a bar, and ordered a drink from Chris. Unless it's relevant to the plot, we don't necessarily need every word to arrive at this conclusion.

 

When I saw Tenet the first time in a theater, I could barely keep up. For me, that was an example of when the audience needed more help. You had new characters in new situations whose voices were muffled by gas masks. My second viewing was at home, with subtitles. This helped, but only barely. 

 

Which brings me to one other issue: narrative clarity.  If you're going to be exploring abstract concepts, you need to do so using simple, direct language, as you will be teaching the audience through exposition. It can be challenging to do this in new and relevant ways, of course, but that's what we sign up for as storytellers.

 

Example A: the "Mr. DNA" sequence from Jurassic Park. The goal is to summarize DNA gene splicing for the general audience, just enough for them to follow the plot. It does this very efficiently, in about 3 minutes and 30 seconds, and in a manner that is fully integrated into the story.

 

Example B: all of The Big Short. Here, they chose different methods, including fourth wall breaks and direct address to camera. The film does a great job in explaining a very complex subject simply, essentially the cinematic expression of the Feynman Technique.

 

And here is where I fully graduate to being a grumpy old man: I grew up on the original Star Wars trilogy. I personally don't remember people not being able to understand the dialogue, or being unable to follow the story enough. Lines were either delivered clearly, or looped for clarity. The scripts, while teeming with invented languages and alien words, still conveyed the essential narrative simply enough that even children could follow them.

 

We have solutions to all the problems listed in the above article, we simply need to implement them.

 

End rant. 

Link to comment
Share on other sites

This is probably the 3rd article I’ve read on this subject in the last 5 years and just about every one has been post centered. Maybe there’s some self bias where if you work in post you’re going to think it’s a post problem. No disrespect to the subjects in the article but I think the majority of the issue starts on set. 
 

The actors whisper and the directors think you’re joking if you suggest they speak up. Debilitating usage of multiple cameras on set. Poorly managed LED lighting setups that make boom coverage almost impossible. Sets that sound like crap. The false confidence of ‘everyone’s wired’. 
 

If you ask me.. the problem gets a big head start in production. 

Link to comment
Share on other sites

9 hours ago, Derek H said:

No disrespect to the subjects in the article but I think the majority of the issue starts on set. 
 

The actors whisper and the directors think you’re joking if you suggest they speak up. Debilitating usage of multiple cameras on set. Poorly managed LED lighting setups that make boom coverage almost impossible. Sets that sound like crap. The false confidence of ‘everyone’s wired’. 
 

If you ask me.. the problem gets a big head start in production. 


I actually felt this article, although written by someone with more experience in post, was quite thorough. The author spends a good deal of time highlighting the very issues listed above.
He specifically discusses the boom mic’s ability to more faithfully capture frequencies in the range critical to intelligibility, and how its use has been compromised by the proliferation of body mics, multi-cam shoots, and lighting choices. 
Also of note is the phenomenon of directors not heeding our advice, born of their familiarity with the script. This is a point that I’ve personally had to make numerous times on set. 

Link to comment
Share on other sites

I often encounter the following on set: 

 

I can’t understand a line / certain words delivered on set. I‘ll speak to the director about it. Two common reactions:

 

one:

I can understand everything just fine. Don’t interfere with the acting! 
 

two: 

ok - you can talk to the actor. 


If I‘m lucky the actor appreciates my concern and can adapt. More often than not the next take got the lines clean and crisp but the acting is suddenly so dull and poor - it’s not gonna make it through editing. 
 

Reaction again from Director: Happy now?! Clean sound but shitty acting. Great! 
 

Link to comment
Share on other sites

On 11/10/2022 at 7:28 PM, Jeff Wexler said:

Sound stages provide an isolated place to pick up clear dialogue, usually with the standard boom mic “eight feet above the actors.”


Wyatt, this was the only mention of boom mics I could find in the article. Did I miss something?

Link to comment
Share on other sites

There may be confusion as to which article you’re referring to, there are links to two articles above where one was inserted into Jeff’s post. 
 

Intelligibility on set is kind of a trap, since everyone reads the script, everyone knows what the actors are saying and therefore hears everything. That’s a given to us, of course, no need to point it out in a group of pros, but it’s easy to forget that not everyone listens the way we do and we are always outnumbered and outgunned, unless the director is a person who trusts us. And that boils down more to luck than anything else in my experience.

Link to comment
Share on other sites

Olle, you’re correct I see what Wyatt was talking about in the 2nd article from the protools expert site. My comment was in reference to just the original article being discussed here which glosses over production a bit if you ask me.

 

Anyway, I suspect that if you were to compare two scenes, one that had proper boom coverage with decent performance levels and one that had compromised boom coverage and other issues, there would be zero intelligibility issues with the first scene, regardless of what unfortunate processing was done at the hands of the streaming service and despite the viewer’s crappy sound system. The second scene, with the issues on set, would be the problematic one. 

Link to comment
Share on other sites

As Olle has said, working with others who trust us is key  --  one of the skills that we learn over the years is to be able to hear the spoken dialog as if we are hearing it for the first time. This gives us a perspective over how most everyone else hears the same dialog. The Director has probably spent hours listening to the dialog in pre-production read throughs, rehearsals, consultation with the writer, etc. Of course they can understand every word. The audience, however, has only one shot at it and often they really miss a lot. Additionally, regarding the article, it is true that there was not much attention paid to production techniques that have changed over the years, and it is my firm belief that the majority of problems begin on the set, on the day, and then are exacerbated later during the mix and all the processing that goes on for the distribution and delivery of content.

Link to comment
Share on other sites

12 hours ago, Derek H said:

Olle, you’re correct I see what Wyatt was talking about in the 2nd article from the protools expert site. My comment was in reference to just the original article being discussed here which glosses over production a bit if you ask me.

 

Sorry for the confusion, Derek. I was referring to the second posted article (the one from protoolsexpert). I feel the author did a really good job pointing to all the contributing facets of the problem. Maybe the strongest (or at least most thorough) I've seen on the topic.

 

 

A small excerpt, acknowledging our side:

Pre-existing Knowledge - Those Involved In The Production All Know What Is Being Said

Another big issue at play as to whether a particular line is intelligible or not, is that everyone involved in the production knows what is being said, they have lived with it through pre-production, script editing, shooting, and post-production. This means they probably know the script as well as the actors, if not better!

What this familiarity with the script means is that they can hear the words even when they are not clearly intelligible. For example, this can happen when the drama is being shot, the director knows what is being said, and even if the sound team asks for a retake it is likely to be received with a hard stare and "I can hear it what's your problem"! When we get to the dub when the director comes to sign off on a scene, again they know what is being said and so may well be asking for the FXs and/or music to be lifted to increase the sense of drama in the scene to a much higher level than they would if they were new to the production and hearing it for the first time.

Changes In Production Techniques - More Multi-camera, Less Use Of Boom Mics

Shooting a scene using more than one camera means that your use of a boom mic is compromised at best, as at least one of the cameras tends to be a wild-shot, meaning the boom mic cannot get in close enough to pick up a clean sound. Consequently location sound teams end up relying on the use of personal radio mics. As we learned in our article Speech Intelligibility - The Facts That Affect How We Hear Dialog the spectrum of speech recorded on the chest of a person normally lacks frequencies in the important range of 2-4 kHz, where the consonants are, which results in reduced speech intelligibility.

In fact in this article, we also learnt that just over the head, where the boom mic would normally be, is a great position for getting the best speech intelligibility. All of this means that the growth of multi-camera shoots results in a double-whammy, we lose the use of a boom mic and replace it with personal radio mics often in the chest area, which don’t pick up the consonants as well as the boom mic and as we learnt, speech intelligibility is all about the consonants.

 

Link to comment
Share on other sites

I, too, believe that intelligibility issues begin on set and carry over all the way to the end. 
 

I changed my career to radio a few years back, but I linger here because I occasionally relapse into film shoots, and the obvious difference is we don’t have any cameras and work in mostly treated rooms. The problems we tackle are more on subtle language and expressions that have to do with editing. And that IS a post issue, even in movies, where wall to wall dialog is just so tightly edited that you don’t hear any breathing, so unconsciously you’re not hearing the words cus you are gasping for breath, sort of. And that’s an intelligibility issue too. 

And then again it just comes down to understanding what sound is and what information we get out of it. And some directors and producers don’t think of it that way, simply put. Again I’m preaching to the choir. 

Link to comment
Share on other sites
