
32-bit for Dialogue: Yay or Nay?


Marc Wielage


1 hour ago, Jim Feeley said:

I think it's reasonably helpful to make clear(ish) that at 0, 32-bit files match with 24-bit and 16-bit files.

Is this actually defined in a spec somewhere?  I'm inclined to think that one possible reason why the original poster needed 30dB of gain is because there *isn't* a defined way to align levels between 32-bit and 24-bit.  My understanding is that every DAW has adopted a different convention for this conversion, and that this is part of the reason why it's such a headache to export levels "correctly" when dealing with 32-bit in post.

I would love a better understanding of what's happening here.  I believe using 32-bit float to represent PCM audio has been done since the '90s (at least, I remember being able to save it in that format with the audio recorder in Win95), but I'm not sure if there's any official spec for how the format (or any digital format, 24-bit included) is supposed to translate into other formats.  Don't we typically calibrate our meters so that +4 dBu (aka line level or "analogue 0") = -20 dBFS precisely *because* there isn't a spec to do this?  (i.e. we've adopted that equivalence by industry convention precisely because there isn't actually any "official" spec that tells us how loud it's supposed to be?)
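For reference, the arithmetic behind that kind of alignment: 0 dBu is 0.7746 V RMS (1 mW into 600 Ω), and a single alignment point maps analogue levels onto the meter. A minimal Python sketch, assuming the usual +4 dBu = -20 dBFS studio alignment as the default (it's a convention, so it's a parameter); the function names are illustrative:

```python
import math

# 0 dBu is defined as sqrt(0.6) V RMS, i.e. 1 mW dissipated in 600 ohms (about 0.7746 V).
DBU_REFERENCE_VOLTS = math.sqrt(0.6)

def dbu_to_volts(dbu: float) -> float:
    """Convert an analogue level in dBu to RMS volts."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)

def dbu_to_dbfs(dbu: float, align_dbu: float = 4.0, align_dbfs: float = -20.0) -> float:
    """Map an analogue level onto a digital meter using a single alignment point.

    The default (+4 dBu reads -20 dBFS) is a convention, not something the format defines.
    """
    return dbu - align_dbu + align_dbfs

print(round(dbu_to_volts(4.0), 3))  # 1.228
print(dbu_to_dbfs(24.0))            # 0.0
```

Change `align_dbfs` and the whole mapping shifts, which is exactly why facilities have to agree on it.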


3 hours ago, The Documentary Sound Guy said:

No, you're not.  I find it incredibly misleading as well, as it replaces a coherent technical scale with, essentially, a conventional scale.  An unfortunate piece of marketing genius to convince people that it is superior.

In reality, the biggest difference actually has to do with the fact that it's floating point rather than integer, and that has ramifications for where in the scale it is best to record.  With floating point, precision (and therefore audio resolution) is highest in the *middle* of the scale, whereas integer formats are equally precise throughout the scale, but there is a quantization noise floor at the bottom of the scale, which means the "best" audio is at the top end of the scale (i.e. "full scale").

It's been a little while, but if I recall correctly, the difference between the maximum value that can be represented in 32-bit float and the second-highest value is something like 8dB, which would produce very, very bad results if we tried to record at the top of the scale the way we do with integer formats.
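The recollection above is easy to check. This Python sketch (standard library only) reinterprets the bit patterns of the two largest finite 32-bit floats and measures the gap between them in dB:

```python
import math
import struct

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

largest = float32_from_bits(0x7F7FFFFF)         # largest finite float32
second_largest = float32_from_bits(0x7F7FFFFE)  # the next value down

gap_db = 20 * math.log10(largest / second_largest)

print(largest)                                 # 3.4028234663852886e+38
print(largest - second_largest == 2.0 ** 104)  # True
print(gap_db)                                  # roughly 5e-07 dB
```

In absolute terms the spacing up there is enormous (2^104), which is the sense in which float precision "drops" toward the top of the scale, but in relative terms, and therefore in dB, each step stays tiny.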

In reality the precision of 32-bit float is essentially identical to 24-bit integer.  32-bit float stores 23 bits for the fraction (mantissa), plus an implied leading bit = 24 significant bits, with 1 separate sign bit.  The remaining 8 bits are the exponent, which is what creates the massive 770dB increase in dynamic range ... but at the cost of a loss of precision at the outer bounds of the scale.
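For anyone who wants to poke at the layout: IEEE 754 single precision stores 1 sign bit, 8 exponent bits (biased by 127) and 23 fraction bits, with an implied leading 1 for normal numbers, which is where the 24 bits of precision come from. A minimal Python sketch that unpacks those fields with the standard `struct` module (the helper name is illustrative):

```python
import struct

def float32_fields(value: float) -> tuple[int, int, int]:
    """Split a value (rounded to float32) into its sign, exponent and fraction fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF      # 23 stored bits; normal numbers add an implied leading 1
    return sign, exponent, fraction

print(float32_fields(1.0))   # (0, 127, 0)
print(float32_fields(-0.5))  # (1, 126, 0)
print(float32_fields(1.5))   # (0, 127, 4194304)
```

1.0 is stored as exponent 127 (2^0 once the bias is removed) with an all-zero fraction; 1.5 sets only the top fraction bit (0.5 × 2^23 = 4,194,304).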

I think it would be more accurate simply to stop using dBFS as a scale for floating point and recognize that 32-bit float requires using a different reference point for best results.  We need something like dBFP (dB floating point), where convention dictates that 0dBFP is the middle of the floating point scale (i.e. 0.0), and 72dBFP = 0dBFS.  If I'm doing my math correctly, this convention would properly match 32-bit float and 24-bit int such that both formats would be used to their highest potential at similar recording levels.

Presumably, a similar convention is already being used under the hood by most DAWs, and the claim of "exceeding" 0 dBFS simply indicates when signal levels exceed what a 24-bit int format would be capable of representing (i.e. the point at which precision starts to degrade in floating point).

A major disadvantage of switching scales is we'd all have to adapt to new standards for "correct" recording levels.  Presumably, we'd all have to target somewhere around 52dBFP for "normal" dialogue levels, and I can only imagine the amount of confusion this is likely to cause between set and post...


I prefer 56dBFP for dialog personally, but to each their own :)

 

Nice write-up about some of the technical side.  Funny side note: 24-bit WAV files are signed integers, so they also center around zero and have a range of -8,388,608 to +8,388,607.  Though as you point out, their accuracy is linear throughout that scale.
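The slightly lopsided range falls out of two's-complement encoding, which is what 16- and 24-bit PCM WAV files use. A quick Python sketch (the helper name is made up):

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest values of a two's-complement integer with `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in (16, 24, 32):
    low, high = signed_range(bits)
    print(f"{bits}-bit: {low:,} to {high:,}")
# 16-bit: -32,768 to 32,767
# 24-bit: -8,388,608 to 8,388,607
# 32-bit: -2,147,483,648 to 2,147,483,647
```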

 

With that, I always assumed, without ever looking into it, that since both formats are centered on 0, the 0 point would be the point of reference when translating between 24-bit integer and 32-bit float.  The SD article claims that 0 dBFS for 24-bit and 0 dBFS for 16-bit line up with the 0 point of a 32-bit file, which they also identify as 0 dBFS.  I found this a very confusing statement:
 

Quote


There is one other aspect of 32-bit float files which is not immediately obvious. Files recorded with 32-bit float record sound where 0 dBFS of the 32-bit file lines up with 0 dBFS of the 24- or 16-bit file. Keep in mind that unlike the 24- or 16-bit files, the 32-bit file goes up to +770 dBFS. So compared to a 24-bit WAV file, the 32-bit float WAV file has 770 dB more headroom.


[Image: dynamic_range_chart.png]

 

In that picture they show 0dBFS as the (almost) center of the dynamic range of a 32 bit float file, but that's not 0dBFS of the 32 bit file, it's 0dBFS of the 24 and 16 bit files.  Full Scale of the 32 bit file is still the largest amplitude that it can represent.  It's confusing to me.

That same chart actually highlights the benefits of 32 bit really well if they just dropped the FS so it represented the available dynamic ranges instead of trying to tie everything to a somewhat undefined full scale reference. 


The fact that you could potentially preserve the entire existing dynamic range and recover accidental clipping without any boost in system noise is great in theory.  However, the last time I recorded in a location where signal noise became a noticeable contribution to my track, I was in an anechoic chamber.  This is why, despite the enjoyable debate around the theory behind 32-bit audio, I have not been able to come up with a scenario where recording it on location is worth it.

 

This brings up another question I have about how some of the 32-bit recorders work.  The marketing materials seem to claim that you don't need to set gain because you record everything without noise and can set gain in post.  However, unless there is a new class of A/D converters I'm unfamiliar with, you have to amplify the tiny voltage from a mic to get it into the correct voltage range for the A/D.  So do they all just provide a fixed amount of gain and figure you'll sort it out when you convert to a 24-bit workflow?  If so, that could possibly explain the need for 30dB of gain in post?

 

 


I have to wonder, are there 32-bit DACs out there?  How do they handle outputting 1528dB of dynamic range?  Presumably they must clip *far* below "full scale" (and have a noise floor well above "minimum scale"), since that would involve voltages on the order of 10^38 volts (assuming a similar scale conversion to what we've been discussing).  Can you imagine trying to reproduce a 770dB signal through a loudspeaker?  It's been a while since I read the physics, but I seem to recall thermonuclear-scale effects that start around 200dB...

 

Given that *lightning* (a mere 10^9 volts) penetrates just about any insulation, I think it's safe to say building a 32-bit DAC with remotely linear performance is physically impossible.  It seems like a safe bet that 32-bit DACs would need to be *very* non-linear outside of certain operating parameters (presumably, roughly the middle 140dB of the range based on current state-of-the-art signal-to-noise designs for other types of components).  And, if that is true, one wonders what the *actual* intended range of use is.  What would happen if you tried to output a 32-bit file with, say, 200dB of digital gain above "normal" levels?  You'd be able to see the gain in software, but trying to reproduce it physically would result in, shall we say, an undefined result.
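Taking the assumption above that the dB figures are referenced to 1 V, the conversion is 10^(dB/20) for an amplitude (voltage or pressure) ratio and 10^(dB/10) for a power or intensity ratio. A minimal Python sketch with illustrative names:

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Voltage/pressure ratio, since dB = 20 * log10(ratio)."""
    return 10 ** (db / 20)

def db_to_power_ratio(db: float) -> float:
    """Power/intensity ratio, since dB = 10 * log10(ratio)."""
    return 10 ** (db / 10)

print(f"{db_to_amplitude_ratio(770):.3g}")  # 3.16e+38
print(f"{db_to_amplitude_ratio(144):.3g}")  # 1.58e+07
```

So 770 dB over 1 V is about 3 × 10^38 V, while the familiar 144 dB of 24-bit is an amplitude ratio of roughly 16 million.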


Decibel is a relative term: dBFS, dBV, dBu... and in the physical world, dB SPL (sound pressure level) is different again.


"At 194 dB, the energy in the sound waves starts distorting and they create a complete vacuum between themselves. The sound is no longer moving through the air, but is in fact pushing the air along with it, forming a pressurized wall of moving air. This is called a shock wave, and it is at this point that a “sound” becomes a physically perceptible and possibly dangerous force."

 

I think the 770 dB etc. is more theoretical and exists only in the digital "world".
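The 194 dB figure in that quote falls out of the SPL definition: the pressure computed from 194 dB SPL is about one atmosphere, so the low-pressure half of the wave has nothing left to swing into. A minimal Python sketch using the standard 20 µPa reference (whether you read the dB figure as RMS or peak moves the exact threshold by about 3 dB):

```python
P_REF_PASCALS = 20e-6         # 0 dB SPL reference pressure in air (20 micropascals)
ATMOSPHERE_PASCALS = 101_325  # standard atmospheric pressure

def spl_to_pascals(db_spl: float) -> float:
    """Pressure in pascals corresponding to a level in dB SPL."""
    return P_REF_PASCALS * 10 ** (db_spl / 20)

pressure = spl_to_pascals(194)
print(round(pressure))                          # 100237
print(round(pressure / ATMOSPHERE_PASCALS, 2))  # 0.99
```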


2 hours ago, Johnny Karlsson said:

I think the 770 dB etc. is more theoretical and exists only in the digital "world".

Yup.  Hence the (rhetorical) question:  What would happen if we tried to reproduce a 770dB *signal* through a loudspeaker?  What happens if you try and turn a theoretical number into reality?  Most likely, things blow up as they are pushed past their physical limits (both electric and sonic).

Also, how honest is it to be selling a product with 770dB of "headroom" that can only exist in theory?  What are the actual *physical* specs of a 32-bit DAC, and what is the point of a device that allows you to try to reproduce a signal that can only exist in theory?


8 hours ago, The Documentary Sound Guy said:

What would happen if we tried to reproduce a 770dB *signal* through a loudspeaker?

I'm afraid there is no such thing.

 

"... a thermonuclear bomb can produce an overpressure corresponding to 278 decibels. This is QUITE lethal. 200 decibels is enough to cause instant death, and 278 decibels is about 10 million times greater sound intensity."

 

Last time I checked, the best converters available would achieve around 20 to 21 bits. Analog circuitry is part of all converters.

 

- I am guessing here -

I think this is where the 32-bit converters are "different" in a way that the signal coming in is somehow turned way down before it hits the converter, then once in the digital domain, it's brought back up, with the safety of basically not ever hitting 0 dBFS (clipping) at the converter stage. Since the 32-bit digital noise floor is basically non-existent, it doesn't add any noise to the signal when turned up.

But the real-world dynamic range (difference between the softest and loudest sound) basically remains the same as in 24-bit, i.e. the noise floor of the physical world (HVAC, transmitter self-noise, mic self-noise, etc.) and the loudest scream are no different. The good news is that you can be sure that the unexpected/improvised scream doesn't clip when you fall asleep with the trims and faders all the way up.
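Converter figures quoted in "bits" are usually effective number of bits (ENOB), derived from the ideal-quantizer rule of thumb SNR ≈ 6.02N + 1.76 dB. A small Python sketch of that conversion; the function names are illustrative:

```python
def ideal_snr_db(bits: float) -> float:
    """Ideal SNR of an N-bit converter for a full-scale sine: 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76

def effective_bits(measured_db: float) -> float:
    """Effective number of bits (ENOB) implied by a measured SNR/SINAD figure."""
    return (measured_db - 1.76) / 6.02

print(round(ideal_snr_db(24), 2))        # 146.24
print(round(ideal_snr_db(20), 2))        # 122.16
print(round(effective_bits(122.16), 1))  # 20.0
```

By that yardstick, "20 to 21 bits" means a measured SNR of roughly 122 to 128 dB, well short of the 146 dB an ideal 24-bit converter would give.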


17 hours ago, The Documentary Sound Guy said:

Is this actually defined in a spec somewhere?  I'm inclined to think that one possible reason why the original poster needed 30dB of gain is because there *isn't* a defined way to align levels between 32-bit and 24-bit.  My understanding is that every DAW has adopted a different convention for this conversion, and that this is part of the reason why it's such a headache to export levels "correctly" when dealing with 32-bit in post.

 

I haven't gone looking for an AES spec or rec or anything. I'm mainly rolling with this bit from the Sound Devices 32-bit float explainer from a year or two ago (that was linked to earlier in this thread... and that I'll link to here).

 

"There is one other aspect of 32-bit float files which is not immediately obvious. Files recorded with 32-bit float record sound where 0 dBFS of the 32-bit file lines up with 0 dBFS of the 24- or 16-bit file. Keep in mind that unlike the 24- or 16-bit files, the 32-bit file goes up to +770 dBFS. So compared to a 24-bit WAV file, the 32-bit float WAV file has 770 dB more headroom."

 

And for yucks I've used the Track E I have as a (mono) backup recorder. Mainly to get a bit familiar with it and its smartphone app. Anyway, I rolled 32-bit once and 24-bit a few times. Levels seem to line up. I set tone to -20dBFS, but I didn't do any carefully calibrated steps up to 0dBFS, so maybe there's some variance. But at least for the Track E that I have, it's close. And I'll assume SD does what they say. Zoom probably, too. But I don't know if that's down to some AES thing, chip manufacturer design choices, or just common best practices. As you say, DAWs and NLEs might wander away from good practice.
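A -20 dBFS tone has a peak amplitude of one tenth of full scale, so if 0 dBFS lines up between formats, the tone should land at the same relative amplitude in both files. A minimal Python sketch, assuming the common convention that float ±1.0 maps to 24-bit ±2^23 (that mapping is the assumption here; the thread doesn't establish it, and the names are illustrative):

```python
FULL_SCALE_24BIT = 2 ** 23  # assumed mapping: float +/-1.0 <-> 24-bit +/-8,388,608

def dbfs_to_float(dbfs: float) -> float:
    """Peak amplitude in float samples, taking 1.0 as 0 dBFS."""
    return 10 ** (dbfs / 20)

def float_to_int24(x: float) -> int:
    """Scale a float sample to 24-bit, clamping to the legal integer range."""
    scaled = round(x * FULL_SCALE_24BIT)
    return max(-FULL_SCALE_24BIT, min(FULL_SCALE_24BIT - 1, scaled))

tone_peak = dbfs_to_float(-20.0)
print(round(tone_peak, 6))        # 0.1
print(float_to_int24(tone_peak))  # 838861
```

The clamp only matters on the way into 24-bit; in the float file itself, values above 1.0 are simply stored.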

 

I'm still leaning towards user error, though. The inconvenience for post and the enabling of poor practices amongst not-really experienced users of 32-bit kinda keeps me away from it for now. So I'm still in the 24/48 & nice limiter world. I'm watching but don't have tons of experience with 32-bit. 


In terms of a DAW: the environment it was designed for is more of a controlled situation - a studio with soundproofing and isolation from the outside world. You can spend a day, or a week tweaking a snare sound. Everything is focused on sound only, so your mic placement, gain staging and levels could (should?) be optimal at the point of recording. Most of the time when mixing (let's say) 30-100 tracks you end up turning things down, rather than adding gain. And where you need more gain, it's easy to add with a plug-in.... the fader is just one of many things that can add or subtract gain.

 

Oh, and to answer Marc's original question; if you need to add that much gain, just use "Normalize". With that you can set the top peak to anything you like, all the way up to 0 dBFS, or -3dB just for safety, and/or use a compressor.
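Peak normalization just computes one gain from the loudest sample and applies it to everything. A minimal Python sketch, assuming float samples where 1.0 is 0 dBFS; `normalize_peak` is a made-up name, not any particular DAW's feature:

```python
def normalize_peak(samples: list[float], target_dbfs: float = -3.0) -> list[float]:
    """Scale samples so the largest absolute peak lands at target_dbfs (1.0 = 0 dBFS)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # digital silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak  # one linear gain for the whole clip
    return [s * gain for s in samples]

quiet_take = [0.0, 0.005, -0.0056, 0.002]  # peaks around -45 dBFS
louder = normalize_peak(quiet_take, target_dbfs=-3.0)
print(round(max(abs(s) for s in louder), 4))  # 0.7079
```

Because the same gain hits every sample, the noise floor comes up by exactly as much as the dialogue, which is why a noise-reduction pass often follows.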


I had another thought (<-- warning). Maybe Marc's correspondent was using a mic with very low sensitivity (e.g., an SM7B, which is really popular with podcasters and YouTubers) and running it into a 32-bit recorder with a preamp that can't add enough gain to bring that mic's signal up to a reasonable or standard level (or that just had a default gain setting too low for a mic like an SM7B). That on its own, and perhaps combined with a mic too far from the source, could create recordings where someone would want to add over 30dB in post.

 

As for what to do, I recorded a clip on my Track E (the only 32-bit recorder I have) at 32-bit/48kHz and at a really low level (peaks around -45dBFS) and tried a couple simple things that anyone could do. Mainly, I wanted to confirm that the obvious worked on 32-bit files. And I'm still sick at home so what the heck...

 

Adobe's currently free and kinda neat online Enhance Speech tool wasn't great for my track. It processed the 32-bit file and did increase gain and also removed the mic's/recorder's self noise. But it sounds like it gated my speech so heavily that some words were lost. So that's no good. https://podcast.adobe.com 

 

Manually adding 40dB gain pretty much worked, of course. But so did just normalizing. Of course that raised self noise, but a little simple noise reduction got rid of that without hammering the track. 

 

Again, all obvious stuff but perhaps helpful steps for Marc's correspondent.

 

 


On 3/16/2023 at 9:21 PM, The Documentary Sound Guy said:

I have to wonder, are there 32-bit DACs out there?  How do they handle outputting 1528dB of dynamic range?  Presumably they must clip *far* below "full scale" (and have a noise floor well above "minimum scale"), since that would involve voltages on the order of 10^38 volts (assuming a similar scale conversion to what we've been discussing).  Can you imagine trying to reproduce a 770dB signal through a loudspeaker?  It's been a while since I read the physics, but I seem to recall thermonuclear-scale effects that start around 200dB...

 

Given that *lightning* (a mere 10^9 volts) penetrates just about any insulation, I think it's safe to say building a 32-bit DAC with remotely linear performance is physically impossible.  It seems like a safe bet that 32-bit DACs would need to be *very* non-linear outside of certain operating parameters (presumably, roughly the middle 140dB of the range based on current state-of-the-art signal-to-noise designs for other types of components).  And, if that is true, one wonders what the *actual* intended range of use is.  What would happen if you tried to output a 32-bit file with, say, 200dB of digital gain above "normal" levels?  You'd be able to see the gain in software, but trying to reproduce it physically would result in, shall we say, an undefined result.

 

AFAIK there aren't any "true" 32 bit ADCs, either. Aren't they using stacked converters, or something similar?
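The "stacked converters" idea can be illustrated with a toy model: two converter paths at different analogue gains, with the high-gain path preferred unless it clips. This is a hypothetical Python sketch (the gains, the switching rule and the names are all assumptions), not how any particular recorder is documented to work; real designs would also have to match and blend the two paths, and this ignores noise and quantization entirely:

```python
HIGH_GAIN = 100.0  # hypothetical +40 dB path for quiet material
LOW_GAIN = 1.0     # hypothetical 0 dB path for loud material
ADC_LIMIT = 1.0    # each converter hard-clips at its own full scale

def adc(x: float, gain: float) -> float:
    """One converter path: apply analogue gain, then hard-clip at full scale."""
    return max(-ADC_LIMIT, min(ADC_LIMIT, x * gain))

def merge_sample(x: float) -> float:
    """Prefer the high-gain path unless it clipped, then undo that path's gain in float."""
    high = adc(x, HIGH_GAIN)
    if abs(high) < ADC_LIMIT:
        return high / HIGH_GAIN
    return adc(x, LOW_GAIN) / LOW_GAIN

print(merge_sample(0.5))  # 0.5 (high-gain path clipped, low-gain path used)
print(merge_sample(3.0))  # 1.0 (both paths clipped: still an overload)
```

The last case is the catch: if even the low-gain path overloads, float storage can't recover what never made it through the analogue stage.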


Super interesting conversation. I feel pretty strongly at this point in saying 'nay' to 32-bit float. I think in theory it solves a lot of 'problems' (hiring a sound mixer, engineer, etc.) for low-budget stuff, because every time I hear this request from clients, it's always the newer generation trying to explore this idea. Sure, if the technology gets adopted at all the different production stages, I really don't care; I'll switch to the 'yay' camp, as I'm sure most of us would. I'm sure protocols/workflows would have to exist, as they do now. At the end of the day, I'll be setting proper levels (for whatever specific format), assessing and mitigating sound issues, matching mics to source, and doing all the things I usually do to deliver the best possible tracks I am capable of.

At some point, I believe the HD/X systems (systems before too maybe?) were capable of recording 32-bit fixed or float (you post guys probably have the specifics, don't quote me on this stuff), but never once did I or we (collaborators) EVER create a session for a paying client in 32-bit. I did record higher sampling rates like 88.2/96/192 when clients requested it, but I knew that for the most part it would be converted down to 16/44.1. Right now, consumers don't even get that. They get some sort of something that is not even PCM. I now see a lot of people listening to content on tablets, or with the white earbuds, maybe some sort of sound bar or something at home.

The theoretical stuff is great. I'll hope (archival). But in the end, I just don't see how 32-bit float would ever practically make it to our physical world (all of it). This signal chain pretty much spells it all out. You can have 32-bit all you want, but if the first thing (mic placement) isn't spot on, someone is going to end up polishing a turd. At least with the technology we have now.

 

"What's being said > mic placement > mic selection > wind/cable/handling/location-noise mitigation > recorder/preamp choice > gain staging > recording format"

 

To illustrate the low-budget comment, this is the list that someone in the 'above the line' camp, who is demanding a 32-bit float capable recorder, sent me. If I didn't have 32-bit float, I would've lost a job.

 

H4N recorder / AC power adaptor
F6 recorder
Deity Sync
Rode Boom mic + XLR cable
Rode Boom mic + XLR cable
Rode Boom mic + XLR cable
Sennheiser wireless Lav A
Sennheiser wireless Lav B
(2) Boom Stand – Extended Arm
Procell Batteries
Headphones

 

I've never had a high end client have those demands. That's the part that concerns me at the moment and to which I say 'nay' to 32bit float. I'm sure when it's ready to be adopted by the masses it'll be fine.

 

As I type this, I'm perhaps figuring out that yes, the obvious benefit is having that safety when capturing a magical moment, which is super nice to have (more work would be needed than just 'setting' it to 32-bit float). Maybe we don't have the technology for transducers to either deliver to an A/D or receive from a D/A right at this moment in time, but perhaps we will in the future? The archival aspect of it is interesting. Maybe something could be re-mixed later on (I'm thinking music here, or sound stage stuff I guess) and nuances could be uncovered, making the delivery of the message better? Could be. Maybe it will have more info for AI synthesis. Maybe a badly placed boom/lav mic will have enough information for an AI entity to recreate that moment in the desired way? Maybe it could be used for scientific research? I ramble on. I kinda want to read up on the theory regarding archiving for the future, but I think we have bigger things to worry about at the moment. Cheers everyone, great discussion.


Saying 32-bit files have a dynamic range north of 1000 dB is technically true, but kinda wrong. That is the mathematical limit of 32-bit float, but in practice, at best you can record the full dynamic range the preamp is capable of.

 

For 24-bit (and 16 for that matter), each sample records the amplitude as a signed integer value represented in binary. Because you have 24 bits to work with, there are 2^24, or 16,777,216, possible values, running from -8,388,608 to +8,388,607. Another way to look at it: with each sample, 0 dBFS corresponds to the largest magnitude the format can hold, 0 is digital silence, and the roughly 144 dB of dynamic range (about 6 dB per bit) lies between them, with all the samples we record coming somewhere in between, though a good deal higher than the bottom.

 

0 dBFS is the upper limit because it can't record a higher value, so all the samples above it just register the maximum value (+8,388,607, or -8,388,608 on the negative side), hence the clipping sound in the recording.

 

32-bit float works differently because instead of recording integer values to correspond with the amplitude of each sample, it records each sample as a floating-point number (a sign, an exponent and a mantissa), so values above 0 dBFS (1.0) are stored rather than cut off. So on your recording, you set the gain too high and it looks like it clips at 0 dBFS. In post, they drop the levels to a more reasonable -10 dBFS, and the software simply scales the stored values back down, reconstructing the waveform, which is admittedly pretty cool.
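The difference between the two formats shows up with a single over-range sample. A minimal Python sketch, assuming 1.0 = 0 dBFS, float ±1.0 mapping to 24-bit ±2^23, and that nothing upstream of the converter clipped (names are illustrative):

```python
FULL_SCALE = 2 ** 23  # 24-bit full scale, assuming float 1.0 <-> 2**23

def to_int24_clipped(x: float) -> int:
    """Integer recording: anything past full scale is pinned at the limit."""
    return max(-FULL_SCALE, min(FULL_SCALE - 1, round(x * FULL_SCALE)))

def apply_gain(x: float, db: float) -> float:
    """Multiply a sample by a gain given in dB."""
    return x * 10 ** (db / 20)

hot_sample = 2.0  # a peak about 6 dB over 0 dBFS

# Integer path: the over-range value is lost at record time; turning it down can't bring it back.
int_then_down = apply_gain(to_int24_clipped(hot_sample) / FULL_SCALE, -10.0)

# Float path: 2.0 was stored as-is, so turning it down 10 dB puts it where it belongs.
float_then_down = apply_gain(hot_sample, -10.0)

print(round(int_then_down, 4))    # 0.3162 (flattened peak)
print(round(float_then_down, 4))  # 0.6325 (original peak, now below 0 dBFS)
```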

 

As others have pointed out, it's most useful in professional settings for sound effects with a high dynamic range that you can't reasonably adjust for on the fly. And in practice, you're still limited by the preamps and microphone. It doesn't actually make anything sound better, and if you know what you're doing, as we all do, there is no advantage to recording dialogue at 32-bit. And given all the problems it creates, such as added file size (okay, that's negligible these days), lack of compatibility with a lot of common systems, and the fact it's only on prosumer gear and not high-end systems, there is no need.

 

I suspect there'll be a push to make it industry standard since non-sound folk just see a higher number and assume it's better quality much to my annoyance.


Does anybody know why manufacturers started using 32-bit float rather than 32-bit integer?  That would buy an additional 48dB of gain, they could have decided on a similar convention where "normal" levels are much farther below 0dBFS, and we could have all (marginal) benefits of 32-bit with none of the confusion.  Or, at least, less confusion.
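The 48 dB figure follows from integer formats gaining about 6.02 dB per bit (20 × log10 2). A quick Python check; the function name is illustrative:

```python
import math

def integer_dynamic_range_db(bits: int) -> float:
    """Ratio of the number of integer codes (2**bits) to one step, in dB: about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

extra = integer_dynamic_range_db(32) - integer_dynamic_range_db(24)
print(round(integer_dynamic_range_db(24), 1))  # 144.5
print(round(extra, 1))                         # 48.2
```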

As an additional thought, perhaps we as sound engineers need to come up with a more reasonable definition of clipping for 32-bit float.  Clipping in the analogue domain is defined as the point at which audio circuitry behaves non-linearly within a certain tolerance (i.e. the output does not reproduce the expected input within an expected precision).  It's not that it's *impossible* to reproduce audio above the clipping point, it's that the audio that does get reproduced is degraded (a property that, on tape, can be exploited for creative purposes as "saturation").

The same is true in floating point audio.  At some point, well below the theoretical +770dB "maximum", floating point precision is low enough that it will begin to affect audio quality.  I would think that this point is what should really be considered a clipping point for floating point audio.  Like analogue audio, this degradation is gradual, and it isn't a brick wall maximum the way digital clipping is in 24-bit integer audio.  But, arguably, the "extra" dynamic range above that point shouldn't really be considered part of the dynamic range of the format, because it doesn't accurately reproduce audio within an adequate tolerance to maintain transparent audio quality.  The question is, what is a reasonable tolerance?

Theoretically, if we wanted to maintain the degree of precision that we are used to in 24-bit audio, we could define clipping as "less precise than integer audio", in which case the dynamic range of 32-bit float would, in fact, be the same 144dB we are used to for 24-bit audio.  32-bit float is represented as a 23-bit fraction (with an implied leading bit), a 1-bit sign, and an 8-bit exponent, which means 24 bits (23 stored + 1 implied) are devoted to "significant digits".  Every value available in 24-bit integer representation can be represented exactly in 32-bit float.  Values above 2^24 (16,777,216) start to lose precision because the gap between adjacent representable values grows beyond 1, which means it is no longer possible to represent every integer above that point.  (For example, above 2^24 no odd integer, and therefore no prime, can be represented exactly.)
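The 2^24 boundary can be checked by storing integers as 32-bit floats and reading them back. A minimal Python sketch using the standard library's `struct` module (the helper name is illustrative):

```python
import struct

def round_trip_float32(value: float) -> float:
    """Store a number as IEEE 754 single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

limit = 2 ** 24  # 16,777,216

for n in (limit - 1, limit, limit + 1, limit + 2):
    print(n, round_trip_float32(float(n)) == n)
# 16777215 True
# 16777216 True
# 16777217 False
# 16777218 True
```

Above 2^24 the spacing between adjacent floats is 2, then 4 above 2^25, and so on, doubling with each step of the exponent.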

This is probably too harsh a tolerance; the advantage of 32-bit float is that the loss of precision "above" 24-bit capability *isn't* immediately audible.  So, realistically, we probably need to determine the clipping point of 32-bit float experimentally, based on a tolerance that represents when quality starts to degrade *audibly*.  (I bet the necessary research was done in the '70s when LPCM audio was first invented).

The point being, if we had a better definition of clipping for floating point audio, we would actually have a realistic basis for comparing dynamic range across digital formats, and we wouldn't be dealing with this 1500dB of dynamic range technobabble that has no actual meaning.  And we would have an actual basis to call Sound Devices out for spreading technobabble nonsense.


Real world example/warning - I was mixing a classical guitar orchestra piece recently and did the mixes as 32 bit floating point  bounces, no plugins on the mix bus, and I found that +3.5dBFS was still a clipping point.  Latest version of Digital Performer.  I have no idea why.  I did get some 32 bit floating point masters back that were hotter, but did not show clipping.  


  • 2 months later...

 

On 3/15/2023 at 10:49 PM, Johnny Karlsson said:

Which is great for all those times we have to record an ant fart

Well I'm convinced, sign me up to record ant farts!  
 

I really enjoyed reading this thread, thanks for posting all that great information everyone!


My main bag mixer is a 633 recording at 24bit and my lightweight run and gun bag uses a MixPre-3 II recording at 32bit float. I also have some Wisycom MTP60’s that record to an internal SD card at 32 bit float. I often send a mix of 24 and 32 bit files to post with no issues.  Most post production software vendors seem to be up to speed with it now so if anyone came back to me regarding my files I’d first suggest they update to the latest version if possible. 


Regarding 32-bit float in general, it seems to be aimed at lower-cost recording devices marketed towards one-man-band camera guys who need to record in a 'set it and forget it' style.  The Scorpio and other 8 series recorders don’t support it (according to the Sound Devices website). So it's probably cheaper for manufacturers to put 32-bit float recording capabilities into a sound recorder than it is to install good limiters.  There are some innovative uses for it:
https://www.youtube.com/watch?v=IP0M1kWMPD8&t=819s
 


  • 3 weeks later...

Marc was being dishonest when he posted this. The person in question was asking for Blackmagic to implement a feature request in Resolve to allow him to crank the post sound levels up more than 30dB, 30dB being the current limitation of Fairlight in Resolve, because he was receiving 32-bit files from the CLIENT that were too low.

 

Proof: https://forum.blackmagicdesign.com/viewtopic.php?f=33&t=177369
