
32-bit for Dialogue: Yay or Nay?



1 hour ago, Jim Feeley said:

I think it's reasonably helpful to make clear(ish) that at 0, 32-bit files match with 24-bit and 16-bit files.

Is this actually defined in a spec somewhere?  I'm inclined to think that one possible reason why the original poster needed 30dB of gain is because there *isn't* a defined way to align levels between 32-bit and 24-bit.  My understanding is that every DAW has adopted a different convention for this conversion, and that this is part of the reason why it's such a headache to export levels "correctly" when dealing with 32-bit in post.

I would love a better understanding of what's happening here.  I believe 32-bit float has been used to represent PCM audio since the '90s (at least, I remember being able to save in that format with the audio recorder in Win95), but I'm not sure there's any official spec for how the format (or any digital format, 24-bit included) is supposed to translate into other formats.  Don't we typically calibrate our meters so that +4 dBu (aka line level or "analogue 0") = -20 dBFS precisely *because* there isn't a spec to do this?  (i.e. we've adopted that equivalence by industry convention because there isn't actually any "official" spec that tells us how loud it's supposed to be?)
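For anyone who wants to see the arithmetic behind that alignment convention, here is a minimal sketch. It assumes professional line level of +4 dBu and the -20 dBFS alignment mentioned above; the function names are made up for illustration, and other alignments exist.

```python
import math

# 0 dBu is defined as the voltage that dissipates 1 mW in 600 ohms (~0.7746 V RMS).
DBU_REF_VOLTS = math.sqrt(0.001 * 600)

def dbu_to_volts(dbu):
    """Convert a level in dBu to volts RMS."""
    return DBU_REF_VOLTS * 10 ** (dbu / 20)

def dbu_to_dbfs(dbu, ref_dbu=4.0, ref_dbfs=-20.0):
    """Map an analogue level onto the dBFS scale for a chosen alignment (a convention, not a spec)."""
    return ref_dbfs + (dbu - ref_dbu)

line_level_volts = dbu_to_volts(4.0)   # roughly 1.23 V RMS
clip_point_dbu = 4.0 - (-20.0)         # analogue level that lands exactly on 0 dBFS

print(f"+4 dBu = {line_level_volts:.3f} V RMS")
print(f"0 dBFS corresponds to +{clip_point_dbu:.0f} dBu under this alignment")
```

Change `ref_dbfs` and the whole mapping shifts, which is the point: the relationship is whatever the alignment says it is.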

Link to comment
Share on other sites

3 hours ago, The Documentary Sound Guy said:

No, you're not.  I find it incredibly misleading as well, as it replaces a coherent technical scale with, essentially, a conventional scale.  An unfortunate piece of marketing genius to convince people that it is superior.

In reality, the biggest difference actually has to do with the fact that it's floating point rather than integer, and that has ramifications for where in the scale it is best to record.  With floating point, precision (and therefore audio resolution) is highest in the *middle* of the scale, whereas integer formats are equally precise throughout the scale, but there is a quantization noise floor at the bottom of the scale, which means the "best" audio is at the top end of the scale (i.e. "full scale").

It's been a little while, but if I recall correctly, the difference between the maximum value that can be represented in 32-bit float and the second-highest value is something like 8dB, which would produce very, very bad results if we tried to record at the top of the scale the way we do with integer formats.

In reality the precision of 32-bit float is identical to 24-bit int.  32-bit float uses 23 bits for the mantissa (significand), plus 1 sign bit = 24 significant bits.  The remaining 8 bits are the exponent, which is what creates the massive 770dB increase in dynamic range ... but at the cost of a loss of precision at the outer bounds of the scale.
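A quick sketch of that bit layout, plus where the big dB numbers come from. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 stored mantissa bits) and treats a sample value of 1.0 as 0 dBFS, which is a convention rather than part of the float format. The helper names are hypothetical.

```python
import math
import struct

def float32_fields(x):
    """Split a value, stored as a 32-bit float, into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 stored bits
    return sign, exponent, mantissa

# Largest finite and smallest normal float32 values, built from their bit patterns.
FLOAT32_MAX = struct.unpack(">f", struct.pack(">I", 0x7F7FFFFF))[0]
FLOAT32_MIN_NORMAL = struct.unpack(">f", struct.pack(">I", 0x00800000))[0]

# dB above 1.0 (taken here as 0 dBFS) that the format can hold.
headroom_db = 20 * math.log10(FLOAT32_MAX)
# dB between the largest and smallest normal values.
total_range_db = 20 * math.log10(FLOAT32_MAX / FLOAT32_MIN_NORMAL)

print(float32_fields(1.0))
print(f"headroom above 1.0: {headroom_db:.1f} dB")
print(f"normal-number range: {total_range_db:.1f} dB")
```

Those two results are where figures like the roughly 770 dB of headroom and the roughly 1528 dB total quoted in this thread come from.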

I think it would be more accurate simply to stop using dBFS as a scale for floating point and recognize that 32-bit float requires using a different reference point for best results.  We need something like dBFP (dB floating point), where convention dictates that 0dBFP is the middle of the floating point scale (i.e. 0.0), and 72dBFP = 0dBFS.  If I'm doing my math correctly, this convention would properly match 32-bit float and 24-bit int such that both formats would be used to their highest potential at similar recording levels.

Presumably, a similar convention is already being used under the hood by most DAWs, and the claim of "exceeding" dbFS 0 simply indicates when signal levels exceed what a 24-bit int format would be capable of representing (i.e. the point at which precision starts to degrade in floating point).

A major disadvantage of switching scales is we'd all have to adapt to new standards for "correct" recording levels.  Presumably, we'd all have to target somewhere around 52dBFP for "normal" dialogue levels, and I can only imagine the amount of confusion this is likely to cause between set and post...


I prefer 56dBFP for dialog personally, but to each their own :)

 

Nice write up about some of the technical side.  Funny side note, 24 bit WAV files are signed ints, so they also center around zero and have a range of -8,388,608 to +8,388,607.  Though as you point out their accuracy is linear throughout that scale.

 

With that, I always assumed, without ever looking into it, that since both formats are zero-centered, the 0 point would be the point of reference when translating between 24 bit integer and 32 bit float.  The SD article claims that 0dBFS for 24 bit and 0dBFS for 16 bit are the 0 point of a 32 bit file, which they also identify as 0dBFS.  I found this a very confusing statement:
 

Quote


There is one other aspect of 32-bit float files which is not immediately obvious. Files recorded with 32-bit float record sound where 0 dBFS of the 32-bit file lines up with 0 dBFS of the 24- or 16-bit file. Keep in mind that unlike the 24- or 16-bit files, the 32-bit file goes up to +770 dBFS. So compared to a 24-bit WAV file, the 32-bit float WAV file has 770 dB more headroom.


dynamic_range_chart.png

 

In that picture they show 0dBFS as the (almost) center of the dynamic range of a 32 bit float file, but that's not 0dBFS of the 32 bit file, it's 0dBFS of the 24 and 16 bit files.  Full Scale of the 32 bit file is still the largest amplitude that it can represent.  It's confusing to me.
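To make the "0 dBFS lines up" idea concrete, here is a minimal sketch of one common way to translate between the two formats: scale so that 24-bit full scale lands on ±1.0 in float. This is an assumption about the convention being described, not something taken from a cited spec, and the function names are hypothetical.

```python
INT24_MAX = 2**23 - 1    # +8,388,607
INT24_MIN = -(2**23)     # -8,388,608

def int24_to_float(sample):
    """Map a 24-bit integer sample onto the float scale, with full scale at about +/-1.0."""
    if not INT24_MIN <= sample <= INT24_MAX:
        raise ValueError("sample is outside the 24-bit range")
    return sample / 2**23

def float_to_int24(x):
    """Map a float sample back to 24-bit; anything beyond full scale is clipped."""
    scaled = round(x * 2**23)
    return max(INT24_MIN, min(INT24_MAX, scaled))

print(int24_to_float(INT24_MIN))   # the most negative 24-bit value lands on -1.0
print(float_to_int24(2.0))         # a float sample 6 dB "over" clips on the way back
```

Under this mapping the float file has room above 1.0, but "full scale" in the 24-bit sense is still the 1.0 point, which is what the chart seems to be labelling 0 dBFS.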

That same chart actually highlights the benefits of 32 bit really well if they just dropped the FS so it represented the available dynamic ranges instead of trying to tie everything to a somewhat undefined full scale reference. 


The fact that you could potentially preserve the entire existing dynamic range and recover accidental clipping without any boost in system noise is great in theory.  However, the last time I recorded in a location where signal noise became a noticeable contribution to my track I was in an anechoic chamber.  This is why, despite the enjoyable debate around the theory behind 32-bit audio, I have not been able to come up with a scenario where recording it on location is worth it.

 

This brings up another question I have about how some of the 32 bit recorders work.  The marketing materials seem to claim that you don't need to set gain because you record everything without noise and can set gain in post.  However, unless there is a new class of A/D converters I'm unfamiliar with, you have to amplify the tiny voltage from a mic to get into the correct voltage range for the converter.  So do they all just provide a fixed amount of gain and figure you'll sort it out when you convert to a 24 bit workflow?  If so, could that explain the need for 30dB of gain in post?

 

 

Link to comment
Share on other sites

I have to wonder, are there 32-bit DACs out there?  How do they handle outputting 1528dB of dynamic range?  Presumably they must clip *far* below "full scale" (and have a noise floor well above "minimum scale"), since that would involve voltages on the order of 10^38 volts (assuming a similar scale conversion to what we've been discussing).  Can you imagine trying to reproduce a 770dB signal through a loudspeaker?  It's been a while since I read the physics, but I seem to recall thermonuclear-scale effects that start around 200dB...

 

Given that *lightning* (a mere 10^9 volts) penetrates just about any insulation, I think it's safe to say building a 32-bit DAC with remotely linear performance is physically impossible.  It seems like a safe bet that 32-bit DACs would need to be *very* non-linear outside of certain operating parameters (presumably, roughly the middle 140dB of the range based on current state-of-the-art signal-to-noise designs for other types of components).  And, if that is true, one wonders what the *actual* intended range of use is.  What would happen if you tried to output a 32-bit file with, say, 200dB of digital gain above "normal" levels?  You'd be able to see the gain in software, but trying to reproduce it physically would result in, shall we say, an undefined result.
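The voltage estimate is easy to sanity-check. A minimal sketch, assuming (purely for illustration) that 0 dB corresponds to 1 volt; the function name is made up:

```python
def db_to_voltage_ratio(db):
    """Voltage (amplitude) ratio for a given number of decibels."""
    return 10 ** (db / 20)

ratio_770 = db_to_voltage_ratio(770)   # 10 ** 38.5
ratio_140 = db_to_voltage_ratio(140)   # 10 ** 7

print(f"770 dB above 1 V is about {ratio_770:.2e} V")
print(f"140 dB is a voltage ratio of about {ratio_140:.0e}")
```

So 770 dB above a 1 V reference really is in the neighbourhood of 10^38 volts, while a 140 dB window is a ratio of about ten million to one.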

Link to comment
Share on other sites

The decibel is a relative unit: dBFS, dBV, dBu... In the physical world, dB SPL (sound pressure level) is a different matter.


"At 194 dB, the energy in the sound waves starts distorting and they create a complete vacuum between themselves. The sound is no longer moving through the air, but is in fact pushing the air along with it, forming a pressurized wall of moving air. This is called a shock wave, and it is at this point that a “sound” becomes a physically perceptible and possibly dangerous force."

 

I think the 770 dB etc. is more theoretical and exists only in the digital "world".
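For the curious, the 194 dB figure in that quote falls out of the dB SPL definition. A rough sketch, assuming the usual 20 micropascal reference pressure and a standard atmosphere of 101,325 Pa (the function name is hypothetical):

```python
P_REF_PASCALS = 20e-6          # standard reference pressure for dB SPL in air
ATMOSPHERE_PASCALS = 101325.0  # one standard atmosphere

def spl_to_pascals(db_spl):
    """Sound pressure in pascals for a given dB SPL."""
    return P_REF_PASCALS * 10 ** (db_spl / 20)

pressure_194 = spl_to_pascals(194.0)
print(f"194 dB SPL is about {pressure_194:,.0f} Pa, "
      f"or {pressure_194 / ATMOSPHERE_PASCALS:.2f} atmospheres")
```

Once the pressure swing is about one atmosphere, the low-pressure half of the wave has nowhere left to go, which is roughly what the quote is describing.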

Link to comment
Share on other sites

2 hours ago, Johnny Karlsson said:

I think the 770 dB etc. is more theoretical and exists only in the digital "world".

Yup.  Hence the (rhetorical) question:  What would happen if we tried to reproduce a 770dB *signal* through a loudspeaker?  What happens if you try and turn a theoretical number into reality?  Most likely, things blow up as they are pushed past their physical limits (both electric and sonic).

Also, how honest is it to be selling a product with 770dB of "headroom" that can only exist in theory?  What are the actual *physical* specs of a 32-bit DAC, and what is the point of a device that allows you to try to reproduce a signal that can only exist in theory?

Link to comment
Share on other sites

8 hours ago, The Documentary Sound Guy said:

What would happen if we tried to reproduce a 770dB *signal* through a loudspeaker?

I'm afraid there is no such thing.

 

"... a thermonuclear bomb can produce an overpressure corresponding to 278 decibels. This is QUITE lethal. 200 decibels is enough to cause instant death, and 278 decibels is about 10 million times greater sound intensity."

 

Last time I checked, the best converters available would achieve around 20 to 21 bits. Analog circuitry is part of all converters.
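To put "20 to 21 bits" into dB, here is a minimal sketch using the textbook formula for an ideal quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). Treating the bit counts above as effective resolution is my assumption, and real converters are specified in more ways than this.

```python
import math

def ideal_dynamic_range_db(bits):
    """Ideal quantizer signal-to-noise ratio for a full-scale sine wave."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)

for bits in (16, 20, 21, 24):
    print(f"{bits} bits -> {ideal_dynamic_range_db(bits):.1f} dB")
```

By this measure 20 to 21 bits works out to somewhere in the 120s of dB, well short of what a 24-bit container can hold on paper.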

 

- I am guessing here -

I think this is where the 32-bit converters are "different", in that the signal coming in is somehow turned way down before it hits the converter, then once in the digital domain, it's brought back up, with the safety of basically never hitting 0 dBFS (clipping) at the converter stage. Since the 32-bit digital noise floor is basically nonexistent, it doesn't add any noise to the signal when turned up.

But the real world dynamic range (difference between the softest and loudest sound) basically remains the same as in 24-bit, i.e. the noise floor of the physical world (HVAC, transmitter self noise, mic self noise etc.) and the loudest scream are no different. The good news is that you can be sure that the unexpected/improvised scream doesn't clip when you fall asleep with the trims and faders all the way up.
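A toy illustration of the "scream doesn't clip" point, looking only at the file format and saying nothing about the analogue front end or the converter. The sample values are invented, and 1.0 is assumed to be 0 dBFS.

```python
def apply_gain_db(samples, gain_db):
    """Scale samples by a gain given in dB."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def clip_to_full_scale(samples):
    """What a fixed-point file does to anything beyond +/-1.0."""
    return [max(-1.0, min(1.0, s)) for s in samples]

# Hypothetical float samples; the last two are "over" 0 dBFS.
hot = [0.5, 1.8, -2.5]

clipped = clip_to_full_scale(hot)      # the overs are flattened for good
pulled_down = apply_gain_db(hot, -12)  # float: turn it down later, shape intact

print(clipped)
print(pulled_down)
```

In the clipped version the two loud samples end up at the same magnitude; in the float version their relative sizes survive the trip back under full scale.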

Link to comment
Share on other sites

17 hours ago, The Documentary Sound Guy said:

Is this actually defined in a spec somewhere?  I'm inclined to think that one possible reason why the original poster needed 30dB of gain is because there *isn't* a defined way to align levels between 32-bit and 24-bit.  My understanding is that every DAW has adopted a different convention for this conversion, and that this is part of the reason why it's such a headache to export levels "correctly" when dealing with 32-bit in post.

 

I haven't gone looking for an AES spec or rec or anything. I'm mainly rolling with this bit from the Sound Devices 32-bit float explainer from a year or two ago (which was linked to earlier in this thread, and which I'll link to here).

 

"There is one other aspect of 32-bit float files which is not immediately obvious. Files recorded with 32-bit float record sound where 0 dBFS of the 32-bit file lines up with 0 dBFS of the 24- or 16-bit file. Keep in mind that unlike the 24- or 16-bit files, the 32-bit file goes up to +770 dBFS. So compared to a 24-bit WAV file, the 32-bit float WAV file has 770 dB more headroom."

 

And for yucks I've used the Track E I have as a (mono) backup recorder. Mainly to get a bit familiar with it and its smartphone app. Anyway, I rolled 32-bit once and 24-bit a few times. Levels seem to line up. I set tone to -20dBFS, but I didn't do any carefully calibrated steps up to 0dBFS, so maybe there's some variance. But at least for the Track E that I have, it's close. And I'll assume SD does what they say. Zoom probably, too. But I don't know if that's down to some AES thing, chip manufacturer design choices, or just common best practices. As you say, DAWs and NLEs might wander away from good practice.

 

I'm still leaning towards user error, though. The inconvenience for post and the enabling of poor practices amongst not-really experienced users of 32-bit kinda keeps me away from it for now. So I'm still in the 24/48 & nice limiter world. I'm watching but don't have tons of experience with 32-bit. 

Link to comment
Share on other sites

In terms of a DAW: the environment it was designed for is more of a controlled situation, a studio with soundproofing and isolation from the outside world. You can spend a day or a week tweaking a snare sound. Everything is focused on sound only, so your mic placement, gain staging and levels could (should?) be optimal at the point of recording. Most of the time when mixing (let's say) 30-100 tracks you end up turning things down, rather than adding gain. And where you need more gain, it's easy to add with a plug-in... the fader is just one of many things that can add or subtract gain.

 

Oh, and to answer Marc's original question; if you need to add that much gain, just use "Normalize". With that you can set the top peak to anything you like, all the way up to 0 dBFS, or -3dB just for safety, and/or use a compressor.
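In case it helps, this is roughly what peak normalization does under the hood. A minimal sketch, assuming float samples where 1.0 is 0 dBFS; the function names and sample values are made up, and a real DAW's Normalize may offer other modes (RMS, loudness).

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS, with 1.0 taken as full scale."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def normalize_peak(samples, target_dbfs=-3.0):
    """Apply one fixed gain so the loudest sample lands on target_dbfs."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]

# Hypothetical very quiet recording, peaking around -45 dBFS.
quiet = [0.001, -0.0056, 0.003]
loud = normalize_peak(quiet, target_dbfs=-3.0)

print(f"before: {peak_dbfs(quiet):.1f} dBFS, after: {peak_dbfs(loud):.1f} dBFS")
```

It is the same gain applied to every sample, so the noise floor comes up by exactly as much as the dialogue does.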

Link to comment
Share on other sites

I had another thought (<-- warning). Maybe Marc's correspondent was using a mic with very low sensitivity (e.g., an SM7B, which is really popular with podcasters and YouTubers) and running it into a 32-bit recorder with a preamp that can't add enough gain to bring that mic's signal up to a reasonable or standard level (or that just had a default gain setting too low for a mic like an SM7B). That on its own, and perhaps combined with a mic too far from the source, could create recordings where someone would want to add over 30dB in post.

 

As for what to do, I recorded a clip on my Track E (the only 32-bit recorder I have) at 32-bit/48kHz and at a really low level (peaks around -45dBFS) and tried a couple simple things that anyone could do. Mainly, I wanted to confirm that the obvious worked on 32-bit files. And I'm still sick at home so what the heck...

 

Adobe's currently free and kinda neat online Enhance Speech tool wasn't great for my track. It processed the 32-bit file and did increase gain and also removed the mic's/recorder's self noise. But it sounds like it gated my speech so heavily that some words were lost. So that's no good. https://podcast.adobe.com 

 

Manually adding 40dB gain pretty much worked, of course. But so did just normalizing. Of course that raised self noise, but a little simple noise reduction got rid of that without hammering the track. 

 

Again, all obvious stuff but perhaps helpful steps for Marc's correspondent.

 

 

Link to comment
Share on other sites

On 3/16/2023 at 9:21 PM, The Documentary Sound Guy said:

I have to wonder, are there 32-bit DACs out there?  How do they handle outputting 1528dB of dynamic range?  Presumably they must clip *far* below "full scale" (and have a noise floor well above "minimum scale"), since that would involve voltages on the order of 10^38 volts (assuming a similar scale conversion to what we've been discussing).  Can you imagine trying to reproduce a 770dB signal through a loudspeaker?  It's been a while since I read the physics, but I seem to recall thermonuclear-scale effects that start around 200dB...

 

Given that *lightning* (a mere 10^9 volts) penetrates just about any insulation, I think it's safe to say building a 32-bit DAC with remotely linear performance is physically impossible.  It seems like a safe bet that 32-bit DACs would need to be *very* non-linear outside of certain operating parameters (presumably, roughly the middle 140dB of the range based on current state-of-the-art signal-to-noise designs for other types of components).  And, if that is true, one wonders what the *actual* intended range of use is.  What would happen if you tried to output a 32-bit file with, say, 200dB of digital gain above "normal" levels?  You'd be able to see the gain in software, but trying to reproduce it physically would result in, shall we say, an undefined result.

 

AFAIK there aren't any "true" 32 bit ADCs, either. Aren't they using stacked converters, or something similar?

Link to comment
Share on other sites
