
Why no 32bit float on our pro recorders yet?



I have just read this very clear paper on the SD website about 32-bit float resolution, and... I am blown away. 1528 dB of dynamic range, and 770 dB of headroom ABOVE our standard 0dBFS!! I understand now why the MixPre series is so popular amongst sound designers.

https://www.sounddevices.com/32-bit-float-files-explained/
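
For anyone who wants to check those two figures, here is a minimal sketch that derives them from the limits of the IEEE 754 single-precision format. It assumes the usual float-audio convention that a sample value of 1.0 corresponds to 0 dBFS; the names `FLT_MAX`, `FLT_MIN_NORMAL` and `db` are just illustrative.

```python
import math
import struct

# Largest finite and smallest normal float32 values, built from their bit patterns.
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
FLT_MIN_NORMAL = struct.unpack("<f", struct.pack("<I", 0x00800000))[0]


def db(ratio: float) -> float:
    """Express an amplitude ratio in decibels."""
    return 20.0 * math.log10(ratio)


# Headroom: how far the largest value sits above 1.0 (assumed to be 0 dBFS).
headroom_db = db(FLT_MAX)
# Span between the largest value and the smallest normal value.
range_db = db(FLT_MAX / FLT_MIN_NORMAL)

print(f"headroom above 1.0: {headroom_db:.1f} dB")
print(f"largest / smallest normal value: {range_db:.1f} dB")
```

The results land within a dB or so of the 770 dB and 1528 dB quoted in the paper. This only describes what the number format can hold, not what any preamp or converter can deliver.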

 

But now an obsessive question comes to mind: if gear designers like SD and others are capable of creating machines that record files in 32-bit float resolution, why have they not implemented it in their pro range recorders (like the 8 series for SD)???!?

Can anyone clear up the mystery for me?

Thanks

Fred

PS: I understand that the actual dynamic range limit of the recording is that of the preamp and AD converters, yet it seems like a new world to me (remember I started on a Nagra 4.2 😉)

Link to comment
Share on other sites

i predict it will come.  though not confident enough to predict a timeline.

but.. as SD have one 32bit transmitter available, and i would guess an inevitable 32bit version of the A10 will come at some point, and i believe that SD have said that the 8 series hardware is capable of 32bit. as well as Dante being able to support 32bit, i am hoping there will be a full 32bit system from them in the not too distant future.


though a follow up question would be, how would a 32bit post production workflow work?

are any of the edit systems capable of dealing with 32bit files?

and as you wouldn't need to adjust the gain of the ISOs, how will post easily get enough level for the edit? if there is to be a convert-to-24bit step (like i have the option to do with my A20 tx recordings), how sure can we ever be that someone won't screw it up down the line and blame us?

or will we have to spend an hour or so at the end of the day converting the 32bit files to 24bit? though a post-record auto convert / proxy file kind of thing would be pretty cool.

or even being able to record 32bit files to the main hard drive, and 24bit to the SD cards (thinking of SD products here - zaxcom with its primary / secondary disk setup seems like it could easily be adapted for that)

 

 i think there is a large element of build it and they will come. but given how wary post can be of new edit system software updates, it may take a bit of time before we are doing everything 32bit throughout.

Link to comment
Share on other sites

I agree that it will come eventually. But realistically speaking the main purchasers of these professional recorders haven't really been asking for it.

 

At the minute most professional sets are entirely wireless, and thus unless you're using Audio Limited there's little point in recording 32-bit. I'm sure if more companies had 32-bit transmitters then it would make more sense. With those prosumer recorders, the expectation is that people are using booms or podcast mics via XLR into the recorder, so the full range can be captured unobtrusively. There's also the obvious point that professionals know how to properly set gain, even if you're going from someone whispering to yelling. The number of times in my career that I've actually clipped a track is so small that I wouldn't even bother worrying about it.

 

It's also important to note that a couple of recorders have some form of dynamic range extender. Zaxcom, for example, has NeverClip, which adds an additional 20 or 24dB to your tracks, which I guess would get close to a 32-bit file.

 

The other thing to note is that the processing power required, not just to record these tracks but to have elements like noise reduction, automixing, infinite routing and such available, is probably quite a bit. With the 8-series maybe they can do 32-bit recording, but probably not with Dugan and CEDAR engaged at the same time.

Link to comment
Share on other sites

From what I understand it's the same old story of post having to catch up first. Or at least that's part of it anyways. I feel like for the time being it has more of an immediate use case in sound design where you're more likely to encounter unexpected level changes, but in the long run 32 bit seems like a more elegant solution than Neverclip.

Link to comment
Share on other sites

It's interesting that much of the new 32-bit equipment is lower-end prosumer gear. Perhaps for 8-series, Nova, and similar professional recorders there's something in the 32-bit chipsets that isn't quite up to what the engineers want. Like some minor A/D S/N thing, or some limiting factor on designing lovely pre-amps. So maybe the manufacturers are waiting for the next-gen chips...

 

I guess 32-bit stuff is fine, but I'm still rolling OK with 24-bit recorders. But then, I don't do sfx or sound design. And I could see buying a MixPre-class recorder.

 

Warning: I know nothing.

Link to comment
Share on other sites

The same discussion is on the Zoom Field Recorder Facebook group with the release of the Zoom F8n Pro. The only new 'feature' added is a dual stage conversion with 32bit floating point, and now the branding is 'Pro' :)

 

 

Greetings

 

 

 


Link to comment
Share on other sites

  • 2 weeks later...
On 2/26/2022 at 12:13 AM, Tom Craca said:

Same reason they don't put automatic transmissions in NASCAR.  If you're a PRO, then mix like a PRO.

 

I see this opinion pretty often, but I just can't agree.

 

You use 32-bit float for the same reason that I assume you don't go out of your way to disable the analog limiters on your mixers and recorders, or say that pro gear shouldn't have limiters. In case it's about to clip.

However, with the limiter, you lose the dynamics of the sudden yell, but with 32-bit float, you don't.

It's just a better alternative to a limiter. If you can do 32-bit float, you have no reason to be using a limiter anymore.

(Unless you're for some reason concerned about the extra 8 bits of storage per sample, or the device has to disable other features to give you 32-bit float.)
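
To put a number on that storage cost, here is a rough sketch of uncompressed PCM data per track-hour. The 48 kHz sample rate is an assumption, and the helper name `bytes_per_track_hour` is made up for illustration; file headers and metadata are ignored.

```python
def bytes_per_track_hour(bits_per_sample: int, sample_rate_hz: int = 48_000) -> int:
    """Raw PCM payload for one mono track recorded for one hour."""
    return (bits_per_sample // 8) * sample_rate_hz * 3600


size_24 = bytes_per_track_hour(24)
size_32 = bytes_per_track_hour(32)

print(f"24-bit: {size_24 / 1e6:.1f} MB per track-hour")
print(f"32-bit: {size_32 / 1e6:.1f} MB per track-hour")
print(f"increase: {100 * (size_32 / size_24 - 1):.1f} %")
```

So the 32-bit file is a third larger than the 24-bit one, whatever the track count.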

Link to comment
Share on other sites

What is the point of 1528dB of dynamic range when the realistic range of recording conditions is somewhere in the neighbourhood of 140dB at the absolute max?  That would be like trying to capture the sound of your blood pumping in your veins in the same recording as a space launch.  There is simply no point to having that much dynamic range, especially given the inherent limitations of the numerous inevitable analogue devices.  I'm no physicist, but a "sound" in the 1,000dB range would have thermonuclear consequences.

 

In practical terms, I understand the usefulness of recording with more headroom.  We don't need 32bits for that.  We just need to record lower levels.  Which is effectively what Zaxcom's neverclip has been doing for a decade or more:  It basically records at a lower level, which provides more headroom.  We could accomplish the same thing with any recorder by recording at a nominal level of, say, -40dBFS rather than -20dBFS.  The only reason we don't is longstanding convention.

 

I would quibble about 32bit providing headroom "above" 0dBFS.  0dBFS stands for "0dB Full Scale" ... meaning it's the maximum level that can be represented in the audio format.  *By definition*, there is no headroom above that.  "FS" is the point of reference: basically it says 0dB is considered to be located at "Full Scale", meaning no louder sounds can be represented.

 

What's different with 32 bit is that, basically, hardware manufacturers have broken the longstanding convention that "0dB" on the VU meter is calibrated at (or close to) -20dBFS.  They have designed their circuitry so that the ADC (analogue to digital converter) represents +4dBu (i.e. nominally 1 volt of analogue signal) at a lower level (-770dBFS according to Sound Devices) than the previous convention (-20dBFS).  This does indeed make it practically impossible to clip *digitally*, but only because the analogue components in the chain will clip long before the ADC does.  The voltages required to clip a 32bit ADC would likely fry the rest of the electronics before it comes anywhere close to reaching 0dBFS.
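
For reference, a small sketch of the dBu-to-volts conversion behind the "+4dBu" figure. It uses the standard definition of 0 dBu as the RMS voltage that dissipates 1 mW in 600 ohms (about 0.775 V), which puts +4 dBu nearer 1.23 V RMS than a round 1 V. The function name `dbu_to_volts` is illustrative only.

```python
import math

# 0 dBu is the RMS voltage that produces 1 mW in a 600 ohm load.
DBU_REF_VOLTS = math.sqrt(0.001 * 600)


def dbu_to_volts(level_dbu: float) -> float:
    """Convert a level in dBu to volts RMS."""
    return DBU_REF_VOLTS * 10 ** (level_dbu / 20)


for level in (0.0, 4.0):
    print(f"{level:+.0f} dBu = {dbu_to_volts(level):.3f} V RMS")
```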

The same happens in the context of a DAW.  Because manufacturers have broken the convention that -20dBFS = +4dBu, 32 bit files would be much, much quieter (functionally inaudible) if the DAWs played them back at the same 0dBFS reference as 16 or 24 bit files.  32 bit files effectively need 750dB of digital gain (again, relying on SD's numbers here) to be heard at "normal" levels, so this is what DAWs do when they play back 32 bit files.  And, ironically, sometimes this clips the 32 bit files, because adding 750dB of digital gain causes clipping when it creates signals that go above 0dBFS *in the DAW's internal processing*.

So ... why don't Pro recorders record in 32 bit?  Partly because they don't need it, and partly because the designs are relatively old.  They'll get there eventually.  And, in the meantime, if you want the advantages of 32bit without shelling out thousands for a new recorder, just turn the gain down and record at -40dBFS on your meters.  Don't forget to tell post first — they get very confused when they don't get files that follow the -20dBFS convention.

Link to comment
Share on other sites

20 hours ago, The Documentary Sound Guy said:

I would quibble about 32bit providing headroom "above" 0dBFS.  0dBFS stands for "0dB Full Scale" ... meaning it's the maximum level that can be represented in the audio format.  *By definition*, there is no headroom above that.  "FS" is the point of reference: basically it says 0dB is considered to be located at "Full Scale", meaning no louder sounds can be represented.

It seems we did not understand the same thing from the SD paper I shared.

I was thinking the same as you until I was given a demonstration and read this paper.

I understand that 32-bit floating point gives another meaning to sound representation, since the calculation by the converter is completely different. Yes, sounds louder than 0dBFS cannot be represented, but they can be recorded. That is why it seems that when you import into a recent DAW a 32-bit float file containing a digital recording louder than 0dBFS, the meter will be stuck at 0 and the sound will be distorted until you apply gain attenuation.
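
The behaviour described above can be sketched at the file-format level. This is a toy model, not how any particular recorder or DAW is implemented: it assumes 1.0 equals 0 dBFS, and the helpers `to_int24` and `to_float32` are hypothetical names. It compares what survives of a peak 12 dB over full scale once 18 dB of attenuation is applied in post.

```python
import struct

INT24_MAX = 2**23 - 1
INT24_MIN = -(2**23)


def to_int24(sample: float) -> int:
    """Quantize to a 24-bit integer code, clipping at full scale."""
    code = round(sample * 2**23)
    return max(INT24_MIN, min(INT24_MAX, code))


def to_float32(sample: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", sample))[0]


peak = 10 ** (12 / 20)          # 12 dB over full scale, amplitude about 3.98
attenuation = 10 ** (-18 / 20)  # pull the clip down 18 dB in post

expected = peak * attenuation
int_recovered = to_int24(peak) / 2**23 * attenuation
float_recovered = to_float32(peak) * attenuation

print(f"intended peak after attenuation: {expected:.4f}")
print(f"from the 24-bit integer file:    {int_recovered:.4f}  (flattened at full scale)")
print(f"from the 32-bit float file:      {float_recovered:.4f}")
```

The float file gives the peak back once it is attenuated; the integer file cannot, because the value was flattened when it was written. None of this helps if the preamp or converter clipped before the file was written.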

20 hours ago, The Documentary Sound Guy said:

They have designed their circuitry so that the ADC (analogue to digital converter) represents +4dBu (i.e. nominally 1 volt of analogue signal) at a lower level (-770dBFS according to Sound Devices) than the previous convention (-20dBFS).  

I am not sure about that, in fact SD says differently: 

"There is one other aspect of 32-bit float files which is not immediately obvious. Files recorded with 32-bit float record sound where 0 dBFS of the 32-bit file lines up with 0 dBFS of the 24- or 16-bit file. Keep in mind that unlike the 24- or 16-bit files, the 32-bit file goes up to +770 dBFS. So compared to a 24-bit WAV file, the 32-bit float WAV file has 770 dB more headroom."

 

I have not experimented enough with it myself yet but in theory I agree 100% with San Jacobs here:

22 hours ago, SanJacobs said:

However, with the limiter, you lose the dynamics of the sudden yell, but with 32-bit float, you don't.

It's just a better alternative to a limiter. If you can do 32-bit float, you have no reason to be using a limiter anymore.

And you can keep your nominal level at -20dBFS 😁

Link to comment
Share on other sites

A lot of misunderstanding in this thread. Let me try to further muddy the waters. Or clear up a thing or two. 
 

32-bit float is not the same as Zaxcom's NeverClip. NC employs two A/D converters. Each one is optimised for a specific dynamic range and they supposedly switch seamlessly at the crossover range. This is, I guess, an extension of a fairly old technique called gain-ranging. Neumann, for example, used this for many years in their now discontinued digital mic series. Zaxcom achieves 137dB of dynamic range, IIRC, which is impressive, but still fine to represent with 24 bits (which can represent up to 144dB). 
 

32-bit float is absolutely not the same as pulling down gain by 40dB. Absolutely not the same. Likewise, NeverClip is not the same as a 40dB gain reduction. 
By pulling the gain down 40dB you are deliberately destroying your dynamic range, not adding to it, and you’re getting very close to the system noise, which is exactly what you want to get away from. You want to present a healthy level to the A/D converter (and preamp); that’s why we gain up the quiet bits and gain down the loud bits. So, while gained up, when someone suddenly starts to yell the audio slams into the limiter, creating a very unpleasant effect, or, worse, it will clip the converter and ruin the recording forever. That is where 32-bit float comes in. It will save your recording from clipping in those unexpected moments. That’s why it’s probably ideally suited to those situations where you cannot control the gain, but not really advantageous in those where you can. 

As everyone stated, the preamp and the mic will remain the limiting factors: if they clip, the 32 bits won’t do you any good. You can probably help the preamp with an analog converter, but once the mic clips, there’s not much to do.

 

Link to comment
Share on other sites

Thanks for fleshing out the details of Neverclip.  I debated whether to add that detail about dual ADCs, and decided it wasn't relevant.  But, nothing wrong with more information!

 

Now, I should clear up what I was trying to say, because I seem to have caused confusion.

I definitely didn't mean to say 32-bit float is the same thing as Neverclip.  What I meant is that the *goal* is the same:  Headroom.  And the solution is (partially) similar:  encode the audio so that a given signal voltage (let's use +4dBu aka 1V as our reference) is represented by a "lower" digital number.  With Neverclip, how much lower is user-selectable; on my Nomad, either -6dB, -12dB, or -18dB.  This means, if I'm metering at -20dBFS (as normal), my Neverclip recordings will actually show up at -26dBFS, -32dBFS, or -38dBFS in the 24-bit files I deliver.  I get more digital headroom, but at the expense of the expected correspondence between my metering and the level in the digital file.  With 32-bit files, the offset is more extreme:  +4dBu is represented as -770dBFS, but the basic idea is the same:  Create digital headroom by using a lower number to represent a given signal voltage.

 

Another clarification:  I was recommending gaining down *20dB* to achieve something similar to Neverclip (but without the advantage of gain-ranging with dual ADCs, which, as Constantin noted, is a real advantage of Neverclip).  A 20dB lower gain would mean recording at a nominal -40dBFS (which I think is where the confusion about recording 40dB lower came from).  Used judiciously, I think being 20dB closer to the noise floor to gain 20dB more headroom is a reasonable tradeoff.  It's reasonable because, in the past 20 years, most audio equipment has boasted analogue dynamic range in the 100dB+ range, so recording at 40dB below clipping still leaves at least 60dB of SNR, and that noise isn't going to be audible as long as the rest of your gain is staged properly.  It's certainly erring more on the side of hearing the noise floor rather than clipping, but with (current) pro equipment, I think it's a reasonable compromise, especially given how frequently we distort our audio with limiters and the like to try and squeeze into our standard 20dB of headroom.

 

In my opinion, SD (and others) are abusing the definition of "Full Scale".  Full scale means the top of the scale; it means there's nothing above it.  +770dBFS is a nonsensical representation.  Presumably, somewhere in the spec for 32-bit floating point LPCM (which I looked for but couldn't find), there is information about how the digital encoding is supposed to represent the analogue input signal.  Based on SD's information, +4dBu / 1V is represented as -770dBFS (or, 770dB below the maximum value that can be represented in 32-bit floating point).

But the critical question here is not how much dynamic range can be represented by a 32-bit float encoding scheme.  The critical question, which should *definitely* be posed to Sound Devices, is how much dynamic range does their 32-bit ADC have?  This is a question I don't have an answer to, but it's definitely one we should be asking.  I can make an educated guess...

When Zaxcom started selling Neverclip, they were using top-end 24-bit ADCs that had about 117dB of dynamic range (based on my memory, feel free to correct if you have real data).  Note that 24-bit audio can *represent* about 144dB of dynamic range, but the best 24-bit ADCs were not capable of using all the theoretical dynamic range in a 24-bit file.  A recording with no input signal would basically record the ADC's self-noise at -117dBFS, so the bottom few bits of a 24-bit file are effectively useless.  Neverclip was able to improve things a bit by feeding two ADCs, one at a 20dB higher level, and then recombining the two signals intelligently, which meant they could achieve an apparent ~137dB of dynamic range.
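
To make the two-converter idea concrete, here is a deliberately crude sketch of gain-ranging. It is NOT Zaxcom's or Sound Devices' actual algorithm, which is not described in this thread. It assumes two hypothetical converters that both clip at ±1.0, one fed 20 dB lower than the other, with a hard switch between them; a real design would presumably blend the paths far more carefully.

```python
PAD_DB = 20.0
PAD = 10 ** (-PAD_DB / 20)  # linear factor applied to the padded path


def clip(value: float, limit: float = 1.0) -> float:
    """Model a converter that cannot output anything beyond +/-limit."""
    return max(-limit, min(limit, value))


def gain_ranged_sample(analog: float) -> float:
    """Combine a 'hot' path and a path padded by PAD_DB (both hypothetical)."""
    hot = clip(analog)           # full-level path: lowest noise, clips first
    padded = clip(analog * PAD)  # padded path: 20 dB more room before clipping
    if abs(hot) < 1.0:
        return hot               # hot path is clean, so use it
    return padded / PAD          # otherwise undo the pad on the quieter path


for level in (0.5, 5.0, 20.0):
    print(f"input {level:5.1f} -> output {gain_ranged_sample(level):5.1f}")
```

With these assumptions the combined path clips 20 dB later than a single converter, which is where the 117 + 20 = 137 dB arithmetic comes from.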
 

So, the question is, how good are 32-bit floating ADCs?  My guess is they are a little better than the 117dB from ~20 years ago, but probably not by much.  The underlying technology hasn't changed much, and changing how the bits coming out of the digital side of the ADC are represented doesn't really impact the underlying circuit design, which is where the limitation is.  32-bit recorders may be able to *represent* 1528dB of dynamic range, but, until someone shows me otherwise, my electronic knowledge tells me the actual audio performance is likely in the same ~120dB range that we've had for the last few decades.  How that 120dB is represented, in 24-bit or 32-bit float, is immaterial.

With some experimentation, we could measure this.  Depending on the quality of the ADC and the rest of the analogue circuitry, I would expect to find a noise floor around -850dBFS, and an analogue clipping point around -730dBFS.  The exact numbers would depend on exactly what level SD's engineers have decided to feed the ADC, i.e. what reference level they've decided on for +4dBu / 1V.

Given the limitation of the ADC's dynamic range, and also given that 32-bit float recorders really do allow more analogue headroom above "0dB" on their meters, that tells me that they've effectively done something similar to "gaining down" to buy that headroom, so I would expect to see a similar impact on noise floor.

Based on this page from SD (https://www.sounddevices.com/how-is-a-32-bit-float-file-recorded/), the *actual* audio performance of the MixPre is 142dB of dynamic range, which they achieve through a "multi-stage ADC", i.e. gain-ranging.  And this source confirms what I thought, which is that the best ADCs they could find come in at ~130dB (or just slightly better than the ~117dB that Zaxcom was using 20 years ago).

Also notable (from Sound Devices' block diagram):  There is no analogue gain stage; they are feeding the ADCs directly from the pre-amp, and the "trim" stage is digital.

Link to comment
Share on other sites

Hmm, my understanding was that the shift to scientific notation effectively allows us to shift the 32-bit window wherever we like (post recording) within the 1500dB-odd range. The ADC and microphone will absolutely be the limiting factor, and it could be done in 24-bit, but that's not to say there isn't a clear benefit from not having to gain stage at all until you need to in post. There are quite a few scenarios where we've become pros at getting the best from the technical limitations of the gear, but it would be better not to have that limit, or at least to have the option to bypass it if you want. With radio transmitters in particular, the ability to control the 32-bit window on the recorder is a very, very useful tool. Maybe it just fits well within my day-to-day (drama) workflow, I guess, but not having to predict gain/performance/sfx explosions and set levels accordingly is a clear benefit (esp. with the patchy nature of the tx remote controls).

Link to comment
Share on other sites

11 hours ago, The Documentary Sound Guy said:

 

Worthy of note:  the 142dB of dynamic range that SD cites for their MixPre series still fits into the 144dB that 24-bit files are capable of.  There is zero technical advantage from using 32-bit float here ... they could record in 24-bit and deliver the same audio performance.

 

Well, it's been some years since a former professor of mine forced us students to calculate A->D conversions by drawing funny trees of 0's and 1's and checking Least-Significant-Bits etc... But I'm pretty sure that the quantization noise actually makes a difference, technically. The AD-conversion in 24 bit adds a noise floor which is no big deal when recorded with proper gain staging. The point of 32 bit float is that the gain staging has no effect on how close to the noise floor the signal is when it is recorded and written into the file. So you can actually record the loud trombone after the whisper at the same gain setting in a 24 bit file without the trombone clipping the signal. But if you increase the whisper in post to make it audible you will increase the noise floor of the conversion, which will probably be too close to the whisper-signal, and you get a noisy recording. This is not the case with 32 bit float.
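
The file-format side of this argument can be checked with a small experiment. The sketch below models only the rounding done when samples are written to the file; it ignores analogue noise and the converter itself, and assumes 1.0 equals 0 dBFS. The helper names are made up. It stores a sine at -100 dBFS as 24-bit integer codes and as float32, then measures how far the rounding error sits below the signal.

```python
import math
import struct


def to_int24(sample: float) -> float:
    """Quantize to a 24-bit integer code and scale back to the +/-1.0 range."""
    code = max(-(2**23), min(2**23 - 1, round(sample * 2**23)))
    return code / 2**23


def to_float32(sample: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", sample))[0]


def snr_db(level_dbfs: float, quantize, n: int = 4800) -> float:
    """Signal-to-rounding-error ratio for a 997 Hz sine sampled at 48 kHz."""
    amp = 10 ** (level_dbfs / 20)
    signal_power = 0.0
    error_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * 997 * i / 48_000)
        err = quantize(x) - x
        signal_power += x * x
        error_power += err * err
    return 10 * math.log10(signal_power / error_power)


whisper_dbfs = -100.0
print(f"24-bit integer: {snr_db(whisper_dbfs, to_int24):.1f} dB above rounding noise")
print(f"32-bit float:   {snr_db(whisper_dbfs, to_float32):.1f} dB above rounding noise")
```

In this idealised model the integer file leaves the very quiet signal much closer to its rounding noise than the float file does. Whether that matters in practice depends on where the analogue noise floor sits, which this sketch does not model.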

Of course there's still the limiting factors of mics and pre-amps but just to answer to your statement: there is a technical difference which is not to say that there are a lot of these extreme situations where you would actually need this advantage.

I remain open to be proven wrong of course.

Link to comment
Share on other sites

21 hours ago, The Documentary Sound Guy said:

 

Worthy of note:  the 142dB of dynamic range that SD cites for their MixPre series still fits into the 144dB that 24-bit files are capable of.  There is zero technical advantage from using 32-bit float here ... they could record in 24-bit and deliver the same audio performance.

 


Seems a bit like you are thinking of 32-bit fixed rather than float. It makes a big difference, as Sebi has outlined above. 
Similarly, the difference between 117dB and 130dB is not that it’s slightly better; the difference is huge, it’s 13dB. That’s a lot. 

Link to comment
Share on other sites

38 minutes ago, Constantin said:


Seems a bit like you are thinking of 32-bit fixed rather than float. It makes a big difference, as Sebi has outlined above. 
Similarly, the difference between 117dB and 130dB is not that it’s slightly better; the difference is huge, it’s 13dB. That’s a lot. 

 

I suppose it's all relative.  It's a lot in linear terms, and it's definitely enough to be audible in a few situations.  Compared to the overblown hype around 1,500dB, it's not much.

I'd prefer to think in practical terms:  a 13dB increase in dynamic range from (say) 77dB to 90dB (as it might have been making the jump from Nagra to DAT) is a much bigger deal than a 13dB increase from 117dB to 130dB.  It's not nothing, but it's not very meaningful in day-to-day use.  There is a lot of latitude to screw up your gain structure at both 117dB and 130dB.  Is 130dB better?  Sure.  It means I have more margin for error and better technical performance.  I'm not sure I'd call it "huge" though.

Link to comment
Share on other sites

10 hours ago, Sebi said:

Well, it's been some years since a former professor of mine forced us students to calculate A->D conversions by drawing funny trees of 0's and 1's and checking Least-Significant-Bits etc... But I'm pretty sure that the quantization noise actually makes a difference, technically. The AD-conversion in 24 bit adds a noise floor which is no big deal when recorded with proper gain staging. The point of 32 bit float is that the gain staging has no effect on how close to the noise floor the signal is when it is recorded and written into the file. So you can actually record the loud trombone after the whisper at the same gain setting in a 24 bit file without the trombone clipping the signal. But if you increase the whisper in post to make it audible you will increase the noise floor of the conversion, which will probably be too close to the whisper-signal, and you get a noisy recording. This is not the case with 32 bit float.

Of course there's still the limiting factors of mics and pre-amps but just to answer to your statement: there is a technical difference which is not to say that there are a lot of these extreme situations where you would actually need this advantage.

I remain open to be proven wrong of course.

I'm equally rusty in my memory of quantization noise, so maybe I'm forgetting something as well, but am I wrong in thinking that when we talk about the dynamic range of a given bit length, we are talking about where the quantization floor is?  I.e. doesn't quantization noise in a 24-bit file show up at -144dB?  (Actually, I think the precise estimate is 144.49dB of dynamic range for 24-bit integer encoding.)
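
The figure comes from the ratio between full scale and one quantization step, 20·log10(2^N) for N-bit integer PCM. A quick sketch (the function name is illustrative, and this is the simple step-ratio definition, without the extra 1.76 dB that appears when comparing a full-scale sine against quantization noise):

```python
import math


def int_pcm_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB, for integer PCM."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24, 32):
    print(f"{bits}-bit integer PCM: {int_pcm_range_db(bits):.2f} dB")
```

That gives roughly 96 dB, 144.5 dB and 192.7 dB for 16, 24 and 32 bits.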

As long as I'm not misunderstanding that particular point, I don't see that quantization noise would be an issue as long as the ADC is fed at an appropriate level (i.e. analogue clipping is calibrated to 0dBFS output in the converter).  If the ADC stage has 142dB of dynamic range, the quantization floor should be 2.5dB below the analogue floor.  Summing the two floors (as uncorrelated noise) might add about 2dB to the noise floor, but it's hard to call that a significant effect.  It's a tight fit, so I suppose there could be a minor advantage to 32-bit if it lets the analogue clipping saturate without also clipping digitally (if you think a certain degree of analogue clipping is still usable), but it doesn't really change the fact that you still have roughly 140dB of dynamic range to play with.  There's also the subjective aspect to think about; an analogue noise floor may sound less bothersome than quantization noise.  But again, we are talking a very minor difference.  If we are talking measurements, there should be little technical difference.
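
How much two noise floors add up to can be estimated by summing their powers, assuming they are uncorrelated. A minimal sketch with the figures from this post (the function name `sum_noise_db` is illustrative):

```python
import math


def sum_noise_db(*floors_db: float) -> float:
    """Combine uncorrelated noise floors (in dB) by adding their powers."""
    total_power = sum(10 ** (floor / 10) for floor in floors_db)
    return 10 * math.log10(total_power)


analog_floor_db = -142.0    # the MixPre figure cited in this thread
quant_floor_db = -144.49    # 24-bit integer quantization floor

combined_db = sum_noise_db(analog_floor_db, quant_floor_db)
rise_db = combined_db - analog_floor_db

print(f"combined floor: {combined_db:.2f} dBFS")
print(f"rise over the analogue floor alone: {rise_db:.2f} dB")
```

Under that assumption the combined floor comes out near -140 dBFS, a rise of a little under 2 dB over the analogue floor alone.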

Link to comment
Share on other sites

Here's a genuine question (which I don't know the answer to) about 32-bit float:  How does the loss of precision affect audio?

Floating point numbers have a huge loss of absolute precision at the top end of the scale.  The maximum number that can be represented in 32-bit float is approximately 3.402823466×10³⁸.  The next highest number (subtract 1 from the mantissa) that can be represented is approximately 3.402823264×10³⁸.  Approximately is important here, as is the fact that the difference between those two numbers is 2¹⁰⁴, or approximately 2.028×10³¹.

My first attempts at that difference weren't accurate (I ran into precision issues just trying to calculate it, and my computer gave me three different results depending on how many intermediate calculations I did), but the order of magnitude is what matters.  That's a large number.  It means there are approximately 2.028×10³¹ integers between the highest value and the second highest value that simply can't be represented in floating point notation. 
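
The spacing at the top of the float32 range can be computed exactly by stepping the bit pattern rather than doing the subtraction in single precision. A small sketch using only the standard library (the helper name is illustrative):

```python
import struct


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


MAX_BITS = 0x7F7FFFFF                     # largest finite float32
largest = bits_to_float32(MAX_BITS)
second_largest = bits_to_float32(MAX_BITS - 1)

# Python floats are 64-bit, so this subtraction is exact.
gap = largest - second_largest

print(f"largest:        {largest:.9e}")
print(f"second largest: {second_largest:.9e}")
print(f"gap:            {gap:.9e}  (2**104 = {2.0 ** 104:.9e})")
print(f"gap / largest:  {gap / largest:.3e}")
```

The gap is 2¹⁰⁴, which is enormous in absolute terms but only about 6×10⁻⁸ of the value itself; that is the same relative step size as everywhere else in the normal float32 range.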

Does that matter?  I don't know.  But I'd like to.  I *suspect* that because dB is a logarithmic notation, the absolute precision doesn't matter, and an audio file with, say, 770dB of digital gain applied and then removed would end up largely (but not exactly) the same as how it started.  But, at the very least, there would be floating point rounding errors involved, and, pushed to extremes, you probably wouldn't end up with exactly the same binary representation.  Floating point error is a known factor, and I forget the math that tells you how much precision you can be guaranteed, but it's less than the full 24-bits that can be represented in the mantissa (23-bits literal, plus an assumed leading 1).
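
That suspicion can be tested directly. The sketch below applies 770 dB of gain and then removes it, rounding to float32 after each step, and reports the worst relative error. It assumes samples no larger than 1.0 in magnitude so the boosted values stay below the float32 maximum; the helper names are made up for illustration.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


GAIN = to_float32(10 ** (770 / 20))  # about 3.16e38, just under the float32 maximum


def round_trip(sample: float) -> float:
    """Apply 770 dB of gain, then remove it, in float32 arithmetic."""
    boosted = to_float32(sample * GAIN)
    return to_float32(boosted / GAIN)


samples = [to_float32(math.sin(0.001 * i)) for i in range(1, 2000)]
worst = max(abs(round_trip(s) - s) / abs(s) for s in samples)

print(f"worst relative error after the round trip: {worst:.3e}")
print(f"one part in 2**23 for comparison:          {2.0 ** -23:.3e}")
```

The round trip is not guaranteed to be bit-exact, but each of the two operations can be off by at most half a unit in the last place, so the error stays around the 24th bit of each sample.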

I *suspect* that loss of precision would show up as quantization noise, reducing the real-world dynamic range that is reproduced.  A 1-2 bit loss of precision would translate into a 6-12dB higher noise floor.  And given that the precision is dictated by a 23-bit mantissa (which can represent 24-bits), I *suspect* that the real-world noise floor of 32-bit float is slightly higher than 24-bit integer.

 

That needs some unpacking, because the straightforward brag is that 32-bit float offers 1,528dB dynamic range.  But, I *think* what it actually means is that a signal can be represented along a 1,528dB scale, but at any given point on the scale, precision would limit the quantization floor to 24 bits minus the precision loss, or a best case 144dB in a case with no floating point error.  A signal at 0dBFS would see a quantization floor at -144dBFS, despite the fact that the format is capable of representing signal below that floor.  A signal at -700dBFS would see a noise floor around -844dBFS, and so on.
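
The "sliding floor" idea can be illustrated by measuring float32 rounding error for the same sine wave placed at very different levels. This is a sketch of the number format only, assuming 1.0 equals 0 dBFS; the helper names are illustrative.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_snr_db(level_dbfs: float, n: int = 4800) -> float:
    """How far float32 rounding noise sits below a 997 Hz sine at the given level."""
    amp = 10 ** (level_dbfs / 20)
    signal_power = 0.0
    error_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * 997 * i / 48_000)
        err = to_float32(x) - x
        signal_power += x * x
        error_power += err * err
    return 10 * math.log10(signal_power / error_power)


for level in (0.0, -144.0, -700.0):
    print(f"signal at {level:7.1f} dBFS: rounding noise {float32_snr_db(level):.1f} dB below it")
```

The rounding noise stays a roughly constant distance below the signal wherever the signal sits in the range, consistent with a 24-bit significand that travels with the signal rather than a fixed floor.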

That's a lot of assumptions on my part.  I may be wrong about all of this.  But, for me, it raises the question, why floating point?  Why not just jump to 32-bit integer audio, which can represent 192dB of dynamic range with no worries about precision error?  That would offer the same practical benefit:  More margin for error around headroom and quantization floor, and would avoid any worries around precision.

Link to comment
Share on other sites
