Other than here, I have only a few other forums I frequent. One is TDPRI which is all about guitars and amps and tech stuff. And lots of old men with opinions haha! Kind of toxic for some folks but I ignore the dummies and occasionally come upon insanely awesome threads. So here is one and I want to share it here. Most of you know this stuff, but I didn't. I was always curious about how many volts a line level vs. mic was!
Here is the link to the thread. I contacted the OP and was granted permission to repost this here.
And here is the stuff that blew my mind. Its focus is music recording and producing, but the science and tech are germane to our work. Enjoy!
This is in the weeds stuff that might make no difference to your recordings, but if it helps anyone or is of interest, then great! If not, well I needed something to do today to kill some time while I wait for an artist to show up for a session.
Probably a good place to start is: what is a dB? The decibel scale is a logarithmic scale that compares the amplitude of something to a known reference. By itself, a decibel is meaningless. It needs a reference level, which is why you see things like dBSPL at a known distance, which tells you how much sound pressure you have compared to a reference level at a given distance (often 1 meter) so you can make like-for-like comparisons. Or dBu, the amplitude of a voltage compared to a reference voltage of 0.775 volts in an unloaded circuit (hence the “u” in dBu). If you want to get really into the weeds, 0.775 volts is the reference because that's what delivers 1mW into a telephone circuit with a standard 600 ohm impedance, and most of this audio stuff traces back to the invention of the telephone by Bell.
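Nobody needs log tables for this anymore; the ratio math is a couple of lines in any language. Here's a quick Python sketch (the function name is mine, just for illustration):

```python
import math

def db(value, reference):
    """Decibels of an amplitude (e.g. a voltage) relative to a reference.
    Amplitude ratios use 20 * log10; power ratios would use 10 * log10."""
    return 20 * math.log10(value / reference)

DBU_REF = 0.775  # volts RMS, the 0 dBu reference

print(db(0.775, DBU_REF))  # 0.0 -- 0 dBu, by definition
print(db(1.55, DBU_REF))   # ~6.02 -- doubling the voltage adds about 6 dB
```

Same function works for dBSPL if you swap the reference for 20 micropascals, which is roughly the threshold of hearing.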
The first important scale in a Digital Audio Workstation is dBFS, decibels Full Scale.
This is a scale on your DAW meters that sets the largest signal (or number) the digital system can represent as the reference, 0dBFS, and goes down from there. It can only go down, because you can't have a number larger than the largest number the system can represent.
If you think of digital audio as a water bottle that can hold exactly 1 liter of water, anything above 1 liter spills over the side and is lost; anything less than a liter leaves some space at the top of the bottle for more. In digital audio, any signal that goes above the largest amplitude that can be described (0dBFS) is simply cut off. As a result you get hard clipping, those nice sounds you made are no longer perfectly described, and you get digital distortion. Anything below 0dBFS leaves space at the top of the meter for more.
There is technically no headroom in digital audio; 0dBFS is the hard limit. Some DAWs get around this internally by switching to floating point math and shifting 0 on the DAW meter further down a larger scale. This means you can go above 0 on the meters in the DAW and not hear clipping, because you are not really at true 0dBFS. But at the point of recording through your fixed point converters, or when you render that file to a fixed point file for playback through converters that use fixed point math, anything above 0dBFS clips and distorts.
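Hard clipping at 0dBFS is literally just a cut-off. A toy sketch (real converters work in integer sample values, but the behavior at full scale is the same):

```python
def hard_clip(samples, full_scale=1.0):
    """Fixed-point style clipping: anything past +/- full scale is cut off."""
    return [max(-full_scale, min(full_scale, s)) for s in samples]

signal = [0.5, 0.9, 1.4, -1.2, 0.7]  # two samples are "overs"
print(hard_clip(signal))             # [0.5, 0.9, 1.0, -1.0, 0.7]
```

A floating point mix bus just carries the 1.4 along unchanged, which is why DAW meters can go "over" with no audible clipping right up until you render to fixed point.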
So, for the purposes of recording and mixing, you need to create your own headroom in a DAW by recording well below 0dBFS. By headroom we typically mean space for unexpected overs like over enthusiastic drum hits, or sudden loud words in a vocal, or a really hard pick of a string in a guitar performance that are well above the level of the rest of the recording, and would otherwise clip your signal if you are working too close to 0dBFS.
OK, Phew, so that's dBFS introduced, but where do those signals come from?
Analog signals, so called because we take a sound wave, smash it into a microphone diaphragm which, through the magic of moving magnet coils, turns that movement created by the sound into an AC voltage that is analogous to the original sound wave. We can now push that AC down a wire through a bunch of equipment that will end up giving it enough power to drive a speaker to recreate that sound wave. (Fun fact, you can use a speaker as a mic by wiring it into the input of a preamp and yelling into the cone. This is how a lot of early Rap/Hip hop was done on a budget, and what caused Yamaha to come up with the sub-kick for drums, after seeing lots of recording engineers using NS-10 monitors as a low frequency "mic" on kick drums.)
Ok, but a passive mic with a really thin diaphragm moving minuscule fractions of an inch doesn't make a whole lot of AC, right? No, probably less than 1 millivolt RMS, which is nowhere near enough to do anything with. So we need to amplify it to a usable level with a preamp.
The dB scale is not linear so +2 dB doesn’t make something 2 x louder than +1dB. Instead, amplitude roughly doubles with every 6dB you add or halves with every 6dB you remove. So +6dB has twice the amplitude, +12dB four times, +18 dB eight times and so on. This is why preamps have a lot of gain. 60dB of gain is fairly common, and increases the amplitude of the incoming signal by a factor of 1,000. So that just less than 1 millivolt RMS from the microphone becomes just less than 1 volt RMS.
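The dB-to-multiplier conversion is just a power of ten. A Python sketch (function name is mine):

```python
def gain_factor(db_gain):
    """Linear amplitude multiplier for a gain given in dB."""
    return 10 ** (db_gain / 20)

for g in (6, 12, 18, 60):
    print(f"+{g} dB = x{gain_factor(g):.2f}")
# +6 dB = x2.00, +12 dB = x3.98, +18 dB = x7.94, +60 dB = x1000.00

mic_output = 0.001  # just under 1 millivolt RMS, illustrative dynamic mic level
print(mic_output * gain_factor(60))  # 1.0 -- about 1 volt RMS after 60 dB of gain
```

The +6 and +12 steps come out at 2.00 and 3.98 rather than exactly 2 and 4 because "6 dB doubles the amplitude" is a rounding of the true figure, 20·log10(2) ≈ 6.02 dB.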
So how much voltage do we want? The professional audio standard for “line level” is +4dBu. 0dBu, as mentioned earlier, is 0.775 volts RMS, so you need to add 4dB to 0.775 volts. You can either pull out your log tables and do a lot of very complicated math, or just accept that +4dBu line level = 1.228 volts RMS. I usually think of it as 1.23 volts because it's easy (easy as 1, 2, 3) and close enough on a multimeter.
This is why dynamic mics are considered gain hungry. Even at 60dB of gain, you are probably not getting all the way up to line level, so we have things like Cloudlifters for an extra boost, or preamps with 80dB of gain, which give us more ability to increase the amplitude of the signal to line level and well beyond. Condenser mics use powered, active circuitry to allow the mic to produce more signal, so they need less gain and are more sensitive.
Thankfully, someone realized a long time ago that looking at voltage meters and trying to do a lot of math in your head was not a great way to monitor audio levels, and so they came up with Volume Units, or VU. 0VU was calibrated to line level (+4dBu = 0VU), and you could now see how many decibels above or below this reference level you were. Typically the scale stops at +3VU, and the clip light comes on if you exceed +6VU.
As an accurate metering device, VU meters are terrible. You have a physical needle with mass, whose inertia you have to overcome (meter ballistics). As a result, it takes a VU meter about 300 milliseconds to react to an incoming signal, so for things like drums they are not going to give you anything like an accurate impression of the levels. When I used to set up levels for drums with analog gear, with snares for example I'd aim for -5VU, because that would actually give me a peak level somewhere between 0VU and +3VU. Kicks I'd go even lower, at around -7VU.
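You can get a feel for why the needle under-reads transients with a toy model. This is not the real VU ballistics spec, just a one-pole average with a roughly 300 millisecond time constant, and all the numbers are made up for illustration:

```python
def slow_meter(samples, fs, time_constant=0.3):
    """Toy VU-ish meter: a one-pole average of the rectified signal.
    Short bursts barely move it; sustained signal settles to its level.
    Returns the highest reading the 'needle' reached."""
    alpha = 1 / (time_constant * fs)
    level = 0.0
    max_reading = 0.0
    for s in samples:
        level += alpha * (abs(s) - level)  # needle drifts toward |signal|
        max_reading = max(max_reading, level)
    return max_reading

fs = 1000  # toy sample rate to keep the numbers small
hit = [1.0] * 50 + [0.0] * 950  # a 50 ms full-scale "snare hit", then silence
print(max(abs(s) for s in hit))        # 1.0 -- the true peak
print(round(slow_meter(hit, fs), 2))   # ~0.15 -- the slow meter never got close
```

The burst is over before the average can climb, so the "needle" reads a small fraction of the true peak, which is exactly why you aim several VU below zero on drums.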
But it turns out those laggy meters are actually a pretty good analog of how we perceive loudness, and so are still a good way of gauging how loud something sounds to a human person here on planet earth, not how loud a peak hit is for 0.01 seconds. So they are still useful today for gauging perceived loudness and giving you a rough idea of where your signals are in relation to line level.
OK. So we know 0dBFS is a hard limit but that analog VU meters go to +3 above zero, and the clip light doesn't come on until you hit +6VU, so what's that all about?
Gear makers know 0VU is an average target level, and since it is a standard level, they can design circuits and gear to perform optimally at that level. So best signal to noise ratio, low noise, low distortion at line level.
They also know that things happen and you get overs above the average level, so they build gear that can handle signals above line level too. Sometimes a lot higher, with max input/output levels 20dB or more above line level.
Recording engineers, being of inquisitive dispositions, started to wonder what happens if you push the levels. It turns out when you feed input transformers, tube gain stages, output transformers, magnetic tape, etc higher levels than intended for optimal performance, you can get saturation, soft clipping, added second-order harmonics, fattening, added sheen, rounding of transients etc, as well as sometimes just bad sounding distortion. All kinds of unicorn sprinkles, elf dust and leprechaun farts (happy St. Patrick's Day) you could use for effects as needed.
And so people got comfortable with, and used to pushing signals beyond 0VU as needed to get the sounds they wanted, when needed.
Then along comes digital, and someone made a massive error in judgement. They set the calibration of 0 in the DAW at a point that is a hard limit you can't exceed, and turned over DAWs to a bunch of delinquents, I mean recording engineers, who frequently push beyond zero to get their sound. So they tried it with digital and everything clipped and sounded awful, and now digital is hated and sh!tty and doesn't work right.
So the next question became: OK, so what is the standard, where is 0VU on a digital meter? The answer: there is no standard. It's wherever the people making the converters set it, and it's not consistent from one brand/model to the next.
Since ProTools was something of a standard, people looked to Digidesign units, since that's what you had to use to run PT back then. Their converters were set so a line level signal of 0VU showed up at -18dBFS in a DAW. So for a while that became quoted as gospel "you should be recording at average levels of -18, that is 0VU". Even Waves set simulated 0VU on their analog emulation plugins at -18 dBFS. It wasn't a standard though. A lot of converters didn't show up at -18dBFS at line level. RME were either -9 or -15, Burl were user definable, entry level gear could be as high as -6 sacrificing headroom to try and keep unit self noise acceptable at a low price. And so on, and so on.
Fast forward to today, there is still no standard. Converter calibrations are still all over the place, and the DAW recording level recommendations on the great and powerful internet seem to shift to a different 0VU = xdBFS level every couple of years.
My personal philosophy is to read the specs of my unit, then test with a signal generator to confirm where a line level signal appears on my DAW meters. If it's at a reasonable level to give me enough headroom to catch overs, and/or to push the levels in my analog gear in front of the converters without momentary peaks causing clipping, I'll use that level as my reference for 0VU. If it doesn't have the headroom, I'll default back to the old Digidesign -18dBFS as the average reference level. I don't worry about peak levels; so long as they don't hit zero they won't clip, and I can deal with them as needed later.
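The bookkeeping here is trivial once you know your converter's calibration; the hard part is only that the calibration number varies by manufacturer. A sketch, where the -18 default is the old Digidesign convention and emphatically not a universal standard:

```python
def vu_to_dbfs(vu, calibration=-18):
    """Where a VU reading should land on the DAW meter, given where the
    converter puts 0VU (+4dBu line level) in dBFS."""
    return calibration + vu

print(vu_to_dbfs(0))       # -18: line level on a Digidesign-style converter
print(vu_to_dbfs(6))       # -12: a +6VU over still leaves 12 dB before clipping
print(vu_to_dbfs(0, -15))  # -15: the same line level signal on a -15 calibrated unit
```

Run your signal generator test, plug your own calibration number in, and you know exactly how much digital headroom your analog "overs" have to play with.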
Alright, so now we know all about analog and digital levels, in far more detail than we ever wanted to, but what’s all this about sample rates and bit depths?
CD quality audio (What is a CD? Does anyone even buy those?) is 16/44.1. That means it has a bit depth of 16 bits and a sample rate of 44.1kHz (44,100 samples per second). This is a fairly arbitrary standard that was seemingly chosen for a couple of reasons. First, audio files at 16/44.1 are small enough that you can fit the whole of an average length album on one disc.
The second thing is because of something called the Nyquist theorem. The Nyquist theorem says a lot of very sciency and mathy stuff, but for you, me, and the guy rushing home from Guitar Center with a shiny new interface, the point is that the highest frequency sound that can be converted is equal to half the sampling rate. So if we are using a 44.1kHz sampling rate, the highest frequency your converters can deal with is 22.05kHz. It just so happens the very upper limit of human hearing is typically around 20kHz (if you happen to be a newborn, not yet exposed to any hearing loss), and audio equipment typically has a frequency response of 20Hz - 20kHz. So that all works out nicely. It also gives you 2.05kHz above the audible spectrum in which to hide the anti-aliasing filters needed to deal with the nasty sounding artifacts of AD/DA conversion.
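You can see what those filters are protecting you from with a couple of lines of arithmetic. A sketch assuming an ideal sampler with no anti-aliasing filter (the function is mine, for illustration):

```python
def alias_frequency(f, fs):
    """Apparent frequency of a pure tone at f Hz after sampling at fs Hz with
    no anti-aliasing filter: frequencies fold back around multiples of fs."""
    f = f % fs
    return min(f, fs - f)

fs = 44_100
print(alias_frequency(20_000, fs))  # 20000: below Nyquist, reproduced as-is
print(alias_frequency(30_000, fs))  # 14100: a 30 kHz tone folds down into the audible range
```

That folded-down 14.1kHz ghost tone has no harmonic relationship to anything in the music, which is why aliasing sounds so nasty and why the filter has to stop everything above Nyquist from reaching the converter.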
What about the 16 that goes in front of 16/44.1? That is the bit depth. For our purposes, the bit depth is the amount of dynamic range the digital system can describe above its digital noise floor. 1 bit = 6dB of dynamic range, so in 16 bit audio you can have signals from 0dBFS all the way down to -96dBFS, where the digital noise floor is (16 bits x 6dB per bit = 96dB dynamic range). 96dB of dynamic range sounds like a lot (and it is), but the problem is you can't record at 0dBFS; your actual levels are going to be quite a lot lower than that. If we take an RMS (average) recording level of -20dBFS, that means the bulk of our signal is around 76dB above the digital noise floor, which is quite a lot worse than even mediocre analog gear manages. In fact your digital noise floor is probably higher than the noise floor of the analog gear that is feeding it.
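The arithmetic above is worth doing once, using the 6 dB-per-bit rule of thumb (the exact figure is closer to 6.02 dB per bit, but the rule of thumb is close enough here):

```python
def dynamic_range_db(bits):
    """Approximate dynamic range of an ideal fixed point system,
    using the 6 dB-per-bit rule of thumb."""
    return bits * 6

print(dynamic_range_db(16))  # 96 -- 16 bit
print(dynamic_range_db(24))  # 144 -- 24 bit

# Recording at a -20 dBFS average level in 16 bit:
noise_floor = -dynamic_range_db(16)  # -96 dBFS
print(-20 - noise_floor)             # 76 -- dB between your signal and the noise floor
```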
So when you compress that track and bring it up with other compressed tracks, and then compress the whole mix and bring up the level, and then send it over to mastering for limiting/compression and other magic sprinkles and finally bring the whole mix up to where the peaks are at -0.1dBFS and the average RMS level of the mix is maybe -9 or -10dBFS, the noise performance is worse than a comparable record made on analog gear, and the noise floor may be apparent on the quieter bits of the recording. This is where the mantra that for 16 bit audio “You need to record as hot as possible without clipping” came from.
Luckily, technology evolved and we got 24 bit audio capable converters. 24 bits x 6dB = 144dB of THEORETICAL dynamic range above the digital noise floor. Why theoretical? Because your analog noise floor is a lot higher than -144dBFS. Under perfect conditions where the only thing in your recording chain is the converters, on conditioned power with no RFI, no noise in the grounding, etc., even really good, really, really expensive converters have <130dB of dynamic range. The lowest 2-3 bits of your 24 are just catching thermal/circuit noise from the analog front end of the converters themselves. If your gear skews more to the entry level stuff, you're probably getting <110dB of dynamic range from your converters, and the 5-6 lowest bits are just picking up the circuit noise from your interface.
Psst. If you're recording at 24bit on an entry level interface, you’re only getting 18 usable bits. If you're on a super high end system you may be getting 20-21 bits. It doesn’t matter. Here’s why:
If you record at sensible levels in 24 bit, your analog noise floor is well above your digital noise floor. Even recording at -20dBFS average in 24 bit, your noise floor is likely comparatively lower than if you recorded at -10dBFS average in 16 bit with the higher digital noise floor. So now when you compress and raise and compress and raise, and send to mastering to limit and raise, you get a noise floor that doesn’t intrude badly, even though you weren’t able to use all of the bits.
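To put rough numbers on the "usable bits" idea, at about 6 dB per bit (the 110 and 125 dB dynamic range figures below are illustrative, not measurements of any particular interface):

```python
def usable_bits(measured_dynamic_range_db):
    """Rough count of bits that carry signal rather than the analog
    front end's own noise, at ~6 dB per bit."""
    return measured_dynamic_range_db / 6

print(round(usable_bits(110)))  # 18: an entry level interface
print(round(usable_bits(125)))  # 21: a very good converter
```

Either way, even the "worst case" 18 usable bits is a couple of bits more dynamic range than a full 16 bit system, which is the whole point.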
Now that we're armed with some info about bit depth and sampling rate, the question of why my interface has so many different sampling rates available rears its ugly head. I say ugly head because I've seen arguments about sample rates on Gearspace go on for years, and Grammy-winning engineers almost coming to blows over it in threads. So I'm going to caveat this with: what I say here is my experience of recording things for a living.
That said, interfaces have two base clock speeds that can be run at single, double or quad speed. The base speeds are 44.1kHz and 48kHz. If you run at double speed you get 88.2kHz and 96kHz, and if you run at quad speed you get 176.4kHz and 192kHz.
Why are the base rates 44.1 and 48kHz? Because the CD audio standard is 44.1kHz and the video audio standard is 48kHz.
So you know where I'm coming from: I record everything at 24 bit 48kHz unless a client specifies they want something other than that. Why? Because if there is any chance anything I record is going to end up in the wonderful world of TV, video, movies etc., that's what those guys want. Since I record voice actors who narrate stuff for TV and video, and signed bands or bands working to get signed who are probably going to license stuff to film, TV shows and commercials, there is a very good chance the recordings are going to end up in video land, so 48kHz it is.
Warning IMO/IME content.
I personally cannot hear any difference between 44.1, 48, 88.2, 96kHz, etc. sample rates. I could not sit down and listen to a song and tell you, with anything other than a guess, what sample rate had been used in a recording. 44.1kHz can do the math to perfectly describe a sound wave with a frequency all the way up to 20kHz. Computers are very good at math.
So why do interfaces have all these other sample rates then?
That's where things get contentious.
Some of the more reasonable arguments for it are:
File size. Audio files recorded at 24/192 are quite large. This is less of a concern with bigger hard drives and faster upload/download speeds now.
Latency. Buffers fill up faster at higher sample rates, so latency may be lower. Of course, the amount of data you stream at higher sample rates is higher too, which puts more demand on your system, so you may have to increase the buffer. It's a balancing act if you're not on a system with fixed monitoring latency.
Higher sample rates mean you can use shallower anti-aliasing filters above 20kHz that are less likely to produce artifacts in the audible spectrum. Makes sense: a filter you cram into the 2.05kHz above the audible spectrum at a 44.1kHz sample rate is going to have to be much steeper than a filter that can roll off shallowly (is that a word?) over the 24kHz above the audible range at an 88.2kHz sample rate, and so on.
Steep filters can cause a bump above or below their corner frequency that can cause artifacts. A low pass filter with a slope of 48dB per octave and its corner at 100Hz, for example, can produce a noticeable "bump" in the response just above 100Hz. I can honestly say I've never heard aliasing artifacts on a recording at 44.1 or 48kHz on modern equipment, but that doesn't mean it's not possible.
Higher sampling rates mean your DAW can do its very complex internal math with bigger numbers and have fewer rounding errors. Also makes sense: figuring out the area of a circle using pi = 3.142 will give a slightly different answer than using a longer version, pi = 3.141592653. Again, I can't point to an example of when I could listen to a recording and pick out rounding errors, but it doesn't mean it couldn't happen.
Some plugins use oversampling to do more intense math on the signal to give you a "better, more accurate sound," so you might as well use a higher sample rate so they can work natively without having to upsample. Basically the same argument as above: longer numbers mean fewer rounding errors. Again, I personally can't hear any reliably repeatable difference between a plugin upsampling audio to work on it vs. working at a higher sample rate natively, but that doesn't mean it's not possible.
Some examples of the more tin-foil hat (IMO of course) arguments I've seen are:
It's all a marketing ploy to get people to buy new interfaces. You don't need anything above 48kHz but bigger numbers sound impressive/more high quality and gear makers need to keep selling gear.
Higher sample rates give you a more detailed recording. The math suggests that they allow you to capture higher frequencies above what you can hear or what your equipment produces, I can't see anything in the math that would suggest that the detail in the audible spectrum is better at higher rates, but I'm no mathematician. I can't reliably hear a difference though.
Sort of similar to the one above is the idea that all the nuance above the audible spectrum can be picked up by analog gear, and that it affects what you can hear in the recording. My converters cut everything above 20kHz off (I've tested this), so even if my mic picks up some kind of super harmonics at 35kHz I wouldn't be able to record it. There are scientific instruments that could, but I don't have any of those and I've never seen them in a recording studio.
I guess the logical way to bring this whole meandering walk through levels to a close is to think about how loud your final song should be for release.
Before we delve into that, I would suggest not mixing to a number. For sure you need to check levels and be certain you are not clipping the final outputs, but work for a mix that sounds good and let the levels in general fall where they may. If your final mix is way off from distribution levels when you are done, you can "master" it or even redo it.
Ok so context. If we go back to vinyl: There was kind of a ceiling on how loud you could go before the bumps inside the grooves of a record would get big enough to actually kick the needle out of the groove, especially on mono, low end sounds like bass and kick. That was ok though, the amplifier usually had a great big volume knob on it. Record players were not very portable and sitting down to listen to something was a decision, and the listener wasn't offended by having to reach over and adjust the volume.
CD/Digital Audio eliminated the problem of kicking the needle out of the groove. People took note that louder sounds better and so started to more heavily compress and limit final mixes. This brought average levels up and dynamic range down. And so began the loudness wars. At its worst, we were getting mixes with like 6dB of dynamic range.
Thankfully, streaming services took note that their listeners didn't enjoy suddenly getting deafened by extremely loud mixes when listening on headphones and have taken action.
TV/broadcast stuff already had standards using LUFS, which stands for Loudness Units relative to Full Scale. Loudness units are a way of averaging perceived loudness based on how we hear things. Momentary loudness is measured over a slightly longer window than a VU meter's response, and loudness can also be measured over an entire recording. So that's good news, except unlike TV/broadcast/film, there isn't a standard for music streaming yet.
Spotify uses -14LUFS with a max true peak of -1dBTP, and will turn you down if you are louder, or limit and turn you up if you are quieter.
Spotify Loud uses -11LUFS with a max of -2dB True Peak.
Amazon Music uses -13LUFS with a max of -2dB True Peak. They turn you down if you are too loud, but they don't turn you up if you are below their standard.
Apple uses -16LUFS with a max of -1dB True Peak, and will adjust you up or down if you don't meet their level.
YouTube is -15LUFS with a max of -1dB True Peak.
These are all subject to change without notice by the streaming services at any time. Plus there are dozens of other streaming services with dozens of standards that vary from -18LUFS to -9LUFS.
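If you want to see what each service will do to a given master, the arithmetic is just target minus measured (targets as listed above, and subject to change like everything else in this space):

```python
def normalization_gain(measured_lufs, target_lufs):
    """dB of gain a streaming service applies to hit its loudness target.
    Negative means your master gets turned down on playback."""
    return target_lufs - measured_lufs

# A loudness-wars -9 LUFS master vs. a conservative -18 LUFS master:
print(normalization_gain(-9, -14))   # -5: Spotify turns it down 5 dB
print(normalization_gain(-9, -16))   # -7: Apple turns it down 7 dB
print(normalization_gain(-18, -14))  # 4: Spotify limits and turns it up 4 dB
```

Which is the practical argument against chasing extreme loudness: the squashed -9 LUFS master just gets turned down anyway, and all you're left with is the reduced dynamic range.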
If you don't have LUFS and True peak meters in your DAW, there are plenty of plugin meters to choose from.
So what do you do? Make a mix you like and let the streaming services turn it up or down? Pick a mid point LUFS level and accept that some services will add a limiter and turn it up (which might change the sound a little)? Research where most of your audience are and mix to that service's level?
Well that is entirely up to you!