Ty Ford

Why wireless telephone audio is wonky.


I just read the following, "Modern telephony sends your voice using encoded phonemes, adjustment coefficients and corrections for a virtual voicebox which tries to mimic you at the other end. Actual raw audio is not sent. This is also the reason why hold music can sound pretty weird sometimes."

 

Does anyone know if this is true?

 


https://jwsoundgroup.net/index.php?/topic/26816-the-secret-history-of-the-vocoder/&do=findComment&comment=309570


I have very little knowledge about this, but I've heard that with cellphones the voice you hear is modeled in 10 ms increments. A quick search turned up this statement from a research paper that makes the point rather well.

Quote

The parameters that are actually sent from one cell phone to another are vocal tract coefficients related to the frequency response of the vocal tract and source coefficients related to the residual signal. The fact that the vocal tract coefficients are very much related to the geometric configuration of the vocal tract for each frame of 10 ms of speech calls for an important conclusion: cell phones, in a way, transmit a picture of our vocal tract rather than the speech it produces.

It seems logical that music reproduction would suffer a great deal in this scenario, since the model only describes a vocal tract.

1 hour ago, Ty Ford said:

so, not just a lot of data compression, real transformation!  That's sort of scary.

 

My understanding is that both start with A/D conversion and limit the bandwidth. But while "regular" data compression algorithms (MP3, etc.) rely on psychoacoustics and bandwidth limitation, cellphones use LPC (linear predictive coding). It starts with hard bandlimiting (roughly 300 Hz to 3.4 kHz, just like POTS), then removes those elements of speech that it can express and transmit as a much smaller dataset, sending only the residual audio (plosives, sibilance, consonants, etc.) as actual digital audio. On the receiving end the residual and the data about the formants are synthesized back into speech.

The engine driving speech production is represented as an acoustic tube and a buzzer and can be regenerated at the other end if the modifiers (throat shape, etc) are known.
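
To make the tube-and-buzzer idea a bit more concrete, here is a minimal Python sketch of plain LPC analysis and resynthesis on one 10 ms frame. It is not what any particular phone codec does (real codecs also quantize the coefficients and code the residual much more aggressively). The toy signal, the single 700 Hz resonance, the predictor order of 8, and all the function names are assumptions made up for illustration.

```python
import math

SAMPLE_RATE = 8000   # Hz, the usual narrowband telephony rate
FRAME_LEN = 80       # 10 ms at 8 kHz, matching the frame size quoted above
ORDER = 8            # number of predictor coefficients (assumed for this sketch)

def make_voiced_frame(n=FRAME_LEN, pitch_period=40, formant_hz=700.0, r=0.95):
    """Toy 'buzzer into a tube': a pulse train fed through one resonance."""
    w = 2.0 * math.pi * formant_hz / SAMPLE_RATE
    a1, a2 = 2.0 * r * math.cos(w), -r * r
    out = []
    for i in range(n):
        pulse = 1.0 if i % pitch_period == 0 else 0.0
        y1 = out[i - 1] if i >= 1 else 0.0
        y2 = out[i - 2] if i >= 2 else 0.0
        out.append(pulse + a1 * y1 + a2 * y2)
    return out

def autocorr(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find a[1..order] so that x[n] is predicted by sum(a[k] * x[n-k]).

    Returns (coefficients, remaining prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or perfectly predicted frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient for this step
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def lpc_residual(frame, coeffs):
    """Analysis: subtract the prediction, leaving the 'buzzer' part."""
    res = []
    for n, x in enumerate(frame):
        pred = sum(a * frame[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        res.append(x - pred)
    return res

def lpc_synthesize(residual, coeffs):
    """Synthesis: drive the all-pole 'tube' filter with the residual."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out

def energy(x):
    return sum(v * v for v in x)

if __name__ == "__main__":
    frame = make_voiced_frame()
    coeffs, _ = levinson_durbin(autocorr(frame, ORDER), ORDER)
    residual = lpc_residual(frame, coeffs)
    rebuilt = lpc_synthesize(residual, coeffs)
    print("residual/signal energy:", energy(residual) / energy(frame))
    print("max rebuild error:", max(abs(a - b) for a, b in zip(frame, rebuilt)))
```

The coefficients play the role of the tube shape, and the residual is what is left over for the buzzer. Running the residual back through the all-pole filter recovers the frame, which is the "regenerated at the other end" step in miniature.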

 

I apologize for this sketchy, limited explanation but I'm trying to wrap my mind around this as well.

 

It kind of reminds me of generating room tone using an IR. I do this a lot in post when room tone isn't available for one reason or another. I use a little snippet of dead space between words in a clip of dialogue to generate an impulse response, then feed white noise into that IR loaded into an IR reverb. The white noise is the engine driving the synthesis of room tone and wouldn't need to be transmitted; it could just be generated during reproduction. So 1 hr of room tone could be expressed with very little data: the IR and the metadata describing the level of the white noise. I hope this makes sense.
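
The room tone analogy can be sketched in a few lines of Python: the only things stored are a short impulse response and a noise level, and the noise itself is generated at playback. The toy IR below (a plain exponential decay) just stands in for one derived from a real dialogue clip, and `synthesize_room_tone` is a hypothetical helper name, not a function from any audio library.

```python
import random

def synthesize_room_tone(ir, noise_level, n_samples, seed=0):
    """Convolve locally generated white noise with a stored impulse response.

    Only `ir` and `noise_level` would need to be stored or transmitted;
    the noise is created here, at playback time.
    """
    rng = random.Random(seed)
    noise = [noise_level * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    out = []
    for n in range(n_samples):
        acc = 0.0
        for k, h in enumerate(ir):
            if n - k < 0:       # no noise samples before the start
                break
            acc += h * noise[n - k]
        out.append(acc)
    return out

# Assumed stand-in for an IR taken from a gap between words: 64 decaying taps.
toy_ir = [0.9 ** k for k in range(64)]

# 2000 output samples described by 64 taps plus one level value.
tone = synthesize_room_tone(toy_ir, noise_level=0.01, n_samples=2000, seed=1)
```

The seed is only there so the example is repeatable. Any white noise would do, which is the whole point: the receiver does not need the sender's noise, only the shape to pour it through.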


Is this, for lack of a better description, the "talking through a paper towel tube" sound that happens even though both parties are talking directly into the phone (not speakerphone) with full bars? It drives me nuts; I spend most of my time trying to figure out what someone is saying in all the low-passed mud that seems to phase in and out. My wife and I started using WhatsApp to talk on the phone and the difference in voice quality is night and day. At first I thought it was Android vs. Apple devices, or AT&T vs. Verizon service. But a virtual voice box would explain why this keeps happening and is accepted as the norm.

