Jay Rose

Laurel v Yanni explained!


The New York Times has an excellent technical explanation of this week's acoustic 'blue dress / gold dress' meme, with an interactive tool that lets you simulate different listening/auditory conditions.

 

 

The original issue is, I believe, due to a badly implemented speech synthesizer. But there's a lesson in there for all of us, about the importance of accurate recording / equalization / monitoring. 


Jay, your post couldn't be more perfectly timed. Our green screen shoot came to a screeching halt when this subject came up, followed by "of course the sound guy would find this". Clients and crew equally entertained seeing where they fall on the scale. A pleasant distraction!

 

-Ken


Dalton, it sure is.

 

Unlike Shepard Tones and MaxxBass and stuff like that, though, this seems to be an issue of language perception... and how unexpected combinations of formants can drive the mechanism crazy. The spectrograms are particularly revealing.

 

Thought experiment: Today's audio tech relies on masking for things ranging from mp3 to Nielsen radio ratings to Dolby Digital. But is it a psychoacoustic phenomenon or a chemical/time characteristic of the mechanism? Does it happen in the mind, or in the impulses that get to the mind?

15 hours ago, Jay Rose said:

Thought experiment: Today's audio tech relies on masking for things ranging from mp3 to Nielsen radio ratings to Dolby Digital. But is it a psychoacoustic phenomenon or a chemical/time characteristic of the mechanism? Does it happen in the mind, or in the impulses that get to the mind?

 

Jay, I read an interesting article that touches on this and the biological voodoo mechanics behind who hears what and why..... https://apple.news/ASs4UNdVSRjCTSDwsYHoPww

 

Cheers,

Evan Meszaros


Hobbiesodd, thanks for the link. It's an interesting article.

 

I take issue with the reporter's claim that the file actually says "Laurel". According to the spectrogram, what it says is what one particular (IMHO malformed) piece of software synthesized for that text... not the sound a human would (or even probably could, without re-engineering the vocal tract) create when speaking. 

 

But I'm glad Wired and the NYTimes (and late night comedians) are calling attention to actual speech science. Phonetics is, as far as I'm concerned, both a theoretical basis and a fast, efficient strategy for dialog editing.

50 minutes ago, Jay Rose said:

Hobbiesodd, thanks for the link. It's an interesting article.

 

I take issue with the reporter's claim that the file actually says "Laurel". According to the spectrogram, what it says is what one particular (IMHO malformed) piece of software synthesized for that text... not the sound a human would (or even probably could, without re-engineering the vocal tract) create when speaking. 

 

But I'm glad Wired and the NYTimes (and late night comedians) are calling attention to actual speech science. Phonetics is, as far as I'm concerned, both a theoretical basis and a fast, efficient strategy for dialog editing.

 

I think what they are referencing is that the original source file was a computer pronouncing the word “Laurel”. So, I guess it is, technically, “Laurel.”

 

Cheers,

Evan

9 hours ago, hobbiesodd said:

I think what they are referencing is that the original source file was a computer pronouncing the word “Laurel”.

 

That's precisely my point. The computer wasn't pronouncing "Laurel". It may have been asked to do that, but it was pronouncing a jumble containing parts of both words. 

 

 LauralLauralYanni.jpg

This spectrogram is, left to right, the computer's pronunciation (courtesy of the NY Times, which I'm pretty sure got it from Twitter), then a human saying Laurel and Yanni. The human is me, at my desk just now, wearing an E6 with no processing. The clips were roughly normalized (using the fundamental at ~100 Hz as the reference).
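
For anyone who wants to try that kind of level matching on their own clips, here is a minimal sketch in plain Python. It is not the procedure used for the spectrogram above, just one way to do it under stated assumptions: each clip is a list of float samples, the fundamental is treated as a single component at exactly 100 Hz (a real voice drifts around that), and the function names and the 0.1 target level are made up for illustration.

```python
import math

def tone(freq_hz, amplitude, sample_rate, n):
    """Synthetic sine wave, used here only as a stand-in for a real clip."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def goertzel_magnitude(samples, sample_rate, freq_hz):
    """Amplitude of the component nearest freq_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)       # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / n

def normalize_to_fundamental(samples, sample_rate,
                             fundamental_hz=100.0, target=0.1):
    """Scale a clip so its fundamental sits at the (assumed) target level."""
    level = goertzel_magnitude(samples, sample_rate, fundamental_hz)
    if level == 0.0:
        raise ValueError("no energy found at the assumed fundamental")
    gain = target / level
    return [x * gain for x in samples]
```

Run both clips through `normalize_to_fundamental` before comparing them, so differences in the upper formants are not just differences in overall recording level.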

 

Note how the computer's Laurel has stuff around 2k - 3k that isn't in the human version but is in the human Yanni. Note also some Yanni-like harmonics (formants) in the computer's Laurel ~ 500 Hz. 
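
If you would rather put a number on "stuff around 2k - 3k" than eyeball a spectrogram, a rough sketch like the following can help. It measures what fraction of a clip's spectral power falls inside a frequency band. The assumptions: clips are short lists of float samples, the whole clip is analyzed at once (so there is no time axis, unlike a spectrogram), and the DFT is the slow textbook version to avoid any dependencies. The function names are hypothetical.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power for bins 0..N/2. Slow, so keep clips short."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum

def band_share(samples, sample_rate, lo_hz, hi_hz):
    """Fraction of total (non-DC) power between lo_hz and hi_hz."""
    n = len(samples)
    power = power_spectrum(samples)
    total = sum(power[1:])                     # ignore the DC bin
    if total == 0.0:
        return 0.0
    in_band = sum(p for k, p in enumerate(power)
                  if k >= 1 and lo_hz <= k * sample_rate / n <= hi_hz)
    return in_band / total
```

Comparing `band_share(clip, rate, 2000, 3000)` for the meme clip and for a human "Laurel" would show whether the extra 2k - 3k energy is really there, and the same call with a band around 500 Hz covers the lower formant.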

 

I'd guess the student who discovered this actually found a bug in the speech synthesizer, but a fascinating one in terms of speech perception. Of course I can't guarantee it's a bug... it might have been a feature, or possibly a mistake in the dictionary software that was calling the synthesizer.


 


UPDATE: I just downloaded the NPR story, and Jones is on a studio mic (either at NPR or via ISDN; it doesn't matter). So I'll process the clip and post.

I'm leaving my original request up for completeness.

 

 

Hi, all. This is a request for help...

 

NPR and some other media are reporting that the laurel/yanni voice is Jay Aubrey Jones, who was an out-of-work actor when Dictionary.com first set up, and took a gig saying thousands of words for them. (Interestingly, the NPR story says he didn't recognize his own voice, and had to be told by a Dictionary.com producer that it was him. Which possibly suggests he was replaced by a speech synthesizer, which would be a lot more efficient on their servers... but I can't blame the site for wanting to milk the story and create more buzz. It's been very good for them.)

 

In any event, NPR hasn't posted audio yet. Nor has anybody else. 

 

What I'm looking for is Jones being interviewed now, and inevitably asked to say "Laurel" for the mic. Then I can isolate it, normalize to my other samples, and compare his spectra with the meme version. It would be very interesting, particularly since I can't even imagine how an American English speaker would create the combination "Laur / Yann" syllable. Too many different simultaneous resonances. 

 

If you can find a clip, please post it here or email me with it. 

 

If the Jones interview is via POTS line, we should still be okay: a lot of the ambiguous energy is under 3.5k. I'll treat the meme to match.
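
For anyone curious what "treat the meme to match" could look like, here is a minimal sketch that band-limits a clip to roughly telephone bandwidth. The assumptions are loud ones: the 300 Hz and 3400 Hz corners are the commonly quoted nominal phone band rather than anything measured from a real line, the filters are gentle second-order Butterworth biquads (a real phone channel rolls off differently), and the function names are made up for illustration.

```python
import math

def biquad_coeffs(kind, cutoff_hz, sample_rate, q=1.0 / math.sqrt(2.0)):
    """Second-order low-pass or high-pass coefficients, normalized so a0 = 1."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "lowpass":
        b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    elif kind == "highpass":
        b = [(1.0 + cos_w0) / 2.0, -(1.0 + cos_w0), (1.0 + cos_w0) / 2.0]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return [c / a0 for c in b], a

def biquad(samples, b, a):
    """Run one biquad section over the samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def telephone_band(samples, sample_rate, lo_hz=300.0, hi_hz=3400.0):
    """High-pass then low-pass: a crude stand-in for a phone channel."""
    hp_b, hp_a = biquad_coeffs("highpass", lo_hz, sample_rate)
    lp_b, lp_a = biquad_coeffs("lowpass", hi_hz, sample_rate)
    return biquad(biquad(samples, hp_b, hp_a), lp_b, lp_a)

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))
```

Applying `telephone_band` to the studio-quality clip before comparing it with a phone recording keeps the comparison fair, since both then cover about the same frequency range.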

 

If the Jones interview is via a modern cell, it might be dicey... most cells now use vocoding, and I don't know if the number of formant channels is limited. We'll see. I do know that modern cells go nuts if they try to carry two people saying different things simultaneously.

 

Anyway, if I can get the sample, I'll post the results. Maybe we can shut down another shoot like osa did. 😊

 

 

Edited by Jay Rose
update


Okay. I got today's recording of Jones saying "Laurel" and "Yanni". In fact, he said "Laurel" twice... once as an announcer, and then again in conversation.

 

The results are what... ahem... I said earlier in this thread: It's not a faithful representation of Jones' (or anyone's) voice. It's a non-human distortion added either by Dictionary.com's compression, or by their speech synthesizer (which, if they're using one, they do seem to have trained with Jones' recordings).

The proof is in the spectrogram. First, the ambiguous "Laurel/Yanni" from Dictionary.com, as widely reported. Then Jones' studio interview on NPR, where he says "Laurel" as a narrator... then also says both words in conversation during the interview. 

 

laurelYanniDecoded.jpg

 

The ambiguous computer version has strong activity above 1.5k, which doesn't appear in either of the Jones "Laurels". There's a bit of activity up there in his "Yanni", which is normal... but nowhere near what the computer is doing.

The computer also has very little around 200 Hz, even though it's present in both Jones' "Laurels" and not in his "Yanni". 

Bottom line: the computer version has some sonic characteristics of both words, and is missing others. The brain is left with no alternative but to guess what the actual word is.

And anyone who claims that the Dictionary.com version is supposed to be Laurel is wrong... as demonstrated by the actor they claim recorded it.

 

Attached is the file I used to generate that spectrogram. Feel free to try your own experiments.

DictComJonesNPR.flac

On 5/18/2018 at 4:33 PM, John Blankenship said:

More proof that reality's not what it's cracked up to be.

 

 

What if it's not an auditory effect at all, but actually a demo of an internet meme, and the sliding scale is highly manipulated behind the scenes, so the word does actually change depending on how many times you've used it and what direction you're coming from on the scale? In which case it's succeeded and really got us all going.....:)


Well heck, if we're doing conspiracies:

 

What if Dictionary.com deliberately munged Jones' original recording through a vocoder, and told the high school student to try it, as a way to build buzz and brand recognition!

 

Or what if the CIA secretly replaced every computer audio app with dual-channel capability, and both are being broadcast. Then a randomizer in the client computer determines what mix will be sent to the speakers!

 

Or what if... it's... ALIENS!!!*

 

---

* That would explain the extra head resonances that don't exist in a human.

21 minutes ago, Jay Rose said:

Well heck, if we're doing conspiracies:

 

What if Dictionary.com deliberately munged Jones' original recording through a vocoder, and told the high school student to try it, as a way to build buzz and brand recognition!

 

Or what if the CIA secretly replaced every computer audio app with dual-channel capability, and both are being broadcast. Then a randomizer in the client computer determines what mix will be sent to the speakers!

 

Or what if... it's... ALIENS!!!*

 

---

* That would explain the extra head resonances that don't exist in a human.

 

Yeh.....:)

11 hours ago, Jay Rose said:

Well heck, if we're doing conspiracies:

 

 

Someone (but not me) suggested that it’s a covert gov’t mind control experiment, wherein the file has an embedded hidden track and they are using social media to test the efficacy of this potential auditory subliminal message delivery system...

 

Someone (DEFINITELY NOT ME) sure has some control/trust issues. 

 

Cheers,

Evan

