
Laurel v Yanni explained!


Jay Rose


The New York Times has an excellent technical explanation of this week's acoustic 'blue dress / gold dress' meme, with an interactive tool that lets you simulate different listening/auditory conditions.

 

 

The original issue is, I believe, due to a badly implemented speech synthesizer. But there's a lesson in there for all of us, about the importance of accurate recording / equalization / monitoring. 


Jay, your post couldn't be more perfectly timed. Our green screen shoot came to a screeching halt when this subject came up, followed by "of course the sound guy would find this". Clients and crew were equally entertained seeing where they fall on the scale. A pleasant distraction!

 

-Ken


Dalton, it sure is.

 

Unlike Shepard Tones and MaxxBass and stuff like that, though, this seems to be an issue of language perception... and how unexpected combinations of formants can drive the mechanism crazy. The spectrograms are particularly revealing.

 

Thought experiment: Today's audio tech relies on masking for things ranging from mp3 to Nielsen radio ratings to Dolby Digital. But is it a psychoacoustic phenomenon or a chemical/time characteristic of the mechanism? Does it happen in the mind, or in the impulses that get to the mind?


15 hours ago, Jay Rose said:

Thought experiment: Today's audio tech relies on masking for things ranging from mp3 to Nielsen radio ratings to Dolby Digital. But is it a psychoacoustic phenomenon or a chemical/time characteristic of the mechanism? Does it happen in the mind, or in the impulses that get to the mind?

 

Jay, I read an interesting article that touches on this and the biological voodoo mechanics behind who hears what and why..... https://apple.news/ASs4UNdVSRjCTSDwsYHoPww

 

Cheers,

Evan Meszaros


Hobbiesodd, thanks for the link. It's an interesting article.

 

I take issue with the reporter's claim that the file actually says "Laurel". According to the spectrogram, what it says is what one particular (IMHO malformed) piece of software synthesized for that text... not the sound a human would (or even probably could, without re-engineering the vocal tract) create when speaking. 

 

But I'm glad Wired and the NYTimes (and late night comedians) are calling attention to actual speech science. Phonetics is, as far as I'm concerned, both a theoretical basis and a fast, efficient strategy for dialog editing.


50 minutes ago, Jay Rose said:

Hobbiesodd, thanks for the link. It's an interesting article.

 

I take issue with the reporter's claim that the file actually says "Laurel". According to the spectrogram, what it says is what one particular (IMHO malformed) piece of software synthesized for that text... not the sound a human would (or even probably could, without re-engineering the vocal tract) create when speaking. 

 

But I'm glad Wired and the NYTimes (and late night comedians) are calling attention to actual speech science. Phonetics is, as far as I'm concerned, both a theoretical basis and a fast, efficient strategy for dialog editing.

 

I think what they are referencing is that the original source file was a computer pronouncing the word “Laurel”. So, I guess it is, technically, “Laurel.”

 

Cheers,

Evan


9 hours ago, hobbiesodd said:

I think what they are referencing is that the original source file was a computer pronouncing the word “Laurel”.

 

That's precisely my point. The computer wasn't pronouncing "Laurel". It may have been asked to do that, but it was pronouncing a jumble containing parts of both words. 

 

 LauralLauralYanni.jpg

This spectrogram shows, left to right, the computer's pronunciation (courtesy of the NY Times, which I'm pretty sure they got from Twitter), then a human saying Laurel and Yanni. The human is me, at my desk just now, wearing an E6 with no processing. The clips were roughly normalized, using the fundamental (~100 Hz) as the reference.
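For anyone who wants to try the same kind of level matching, here is a minimal sketch of one way to normalize clips to the fundamental. It is not necessarily how the clips above were prepared. It assumes each clip is a plain list of float samples and that the fundamental sits near 100 Hz; the function names `goertzel_magnitude` and `normalize_to_fundamental` are made up for this example. Real speech has a fundamental that drifts, so measuring a small band around it would be more robust than the single frequency used here.

```python
import math

def goertzel_magnitude(samples, sample_rate, freq_hz):
    """Estimate the amplitude of one frequency component (Goertzel algorithm)."""
    n = len(samples)
    k = round(freq_hz * n / sample_rate)      # nearest DFT bin to freq_hz
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    # Scale so a pure sine of amplitude A at that bin reports roughly A.
    return math.sqrt(max(power, 0.0)) * 2.0 / n

def normalize_to_fundamental(samples, sample_rate, f0_hz=100.0, target=1.0):
    """Scale a clip so its component near f0_hz has amplitude `target`.

    f0_hz=100.0 is an assumption taken from the '~100 Hz' figure in the post.
    """
    level = goertzel_magnitude(samples, sample_rate, f0_hz)
    if level == 0.0:
        return list(samples)                  # nothing to reference; leave as-is
    gain = target / level
    return [x * gain for x in samples]
```

Running every clip through `normalize_to_fundamental` before plotting puts their fundamentals at the same level, so differences higher up in the spectrogram are easier to compare by eye.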

 

Note how the computer's Laurel has stuff around 2k - 3k that isn't in the human version but is in the human Yanni. Note also some Yanni-like harmonics (formants) in the computer's Laurel ~ 500 Hz. 

 

I'd guess the student who discovered this actually found a bug in the speech synthesizer, but a fascinating one in terms of speech perception. Of course I can't guarantee it's a bug... it might have been a feature, or possibly a mistake in the dictionary software that was calling the synthesizer.


 


UPDATE: I just downloaded the NPR story, and Jones is on a studio mic (either at NPR or via ISDN; it doesn't matter). So I'll process the clip and post.

I'm leaving my original request up for completeness.

 

 

Hi, all. This is a request for help...

 

NPR and some other media are reporting that the laurel/yanni voice is Jay Aubrey Jones, who was an out-of-work actor when Dictionary.com first set up, and took a gig saying thousands of words for them. (Interestingly, the NPR story says he didn't recognize his own voice, and had to be told by a Dictionary.com producer that it was him. Which possibly suggests he was replaced by a speech synthesizer, which would be a lot more efficient on their servers... but I can't blame the site for wanting to milk the story and create more buzz. It's been very good for them.)

 

In any event, NPR hasn't posted audio yet. Nor has anybody else. 

 

What I'm looking for is Jones being interviewed now, and inevitably asked to say "Laurel" for the mic. Then I can isolate it, normalize to my other samples, and compare his spectra with the meme version. It would be very interesting, particularly since I can't even imagine how an American English speaker would create the combination "Laur / Yann" syllable. Too many different simultaneous resonances. 

 

If you can find a clip, please post it here or email me with it. 

 

If the Jones interview is via a POTS line, we should still be okay: a lot of the ambiguous energy is under 3.5k. I'll treat the meme to match.
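"Treating the meme to match" a phone line could be sketched as a simple low-pass filter. The version below is only an illustration: it uses a single second-order low-pass with the widely published "Audio EQ Cookbook" coefficient formulas, with the 3.5 kHz cutoff taken from the figure above. The names `lowpass_biquad` and `rms` are hypothetical, and a real telephone channel also rolls off the low end and has a much steeper top-end slope than this.

```python
import math

def lowpass_biquad(samples, sample_rate, cutoff_hz=3500.0, q=0.7071):
    """Second-order low-pass filter (cookbook biquad, Direct Form I)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0.
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Applying the same filter to both the meme clip and the interview clip keeps the comparison fair: anything that differs afterwards is not just a bandwidth difference.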

 

If the Jones interview is via a modern cell, it might be dicey... most cells now use vocoding, and I don't know if the number of formant channels is limited. We'll see. I do know that modern cells go nuts if they try to carry two people saying different things simultaneously.

 

Anyway, if I can get the sample, I'll post the results. Maybe we can shut down another shoot like osa did. 😊

 

 

Edited by Jay Rose
update

Okay. I got today's recording of Jones saying "Laurel" and "Yanni". In fact, he said "Laurel" twice... once as an announcer, and then again in conversation.

 

The results are what... ahem... I said earlier in this thread: It's not a faithful representation of Jones' (or anyone's) voice. It's a non-human distortion added either by Dictionary.com's compression, or by their speech synthesizer (which, if they're using one, does seem to have been trained with Jones' recordings).

The proof is in the spectrogram. First, the ambiguous "Laurel/Yanni" from Dictionary.com, as widely reported. Then Jones' studio interview on NPR, where he says "Laurel" as a narrator... then also says both words in conversation during the interview. 

 

laurelYanniDecoded.jpg

 

The ambiguous computer version has strong activity above 1.5k, which doesn't appear in either of the Jones "Laurels". There's a bit of activity up there in his "Yanni", which is normal... but nowhere near what the computer is doing.

The computer also has very little around 200 Hz, even though it's present in both Jones' "Laurels" and not in his "Yanni". 
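The two observations above (energy above 1.5k, energy around 200 Hz) can also be checked numerically rather than by eye. This is a rough sketch, not the method used to make the spectrogram: it computes what fraction of a clip's spectral energy falls in a given band, using a plain DFT so it needs nothing beyond the standard library. The function name `band_energy_fraction` is invented for this example, and the plain DFT is slow, so it only suits short clips such as a single word.

```python
import cmath
import math

def band_energy_fraction(samples, sample_rate, lo_hz, hi_hz):
    """Fraction of spectral energy between lo_hz and hi_hz (DC excluded)."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):                # skip DC, stay below Nyquist
        freq = k * sample_rate / n
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        energy = abs(acc) ** 2
        total += energy
        if lo_hz <= freq <= hi_hz:
            in_band += energy
    return in_band / total if total else 0.0
```

For example, calling `band_energy_fraction(clip, rate, 1500, rate / 2)` on each clip gives a single number for "activity above 1.5k", and a narrow band such as 150 to 250 Hz does the same for the region around 200 Hz. Those band edges are illustrative choices, not values from the spectrogram.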

Bottom line: the computer version has some sonic characteristics of both words, and is missing others. The brain is left with no alternative but to guess what the actual word is.

And anyone who claims that the Dictionary.com version is supposed to be Laurel is wrong... as demonstrated by the actor they claim recorded it.

 

Attached is the file I used to generate that spectrogram. Feel free to try your own experiments.

DictComJonesNPR.flac


On 5/18/2018 at 4:33 PM, John Blankenship said:

More proof that reality's not what it's cracked up to be.

 

 

What if it's not an auditory effect at all, but actually a demo of an internet meme, and the sliding scale is highly manipulated behind the scenes and the word does actually change depending on how many times you've used it, and what direction you're coming from on the scale, in which case it's succeeded and really got us all going.....:)


Well heck, if we're doing conspiracies:

 

What if Dictionary.com deliberately munged Jones' original recording through a vocoder, and told the high school student to try it, as a way to build buzz and brand recognition!

 

Or what if the CIA secretly replaced every computer audio app with dual-channel capability, and both are being broadcast. Then a randomizer in the client computer determines what mix will be sent to the speakers!

 

Or what if... it's... ALIENS!!!*

 

---

* That would explain the extra head resonances that don't exist in a human.


21 minutes ago, Jay Rose said:

Well heck, if we're doing conspiracies:

 

What if Dictionary.com deliberately munged Jones' original recording through a vocoder, and told the high school student to try it, as a way to build buzz and brand recognition!

 

Or what if the CIA secretly replaced every computer audio app with dual-channel capability, and both are being broadcast. Then a randomizer in the client computer determines what mix will be sent to the speakers!

 

Or what if... it's... ALIENS!!!*

 

---

* That would explain the extra head resonances that don't exist in a human.

 

Yeh.....:)


11 hours ago, Jay Rose said:

Well heck, if we're doing conspiracies:

 

 

Someone (but not me) suggested that it’s a covert gov’t mind control experiment— wherein the file has an embedded hidden track and they are using social media to test the efficacy of this potential auditory subliminal message delivery system...

 

Someone (DEFINITELY NOT ME) sure has some control/trust issues. 

 

Cheers,

Evan


17 hours ago, hobbiesodd said:

...using social media to test the efficacy of this potential auditory subliminal message delivery system...

 

What a horribly inefficient way to do that, particularly since different people heard different versions of the 'carrier'. 

 

If someone wanted to bury subliminal messages, they could use the same masking algorithm that Nielsen uses to bury computer codes under songs on the radio, so their sample ratings listeners can wear devices that ignore the music and track the codes.

The code -- if you strip away the music -- sounds like a fax call. But nobody hears it that way. If it's turned up high enough that it actually competes with the music, people hear it as distortion on the song.

 


16 hours ago, Jay Rose said:

 

What a horribly inefficient way to do that, particularly since different people heard different versions of the 'carrier'. 

 

 

 

Indeed! However, no one made any claims about the efficacy of government, Jay. This would be the same group of people that thought MK-Ultra was a great idea.

 

Cheers,

Evan
