Jay Rose Posted May 17, 2018
New York Times has an excellent technical explanation of this week's acoustic 'blue dress / gold dress' meme, with an interactive tool that lets you simulate different listening/auditory conditions. https://nyti.ms/2L4DQJO The original issue is, I believe, due to a badly implemented speech synthesizer. But there's a lesson in there for all of us, about the importance of accurate recording / equalization / monitoring.
osa Posted May 17, 2018
Jay, your post couldn't be more perfectly timed. Our green screen shoot came to a screeching halt when this subject came up, followed by "of course the sound guy would find this". Clients and crew were equally entertained seeing where they fall on the scale. A pleasant distraction! -Ken
AnuarYahya Posted May 17, 2018
amazing, thanks for sharing!
LarryF Posted May 17, 2018
Wild. Thanks, Jay
Mirror Posted May 17, 2018
Thanks, Jay
Dalton Patterson Posted May 17, 2018
Psychoacoustics are real.
Jay Rose Posted May 17, 2018
Dalton, it sure is. Unlike Shepard Tones and MaxxBass and stuff like that, though, this seems to be an issue of language perception... and how unexpected combinations of formants can drive the mechanism crazy. The spectrograms are particularly revealing.
Thought experiment: Today's audio tech relies on masking for things ranging from mp3 to Nielsen radio ratings to Dolby Digital. But is it a psychoacoustic phenomenon or a chemical/time characteristic of the mechanism? Does it happen in the mind, or in the impulses that get to the mind?
Dalton Patterson Posted May 17, 2018
🤔 🤯
hobbiesodd Posted May 18, 2018
15 hours ago, Jay Rose said: Thought experiment: Today's audio tech relies on masking for things ranging from mp3 to Nielsen radio ratings to Dolby Digital. But is it a psychoacoustic phenomenon or a chemical/time characteristic of the mechanism? Does it happen in the mind, or in the impulses that get to the mind?
Jay, I read an interesting article that touches on this and the biological voodoo mechanics behind who hears what and why..... https://apple.news/ASs4UNdVSRjCTSDwsYHoPww Cheers, Evan Meszaros
Jay Rose Posted May 18, 2018
Hobbiesodd, thanks for the link. It's an interesting article. I take issue with the reporter's claim that the file actually says "Laurel". According to the spectrogram, what it says is what one particular (IMHO malformed) piece of software synthesized for that text... not the sound a human would (or even probably could, without re-engineering the vocal tract) create when speaking. But I'm glad Wired and the NYTimes (and late night comedians) are calling attention to actual speech science. Phonetics is, as far as I'm concerned, both a theoretical basis and a fast, efficient strategy for dialog editing.
John Blankenship Posted May 18, 2018
More proof that reality's not what it's cracked up to be.
hobbiesodd Posted May 18, 2018
50 minutes ago, Jay Rose said: Hobbiesodd, thanks for the link. It's an interesting article. I take issue with the reporter's claim that the file actually says "Laurel". According to the spectrogram, what it says is what one particular (IMHO malformed) piece of software synthesized for that text... not the sound a human would (or even probably could, without re-engineering the vocal tract) create when speaking. But I'm glad Wired and the NYTimes (and late night comedians) are calling attention to actual speech science. Phonetics is, as far as I'm concerned, both a theoretical basis and a fast, efficient strategy for dialog editing.
I think what they are referencing is that the original source file was a computer pronouncing the word “Laurel”. So, I guess it is, technically, “Laurel.” Cheers, Evan
Jay Rose Posted May 19, 2018
9 hours ago, hobbiesodd said: I think what they are referencing is that the original source file was a computer pronouncing the word “Laurel”.
That's precisely my point. The computer wasn't pronouncing "Laurel". It may have been asked to do that, but it was pronouncing a jumble containing parts of both words.
This spectrogram is, left to right, the computer's pronunciation (courtesy of NY Times, which I'm pretty sure they got from Twitter), then a human saying Laurel and Yanni. The human is me, at my desk just now, wearing an E6 with no processing. The clips were roughly normalized (referenced to the fundamental, around 100 Hz). Note how the computer's Laurel has stuff around 2k - 3k that isn't in the human version but is in the human Yanni. Note also some Yanni-like harmonics (formants) in the computer's Laurel around 500 Hz.
I'd guess the student who discovered this actually found a bug in the speech synthesizer, but a fascinating one in terms of speech perception. Of course I can't guarantee it's a bug... it might have been a feature, or possibly a mistake in the dictionary software that was calling the synthesizer.
Jay Rose Posted May 19, 2018
For completeness, here's the audio file I used to generate that spectrogram. You can open it in Audacity if you don't have any other flac software. (And you can generate a spectrogram in Audacity as well, though I used BlueCat.) LauralLaural.flac
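For anyone who wants to see what a spectrogram tool is doing under the hood, here is a minimal sketch in plain Python. It is not how Audacity or BlueCat implement theirs; it just slices the signal into overlapping frames, applies a Hann window, and takes the magnitude of a direct DFT on each frame. The function names, the frame length of 256, and the 8 kHz test tone are all assumptions made for illustration, and a synthetic tone stands in for the attached FLAC because the standard library has no FLAC decoder.

```python
import cmath
import math


def hann_window(length):
    """Symmetric Hann window, used to reduce spectral leakage in each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]


def frame_magnitudes(frame):
    """Magnitudes of a direct DFT for bins 0..N/2 (slow but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=256, hop=128):
    """Return one list of bin magnitudes per frame (time runs along the outer list)."""
    window = hann_window(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        columns.append(frame_magnitudes(frame))
    return columns


# Hypothetical test signal: a steady 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
FRAME_LEN = 256
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(1024)]

spec = spectrogram(tone, frame_len=FRAME_LEN, hop=128)
peak_bins = [column.index(max(column)) for column in spec]
# Each bin is SAMPLE_RATE / FRAME_LEN wide, so bin index converts to Hz like this:
peak_hz = [b * SAMPLE_RATE / FRAME_LEN for b in peak_bins]
print(peak_hz)
```

A real tool would use an FFT and a log magnitude scale, but the frame, window, transform sequence is the same idea.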
Mirror Posted May 19, 2018
Jay Rose Posted May 20, 2018
Obviously, Twitter's Laurel doesn't like a certain new-age musician...
Jay Rose Posted May 20, 2018 (edited)
UPDATE: I just downloaded the NPR story, and Jones is on a studio mic (either at NPR or via ISDN; it doesn't matter). So I'll process the clip and post. I'm leaving my original request up for completeness.

Hi, all. This is a request for help... NPR and some other media are reporting that the laurel/yanni voice is Jay Aubrey Jones, who was an out-of-work actor when Dictionary.com first set up, and took a gig saying thousands of words for them. (Interestingly, the NPR story says he didn't recognize his own voice, and had to be told by a Dictionary.com producer that it was him. Which possibly suggests he was replaced by a speech synthesizer, which would be a lot more efficient on their servers... but I can't blame the site for wanting to milk the story and create more buzz. It's been very good for them.)

In any event, NPR hasn't posted audio yet. Nor has anybody else. What I'm looking for is Jones being interviewed now, and inevitably asked to say "Laurel" for the mic. Then I can isolate it, normalize to my other samples, and compare his spectra with the meme version. It would be very interesting, particularly since I can't even imagine how an American English speaker would create the combination "Laur / Yann" syllable. Too many different simultaneous resonances. If you can find a clip, please post it here or email me with it.

If the Jones interview is via POTS line, we should still be okay: a lot of the ambiguous energy is under 3.5k. I'll treat the meme to match. If the Jones interview is via a modern cell, it might be dicey... most cells now use vocoding, and I don't know if the number of formant channels is limited. We'll see. I do know that modern cells go nuts if they try to carry two people saying different things simultaneously.

Anyway, if I can get the sample, I'll post the results. Maybe we can shut down another shoot like osa did. 😊
Edited May 20, 2018 by Jay Rose (update)
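As a rough illustration of what "treat the meme to match" a phone line could look like, here is a sketch of a windowed-sinc FIR low-pass at 3.5 kHz in plain Python. This is an assumed approach, not the processing actually used in the thread: a real telephone channel also rolls off the low end (roughly below 300 Hz), which this sketch ignores. The function names, the 101-tap length, the Hamming window, and the 16 kHz sample rate are all illustrative choices.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc low-pass FIR coefficients (Hamming window), unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        if t == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [tap / gain for tap in taps]


def fir_filter(samples, taps):
    """Direct-form convolution; output has the same length as the input."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical check: a 1 kHz tone should pass, a 6 kHz tone should be suppressed.
SAMPLE_RATE = 16000
taps = lowpass_taps(3500, SAMPLE_RATE)
low_tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(1600)]
high_tone = [math.sin(2 * math.pi * 6000 * i / SAMPLE_RATE) for i in range(1600)]

# Skip the first 200 samples so the filter's start-up transient is excluded.
low_out = rms(fir_filter(low_tone, taps)[200:])
high_out = rms(fir_filter(high_tone, taps)[200:])
print(low_out, high_out)
```

Running both clips through the same filter before comparing them keeps the comparison fair: any difference that survives is in the band both recordings actually share.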
Jay Rose Posted May 20, 2018
Okay. I got today's recording of Jones saying "Laurel" and "Yanni". In fact, he said "Laurel" twice... once as an announcer, and then again in conversation.

The results are what... ahem... I said earlier in this thread: It's not a faithful representation of Jones' (or anyone's) voice. It's a non-human distortion added either by Dictionary.com's compression, or by their speech synthesizer (which, if they're using one, they do seem to have trained on Jones' recordings).

The proof is in the spectrogram. First, the ambiguous "Laurel/Yanni" from Dictionary.com, as widely reported. Then Jones' studio interview on NPR, where he says "Laurel" as a narrator... then also says both words in conversation during the interview.

The ambiguous computer version has strong activity above 1.5k, which doesn't appear in either of the Jones "Laurels". There's a bit of activity up there in his "Yanni", which is normal... but nowhere near what the computer is doing. The computer also has very little around 200 Hz, even though it's present in both Jones' "Laurels" and not in his "Yanni".

Bottom line: the computer version has some sonic characteristics of both words, and is missing others. The brain is left with no alternative but to guess what the actual word is. And anyone who claims that the Dictionary.com version is supposed to be Laurel is wrong... as demonstrated by the actor they claim recorded it.

Attached is the file I used to generate that spectrogram. Feel free to try your own experiments. DictComJonesNPR.flac
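If you want to put numbers on a comparison like "strong activity above 1.5k" and "very little around 200 Hz", one option is to measure what fraction of each clip's energy falls in a given band. The sketch below does that with a direct DFT in plain Python. Everything in it is an assumption for illustration: the function names are made up, and the two signals are synthetic two-tone stand-ins, not the Dictionary.com clip or Jones' recording. Because it reports a fraction of total energy, the result does not depend on how loud each clip is, which sidesteps the normalization step for this particular measurement.

```python
import cmath
import math


def power_spectrum(samples):
    """Power per DFT bin for bins 0..N/2, computed directly (slow but stdlib-only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_fraction(samples, sample_rate, lo_hz, hi_hz):
    """Fraction of total spectral energy with lo_hz <= frequency < hi_hz."""
    spectrum = power_spectrum(samples)
    n = len(samples)
    total = sum(spectrum)
    if total == 0:
        return 0.0
    in_band = sum(p for k, p in enumerate(spectrum) if lo_hz <= k * sample_rate / n < hi_hz)
    return in_band / total


def tone_mix(parts, sample_rate, length):
    """Sum of sine waves; parts is a list of (frequency_hz, amplitude) pairs."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * i / sample_rate) for freq, amp in parts)
        for i in range(length)
    ]


SAMPLE_RATE = 8000
# Hypothetical stand-ins, NOT the real recordings:
# "human-like" is mostly low-frequency energy, "synth-like" is mostly high-frequency energy.
human_like = tone_mix([(200, 1.0), (2000, 0.1)], SAMPLE_RATE, 800)
synth_like = tone_mix([(200, 0.2), (2000, 1.0)], SAMPLE_RATE, 800)

human_high = band_fraction(human_like, SAMPLE_RATE, 1500, 4000)
synth_high = band_fraction(synth_like, SAMPLE_RATE, 1500, 4000)
human_low = band_fraction(human_like, SAMPLE_RATE, 100, 300)
synth_low = band_fraction(synth_like, SAMPLE_RATE, 100, 300)
print(human_high, synth_high, human_low, synth_low)
```

With real clips you would decode the attached FLAC to samples first and pick band edges from what the spectrogram shows.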
pindrop Posted May 21, 2018
On 5/18/2018 at 4:33 PM, John Blankenship said: More proof that reality's not what it's cracked up to be.
What if it's not an auditory effect at all, but actually a demo of an internet meme, and the sliding scale is highly manipulated behind the scenes and the word does actually change depending on how many times you've used it, and what direction you're coming from on the scale, in which case it's succeeded and really got us all going.....:)
Jay Rose Posted May 21, 2018
Well heck, if we're doing conspiracies: What if Dictionary.com deliberately munged Jones' original recording through a vocoder, and told the high school student to try it, as a way to build buzz and brand recognition! Or what if the CIA secretly replaced every computer audio app with dual-channel capability, and both are being broadcast. Then a randomizer in the client computer determines what mix will be sent to the speakers! Or what if... it's... ALIENS!!!*
--- * That would explain the extra head resonances that don't exist in a human.
pindrop Posted May 21, 2018
21 minutes ago, Jay Rose said: Well heck, if we're doing conspiracies: What if Dictionary.com deliberately munged Jones' original recording through a vocoder, and told the high school student to try it, as a way to build buzz and brand recognition! Or what if the CIA secretly replaced every computer audio app with dual-channel capability, and both are being broadcast. Then a randomizer in the client computer determines what mix will be sent to the speakers! Or what if... it's... ALIENS!!!* --- * That would explain the extra head resonances that don't exist in a human.
Yeh.....:)
hobbiesodd Posted May 22, 2018
11 hours ago, Jay Rose said: Well heck, if we're doing conspiracies:
Someone (but not me) suggested that it’s a covert gov’t mind control experiment— wherein the file has an embedded hidden track and they are using social media to test the efficacy of this potential auditory subliminal message delivery system... Someone (DEFINITELY NOT ME) sure has some control/trust issues. Cheers, Evan
Jay Rose Posted May 22, 2018
17 hours ago, hobbiesodd said: ...using social media to test the efficacy of this potential auditory subliminal message delivery system...
What a horribly inefficient way to do that, particularly since different people heard different versions of the 'carrier'. If someone wanted to bury subliminal messages, they could use the same masking algorithm that Nielsen uses to bury computer codes under songs on the radio, so their sample ratings listeners can wear devices that ignore the music and track the codes. The code -- if you strip away the music -- sounds like a fax call. But nobody hears it that way. If it's turned up high enough that it actually competes with the music, people hear it as distortion on the song.
hobbiesodd Posted May 23, 2018
16 hours ago, Jay Rose said: What a horribly inefficient way to do that, particularly since different people heard different versions of the 'carrier'.
Indeed! However, no one made any claims about the efficacy of government, Jay. This would be the same group of people that thought MK-Ultra was a great idea. Cheers, Evan
Jay Rose Posted May 23, 2018
Oh, MKUltra ended in the 1970s. We've gone considerably downhill since then...