Monday, May 4, 2015

Voice Recog of My Dreams

Okay, my buddy Steve’s dream:
I had a weird, very vivid dream last night.  I was parked overlooking New Haven harbor watching as several landmark buildings were demolished.  I decided to call you on your cell to describe it to you.  (Ed. note: I was born in New Haven and Steve lives near there)

As the phone rang, I realized "Duh, this phone is just for texting.  Donna can't talk on the

But then you answered, saying "Hey Steve, what's going on?"  And I didn't know what to do, so I said out loud but to myself "What should I do?"  And you said "About what?"

I was all excited and said "You can hear me!!!"  You replied "No, but my phone automatically and instantly transcribes what you say."

And I thought that would be the coolest app possible!
Steve’s dream app actually exists BUT it’s got bugs. Sadly, voice recognition is still about as accurate as the autocorrect feature that turns “vanilla” into “vaginal,” “Lauren” into “Laundry,” “indie” into “indecent,” etc., etc.

At one company where I worked, the trainer tried using a voice recog app while showing me the ropes of my new gig. NICE idea but, between the buggy transcription and the fact that my lipreading was faster (faulty, yes, but surprisingly more accurate), the app got binned. Also, we were using this one on one. It pretty much melted down in a full classroom or group meeting situation.

Inventor Jeffrey Bigham from Carnegie Mellon has a solution.

From a 2013 Technology Review column:
Though voice recognition programs like Apple’s Siri and Nuance’s Dragon are quite good at hearing familiar voices and clearly dictated words, the technology still can’t reliably caption events that present new speakers, accents, phrases, and background noises. People are pretty good at understanding words in such situations, but most of us aren’t fast enough to transcribe the text in real time (that’s why professional stenographers can charge more than $100 an hour). So Bigham’s program Scribe augments fast computers with accurate humans in hopes of churning out captions and transcripts quickly.
Fab idea/scheme BUT it relies on ultra cheap-o, can’t-afford-to-feed-my-kids-or-pay-rent labor.
Those workers were paid a minimum of $6 an hour by Bigham’s team. The team also hired undergraduate work-study students for $10 an hour. The crowdsourced work of people in both groups appears to be only slightly less accurate than that of a professional stenographer, Bigham says.
The workers are paid six to ten bucks an hour? How McDonalds-ish.
I’m uncomfortable here. On the other hand, I can’t afford to have a stenographer or ‘terp at my side all the damn time either. I don’t know what the status of his invention is, but I think I’ll take a pass.

Transcense, a company in Berkeley, has developed a cool smartphone app.

From a 2014 Medical column:
The app uses real-time captioning on a phone to make group conversations between deaf and hearing possible.
Transcense connects to multiple smartphones and leverages their microphones to listen and interpret the conversation right onto the participant's screen. A speech recognition algorithm is used to detect each individual’s voice and link them with a color. This makes it easy for the user to see who is talking just by looking at their smartphone screen.
Sounds mega awesome, right? Once again though, it’s using that faulty speech recog software.
There’s a great little column over at Slate (from 2014) that wittily mentions some of the probs and gets, mebbe, at the root, the possible why-this-isn’t-solved-yet bit.
Star Trek posited a Universal Translator, crowdsourced by developers from all around the galaxy, who designed the complex artificial intelligence and machine learning algorithms necessary to immediately listen, infer, and respond to thousands of indigenous languages. The Universal Translator didn’t merely translate human to alien language in real time: It also acted as an interface between the humans and the computers they used. 
Unfortunately, we have no single federation of developers and linguists contributing to a gigantic matrix of standard human-machine language. The people working on this can’t even decide on an acronym.
We live in an age of miracles and wonder—can a truly useful, brill, relatively bugless solution be so far off?
