Saturday, October 8, 2011

“Hear no evil, speak no evil”—or was that “herniaville, spacey weevil?”

While teaching a class on writing family histories recently, I admitted that I know little about voice-recognition software.  Two days later, my husband needed to quickly transcribe many pages of family history—we never typed it up because it had been damaged in the Teton Dam Flood—and we decided it would be easier to read it out loud than to type it.

We acquired “Dragon Dictation,” a voice recognition program for our computer.  Other programs include IBM's ViaVoice, e-Speaking software, and Voice Studio, which according to reviews is somewhat cheaper and not as smooth but has a fun “create your own skit” feature.

I anticipated that besides taking care of this job, I could sit Cousin Ted down next to the computer, get him rambling about his cowboy days, and have a print-ready family history story with little to no effort. It turns out I would have to get Ted and the software well acquainted before getting far.

It takes a while to “teach” a computer to recognize a voice and its nuances. My husband spent hours doing this, and now, like the RCA dog, our “dragon” knows its master’s voice.  I’m working to teach it mine.

Learning to understand what people say is one of the most complex tasks faced by human beings—a task that starts early in life. 

We hold a baby and murmur, and he or she can tell from our vocal tones whether we are happy or sad, relaxed or stressed.  Over time, babies learn the whole range of language: phonology (the way words sound), tense, number, gender, and so on.  Much of what we think is “cute” in the language of toddlers is just their way of figuring all this out. 

For instance, when a two-year-old says, “My do it!” we laugh and say, “You mean, ‘Let me do it!’ ” Soon the little one has all the words in the right order—whether he or she speaks English, Chinese, Greek or Russian.

In my college linguistics class, our professor had no mercy. He’d cover the board with foreign-language verbs to conjugate or nouns to change in gender or number, and when we complained, he’d say, “What’s wrong with you? Any two-year-old in Peru (or Finland, or Sumatra) can figure this out!”

That’s the job faced by a voice recognition program, with the added problem that it can’t tell voice from background noise.  We spend a lifetime chatting over the sounds of lawnmowers, television, other people, car engines and the like.  This unfortunate software has to learn what your voice sounds like AND what sounds aren’t your voice.

But once it learns, what a blessing! Most of us speak approximately 140 words a minute, but can only type about 40 words per minute. My husband churned out pages of material in record time, inserting words like “period” and “new paragraph” so the document was punctuated.

And just as it’s fun teaching a toddler to talk, it’s fun teaching a program to recognize your voice.  I talked about a date I had with my husband.  Dragon Dictation rendered “He took me to ‘The Nutcracker’ ” as “He kicked me to the neck cracker.”

Then I tried to dictate a recipe for doughnuts.  Poor software—to never have sunk its teeth into one of my Dad’s soft, tasty, cinnamon-y (it thinks cinnamon is “sentiment”) doughnuts. It rendered the word doughnut as: “go check, go not, donor, Don Knotts” and my favorite, “go nuts.”

When the program finally got it right, my husband said, “Thank you, you’re a genius.”  The program rendered that as, “Thank you, Eurydice.”

It thinks its master is so smart! 

1 comment:

  1. So that's how you pass the time with no kids at home! It actually sounds like fun-- Dad has always been interested in that kind of software. I remember when he first got the computer to read what we wrote, and I think we spent hours and hours making the computer say ridiculous things in various voices. Now the tables are turned!