
Voice Bots Have One Big Problem: Human Behavior

March 23, 2017 (updated March 27, 2017) by Rick Robinson


I love you. What are the best shampoos to fight hair loss? When does the next season of Girls begin? Find local spas with a male masseur. What is the average weight of a 5’7” woman? Please make a dinner reservation for our anniversary. Find the nearest divorce lawyer. My boss is an imbecile. I hate you.

Imagine saying these things aloud in an office setting, or in your home with others around. (Or even while alone in the room… but can they hear you from the kitchen? Are you sure?)

Put simply, you wouldn’t.

Much of what we have few qualms about typing into a search bar simply doesn’t pass the personal privacy test when turned from bits to sound waves.

We cover our mouths when talking on a cellphone, no matter how banal the conversation. Perhaps the one upside of the old flip phones was that they masked a bit of what we said and shielded us from nosy amateur lip-readers.

So naturally, technology comes to the rescue to save us from, well, technology. Look at Hushme, for instance. While the "world's first voice mask for mobile phones" unfortunately makes you look like a Borg getting a teeth-whitening treatment, it apparently does offer the privacy we seek without our having to whisper, talk through tightly pursed lips or cup our cellphones. But come on: if we weren't willing to wear a tiny camera on glasses from Google, do this and other voice maskers really stand a chance?

Now weigh this against all the pundits talking up the new voice interfaces (sometimes called "zero interfaces"). These are not text-driven chatbots per se (which I still believe have a bright future, especially in support and commerce) but their verbal cousins. Early among them are Alexa from Amazon, Home from Google and Siri from Apple.

Surely you've played with at least one of these. In my experience, most people start strong but, days later, end up just asking for the weather forecast or to hear a song ("I'm sorry, that song was not found."). Sigh.

And these conversational voice interfaces will surely get smarter, faster and more natural, perhaps lulling us into believing we've found a new friend (see: Her), or at least a personal assistant that makes plans easier to book and asks for nothing in return, save for more requests. Couple this with advances in wireless, in-ear headphones, and suddenly one can imagine a future of greater convenience and efficiency.

I actually look forward to all this, but I maintain that the companies mentioned above, along with others such as Microsoft that are banking at least part of their future on the technology's success, have a problem: human behavior.

Unlike texting with a chatbot, where we type requests and comments into a box and see a response instantly (something we've been conditioned to do for years thanks to AIM, SMS and the like), voice interfaces ask us to simply say things into the air. Certainly there's a huge convenience factor over tap-tap-tapping out letters. But the "I love you" dilemma remains: you're not likely to say it aloud just so a machine can pass the message along.

A similar phenomenon played out when instant messaging clients (AIM, for one) added video capabilities. Many thought the age of The Jetsons had finally arrived and that people would give up typing for talking. Nope. The social buffer that text interfaces provide allows even the shyest among us to write things we'd never dream of saying. That same sense of security, which voice interfaces can't offer, will push hard against the coming wave of talky assistants.

This is not to say IoT won’t benefit tremendously from voice interfaces: “Raise the temperature to 72” … “Order more milk” … “Is the security alarm on?” … “Hi lightbulb, how much are you costing me a day?”

All these verbal commands will flourish, making our offices and households easier and more fun to control. But commanding a machine to perform a practical task, amplifying its utility, is a world away from saying anything remotely personal into the air. Don't believe me? Look up from your device right now and say aloud, "What does a herpes rash feel like?" You get the point.

Again, I think we're clearly moving toward voice input for bots, but until microphones advance enough to let you make requests with a near-silent whisper (or with your thoughts), people will keep letting their fingers do the talking.

Predicting human behavior is pretty simple. Changing human behavior is a bitch.

Rick Robinson is SVP of Product for on-demand roadside assistance startup Urgent.ly. He is also an advisor to Street Fight. Follow him at @itsrickrobinson.