6 Guidelines for the Path to Voice Everywhere

March 5, 2018 by Kal Baumwart

I recently complained that we need friendly AI modeled on the computer in Star Trek. This friendly AI underlies the concept of “voice everywhere,” which is driving product and engineering teams at technology companies across the world.

Street Fight is well versed in the topic of voice everywhere, and we frequently see articles and posts about how voice is changing marketing, search, branding, and advertising. We seem to agree that, whatever we want from or think about applying voice to everything, voice interaction is the best possible future of search and plays an intrinsic role in our interactions with local content.

Amazon is leading the way, working aggressively toward a future where the AI in the background is friendly and can pay for herself by intelligently marketing products.

Alexa has skills. So do Siri, Cortana, and Google Assistant. Separately, they’re pretty cool. Together, they’d be game changing.

To bring the different agents together, we need a third-party driving force for adoption. The Internet of Things (IoT) is the best example of a nascent ecosystem that stands to gain from increased voice control. Since any individual “T” in the IoT family will have little processing power, it’s up to the network to provide seamless, rules-based handling of semantic control.

With that in mind, I want voice (control of everything), everywhere, but I have strict requirements for how it should be designed, engineered, and implemented. Here are six requirements for a reasonable deployment of voice everywhere.

1. Learn by conversation. In most cases where the assistant I’m trying to use is unaware of what I want to do, the assistant points me to some external computer device to add a skill, change a setting, or modify my query. Don’t make me do that. Instead, train the assistant to understand what I want to do with a question-and-answer exercise driven by the assistant. If I decide the Q&A is too much work, then I will open an external app.

2. Be aware of your surroundings. I know you’re listening, Assistant. I know you can ping all the devices on the network to which you’re attached. So, then, if you see another assistant, whether of the same make or a different one, save that information and ask me if you can connect and share details. If I say “OK,” there should be no difference in experience between one tiny robot and another. Don’t tell me in the bedroom that you don’t know who the Arctic Monkeys are if you just played them in the den.

3. Don’t be too aware. I know we’re getting a little close to discussing AI, but indulge me. I don’t mind that you are always listening for your name. I don’t even care that much when you light up because you think, incorrectly, that you heard your name mentioned. But I want a high level of comfort that you aren’t spying on me when I’m not expecting it. Maybe I should have control over the conversation buffer? Or at the very least I should be able to replay the buffer even if it’s already been sent to the NSA.

4. The experience from kitchen to den to car should be the same experience. This is on us, the product developers and engineers, and not on the digital assistants themselves. So, let’s get together on some key terms, rules, and interface specifications so we can deliver a consumer-friendly experience. This means no walled gardens (I’m looking at you, Apple) and some consideration of safety and security (Linux, you paying attention?). Like the requirement in #3 above: maintain a certain distance from me and my family. I’m hoping for the Enterprise Computer here, not an overzealous HAL 9000 with invisible goals.

5. We built the Internet with tons of different hardware. We connect with wires, fiber, radio frequencies, and probably one day by Ansible. And we allow all kinds of software to coexist in this global melting pot, all bound together by guidelines for interconnection, presentation specifications, and standards. The work we do now to engage each other and document the parameters of acceptable use will drive tomorrow’s interoperability. Ultimately, just as with the Internet, such standards will make it easier to add new solutions to the voice everywhere control protocol.

6. In places like the grocery store, let’s combine #2 and #3 into a new demand. Sell me stuff. I know you have to advertise to remain inexpensive. But hey, keep it light. What I want to avoid here is the in-your-face kind of marketing that dystopian-future movies like Minority Report and Blade Runner seem to be training us to expect. I’ll try New! Improved! Oreos with a simple whisper message. Don’t jump out in front of my cart with some holographic, 3D, anthropomorphic Oreo monster created from my private habits and in-store location. Just don’t do that. You’ll end up creating demand for 911 services and adult diapers more than for the tasty snacks you want to move.
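
For the engineers in the room, the user-controlled conversation buffer from #3 is not hard to picture. Here is a minimal sketch in Python, assuming a small fixed-size buffer that the user can replay or wipe at will. The class name `ConversationBuffer`, its methods, and the capacity are all hypothetical; this illustrates the idea and is not how any shipping assistant works.

```python
from collections import deque


class ConversationBuffer:
    """Hypothetical bounded record of what an assistant heard, under user control."""

    def __init__(self, capacity=5):
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self._entries = deque(maxlen=capacity)

    def record(self, utterance):
        # Store one utterance the assistant captured.
        self._entries.append(utterance)

    def replay(self):
        # Return a copy so callers cannot alter the buffer itself.
        return list(self._entries)

    def clear(self):
        # Let the user wipe everything currently held.
        self._entries.clear()


buffer = ConversationBuffer(capacity=2)
buffer.record("play Arctic Monkeys")
buffer.record("turn off the den lights")
buffer.record("what's the weather")
print(buffer.replay())  # only the two most recent utterances remain
buffer.clear()
print(buffer.replay())  # empty after the user clears it
```

The point of the design is that the buffer is small, visible, and erasable by the person being listened to. Whether a copy has already left the device is a separate question that a sketch like this cannot answer.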

The six requirements above just scratch the surface of what’s needed to create a robust voice ecosystem. We’re watching university research, product launches, and ethics conferences to ensure that our product additions to the ecosystem are future-aware. I recommend that companies developing voice-controlled assistants or relying on the semantic structures underlying voice controls of consumer experiences establish a strategic vision for voice. Let’s discuss this. If there is a standards body already defining it, let’s join up and make the debate lively!