Conversational AI should be in every developer's toolbox. Here's why.
Arthur Grishkevich
Posted on September 15, 2021
Hi folks. I've posted a few conversational AI tutorial articles here. I realize I never covered one important thing - why you, as a developer, should even think about adding conversational technologies to your stack of tools.
Fair warning: I work for a startup building advanced conversational AI APIs and a studio made specifically for developers. Because of this, the use cases and approaches below are based on what's possible with Dasha. I will also cover Google Dialogflow and Rasa, which offer great sets of tools for some use cases.
Why think about conversational voice technologies in the first place
Voice is the native interface for human-to-human communication. And that's it. That's your reason to think about conversational AI technologies. The tactile way in which we communicate with machines today is a stepping stone.
Look to science fiction to show you the way (I have been doing this since I learned to read and it hasn't steered me wrong). HAL 9000 in 2001: A Space Odyssey, Eddie the shipboard computer in The Hitchhiker's Guide to the Galaxy, Jarvis in Iron Man. These are all machines that communicate with humans at a level indistinguishable from a human being. Their communication goes far beyond the simple command-response exchanges of today's Alexa.
These machines parse deep intent from the words of the human conversing with them, from their intonation and emotional tone, and from the wider context of the conversation. Which is exactly what humans do. Which makes the human feel comfortable conversing with the machine. Which, in other words, passes the Turing test.
In order for voice user interfaces to become ubiquitous, we need conversational AI that communicates at a level indistinguishable from a human, without falling into an uncanny valley trap.
Where is conversational voice AI today
What is done with conversational AI today:
- Chatbots. Mainly simple, often multiple-choice apps. Great for taking an order or routing a customer to the right support agent. I felt I had to mention them, though the focus of this article is on voice.
- Basic voice user interfaces - Alexa Skills, the Siri assistant and such. They are great for fulfilling a single function, for example, telling you the weather or playing a song. As these are command-response interactions, they do not aim to pass the Turing test, and they are limited in scope of application.
- Voice AI business automation. Call center automation, from simple voice (not touch-tone) menus to more complex outbound applications. Some of these applications (mainly outbound ones today) do aim to pass the Turing test, and they often do pass it.
We're not yet at the point where conversational AI can fully replace tactile interfaces, but we are getting there. Here is a demo of a conversational app I recently built:
This is a fairly simple demo. It only took me a few hours to build from scratch. Using the same technology, some Dasha users are already building conversations with hundreds of logical nodes. In live call center environments, some apps are performing better than the human agents they replaced.
If you want to try building an app like the one in the third bullet above, just look at my post history for some tutorials or pop into our conversational AI dev community. You'll get an API key and instructions automatically upon joining. @ me in the intro channel and I'll help where I can.
What will adding conversational AI to your toolbox give you, as a developer?
It will give you two things: the ability to build voice interfaces for your apps, and the ability to run automated telephone conversations with an API call.
Voice user interfaces
With a deep voice interface, you can let your users drive every interaction in your app with their voice.
A few use cases for this:
- If your app is designed to be used on the go (walking, running, driving), for example, it’s a navigation app, a music app, etc.
- If there is a use case in which your app runs in the background, while the user switches to another app.
I think in the future, VUIs will be standard fare in most new software products.
In the meantime, you can implement a voice user interface with Dasha using a connector to our Node.js SDK. Next year, we will roll out Swift and Kotlin SDKs.
Automated telephone conversations
Much like Twilio gave developers the ability to send SMS text messages with an API call, a proper conversational AI API gives you the ability to conduct automated phone calls with an API call.
Here are just a few use cases you might want to use this for (a code sketch follows the list):
- Call a user who abandoned their shopping cart and ask if they need any help completing the purchase or have some questions they need answered.
- Call a user to verify some details or call them up as a part of an onboarding process.
- Set up a fully automated customer service line for your product. You can take calls through telephony or in-app using voice over gRPC.
- Call back a newly generated lead in less than a minute to convert them into a user.
- Build a voice Discord bot that can do pretty much anything you can think of.
- Literally build a replacement for a call center agent and impress your boss at the big bank.
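To make that concrete, here is a minimal sketch of kicking off one automated outbound call with the Dasha Node.js SDK, following the pattern from my earlier tutorials. The folder path and phone number are placeholders, and you should double-check the exact names against the current SDK docs:

```javascript
// call.js - a minimal sketch of starting one automated outbound call.
// Assumes a Dasha app (main.dsl plus the app definition) lives in ./app
// and your API key is already configured. Names follow the pattern from
// my earlier tutorials; verify them against the current SDK docs.
const dasha = require("@dasha.ai/sdk");

async function main() {
  // Deploy the conversational app to the Dasha Cloud.
  const app = await dasha.deploy("./app");
  await app.start({ concurrency: 1 });

  // Create a conversation against a phone endpoint (placeholder number).
  const conv = app.createConversation({ phone: "+12223334455" });

  // Print the live transcript while the call runs.
  conv.on("transcription", (entry) => console.log(entry));

  const result = await conv.execute();
  console.log(result.output);

  await app.stop();
  app.dispose();
}

main().catch(console.error);
```

Once something like this runs, every bullet above reduces to calling this function with different input data from your backend.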
What skills do you need to build with conversational AI?
Analytical
Analytical skills are important for two reasons. One - you need to logically structure your conversation before you begin developing it. Two - you will need to analyze the live conversations your AI app has with real human users and implement changes to improve the app. Rasa calls this second part Conversation-Driven Development. I love this term. At Dasha, we call it training the application.
Coding
Google Dialogflow
Interestingly enough, with Google Dialogflow, the analytical skill is the only one you will need to create conversations - it is an in-browser GUI for creating automated dialogues. You can provide intent training data and create scripts; STT and TTS come out of the box. Two software engineers who tested all three platforms felt that the learning curve was steeper with Dialogflow than with Rasa or Dasha, even though it is a no-code environment. Pros: no-code platform. Biggest cons: speech synthesis sounds very robotic, and it is hard to make dialogue paths handle digressions. You may also need a bit of technical knowledge to set up integrations through webhooks.
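The webhook part is plain HTTP. As a rough illustration, here is what an ES-style fulfillment endpoint can look like: Dialogflow POSTs the matched intent and parameters, and you answer with a fulfillment message. The endpoint path, intent name and orderId parameter below are all my own inventions; check the Dialogflow docs for the edition you use.

```javascript
// A rough sketch of a Dialogflow ES fulfillment webhook in Express.
// Dialogflow POSTs the matched intent and parameters; we branch on the
// intent's display name and reply with fulfillmentText. The endpoint
// path, intent name and orderId parameter are illustrative only.
const express = require("express");
const app = express();
app.use(express.json());

app.post("/webhook", (req, res) => {
  const intent = req.body.queryResult.intent.displayName;
  const params = req.body.queryResult.parameters;

  let reply = "Sorry, I didn't get that.";
  if (intent === "order.status") {
    // orderId is a hypothetical parameter defined in the agent.
    reply = `Let me look up order ${params.orderId} for you.`;
  }

  res.json({ fulfillmentText: reply });
});

app.listen(3000);
```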
Rasa AI
To use Rasa Open Source, you will need Python installed, and knowledge of Python will be needed as you build. You specify intents as part of your training data. Note that Rasa is primarily a chatbot technology. If you want to use it with voice, you will need to connect external text-to-speech and speech-to-text services. Connecting to external services adds latency, which detracts from the conversational user experience.
Conversation flow is dictated by dialogue policies, multiple of which can be used simultaneously (I found this a bit confusing). Rasa has a great low-code/no-code GUI for conversation design. You describe dialogue using stories and intents using NLU data. You can also specify bot responses, forms (for collecting data) and rules. The open-source Rasa server runs on your machine and connects to your app.
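For a feel of what that looks like, here is a minimal sketch in Rasa's own YAML training-data format (Rasa 2.x): an intent with examples, and a story that strings intents and bot responses together. The intent and action names are invented for illustration.

```yaml
# A minimal Rasa 2.x training-data sketch. In a real project the nlu and
# stories sections usually live in separate files (nlu.yml, stories.yml);
# the intent and action names here are invented for illustration.
nlu:
- intent: check_order_status
  examples: |
    - where is my order?
    - has my package shipped yet?

stories:
- story: greet, then check an order
  steps:
  - intent: greet
  - action: utter_greet
  - intent: check_order_status
  - action: utter_order_status
```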
Its main pro is that you can quickly build simple, straightforward text conversations. Its main drawback is that it is a bot builder: it does not aspire to let you create conversations that are human-like in form and content.
Dasha AI
To use Dasha, you will need to have Node.js installed. You will also want to know JavaScript. DashaScript is a domain-specific language which you use to script the dialogue between the machine and the user.
You specify intent training data, much as you do with Rasa, but with Dasha you can also define named entities, which can be used for slot filling. You define AI responses in the phrase map or right within the body of your dialogue script. You can define digressions (which send the conversation to a specific node at any point, based on what the user has said) to emulate how humans do dialogue. Human-like speech synthesis and speech recognition come out of the box. The Dasha SDK runs within your Node.js app, and the conversation gets executed in the Dasha Cloud.
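To show what a digression looks like, here is a short DashaScript fragment modeled on my tutorial apps: wherever the conversation happens to be, if the user asks "how are you?", the app answers and returns to where it left off. The intent name and phrasing are placeholders; check the DashaScript docs for the exact syntax.

```
// A sketch of a DashaScript digression, modeled on my tutorial apps.
// If the user triggers the "how_are_you" intent at any node, the app
// answers and then returns to wherever the dialogue left off.
digression how_are_you
{
    conditions { on #messageHasIntent("how_are_you"); }
    do
    {
        #sayText("Doing great, thanks for asking!");
        return;
    }
}
```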
Dasha's main pro is that you can build simple human-like conversations quickly, or take longer to build complex conversations of nearly unlimited depth. Its main drawback is that we are still in beta, polishing up some details with user feedback. All the more reason to join our community and be one of the folks giving input to this AI-as-a-service engine. Don't get it twisted though: we have paying customers using the product even in its beta, and Dasha AI is processing close to 10 million conversations monthly.
In closing
To summarize, you'll be able to add conversational AI to your stack of developer tools even if you have only a few months of programming experience. I'm a case in point - I started building with Dasha first, then began teaching myself JavaScript with freeCodeCamp. I have also used Rasa Open Source and had no problem with it, even though my Python skills are almost non-existent.
You should keep some of these tools in your stack because they let you engage with users in ways that differentiate the user experience.
Have you used conversational AI dev tools before?