Episode Summary: What makes a chatbot or a conversational interface actually work? What kind of work does one need to do to get a chatbot to do what one wants it to do? These are pivotal questions and questions that for most business leaders are still somewhat mysterious, but that’s exactly what we’re aiming to answer on this episode of the AI in Industry Podcast.
This week we speak with Madhu Mathihalli, CTO and co-founder of Passage AI. We speak specifically about what kinds of tasks conversational interfaces are best at, what kinds of word tracks and what kinds of questions and answers they are suited for, and which are a bit beyond their grasp right now. In addition, we speak about what it takes to train these machines. In other words, how do we define the particular word tracks that we want to be able to automate, and determine which of them might be lower-hanging fruit for applying a chatbot and which might not?
Subscribe to our AI in Industry Podcast with your favorite podcast service:
Expertise: customer acquisition, SEO, product development, product management, strategic planning
Brief Recognition: Mathihalli previously served as Senior Director for Walmart eCommerce and was Director of Engineering at Kosmix. Earlier in his career, he was the project lead for testing and integrating the HP-UX operating system at Hewlett Packard Co.
(02:30) When you guys think about where you see the deepest grooves of real traction in business for chatbots and conversational interfaces, how would you describe those to a business audience?
Madhu Mathihalli: Customer service is one of those areas where all of us have experienced long wait times, especially when we call up 1-800 numbers, and more often than not the vast majority of these questions would be classified as mundane questions: “Hey, my internet is not working,” or “Where is my order?” Now, these are questions that don’t necessarily need a human in the loop, and these are things that the system can try to automate and answer. These are areas where we’re certainly seeing quite a bit of traction. A second, similar area is frequently asked questions or knowledge bases, where again a lot of people are coming in and asking, “What is your return policy?” or “My coupon expired yesterday, can I still use it today?”
These are the questions again. The information’s already available in some sort of a frequently asked question or a knowledge base and the customers are just not willing to go look it up.
Believe it or not, something like 40% of all the questions coming to a customer service agent are these sorts of questions, and the agents are taking their time to answer them while the remaining 60% are actually waiting on the line with the really important questions.
Now, what if we can automate all of this 40% of the questions? We should be able to identify these questions easily, because they are straightforward questions. “Where is my order?” should not actually require a human in the loop. I should be able to get your order ID or an email, go look up FedEx, answer the question and tell you exactly when it will arrive, and it’s not too farfetched to think about it.
So, if we can just eliminate that 40% of all the call volume, it’s a huge win not just for the business but also for the customer, because that 40% of the customers are no longer holding up the customer service agent’s time and they are given the answer to their question easily. The agent is also happy, because now they actually have that extra time to serve the remaining customers, who have real, deep questions.
(06:00) Why is it that these two high-volume areas end up being, let’s say, lower-hanging fruit for a chatbot than another more niche or complicated question?
MM: In some of these questions, the customers are always thinking their order is slightly different. The way … Maybe you came in and asked, “Hey, I ordered black pants, but I got the blue ones instead. Can you help me with that?” Now, these are things … We should be able to extract some of the elements, or the entities, and go build specific logic: simple if-then conditions, very trivial ones, or decision trees. These are areas where commonly defined workflows cover the majority of the cases.
So, the fact that the user actually did not receive all of their order, it’s very straightforward to identify that from the user message, put the user into the particular decision tree or workflow and then take it from there. Now, the huge advantage that the bots already have is the historical purchases, or the history of the customer interactions. The customer already knows where … The bot already knows what the customer has done, and it can start to look at things personalized to that particular customer and actually build out its decision tree.
And then again, the vast majority of these workflows are not specific to a particular customer so we don’t need to personalize a lot of the stuff but instead just follow the particular workflow and enable the right answer to the particular customer.
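As a rough illustration of the "simple if-then conditions" mentioned above, here is a minimal Python sketch of a wrong-item workflow. It is not Passage AI's implementation. The `OrderIssue` fields, the step names and the 30-day `RETURN_WINDOW_DAYS` value are all hypothetical, chosen only to show how extracted entities can drive a small decision tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderIssue:
    """Entities assumed to be extracted from the message and order history."""
    ordered_item: str
    received_item: Optional[str]  # None means nothing has arrived yet
    days_since_delivery: int


RETURN_WINDOW_DAYS = 30  # hypothetical policy value, not from the interview


def next_step(issue: OrderIssue) -> str:
    """Walk a tiny decision tree and return the name of the next workflow step."""
    if issue.received_item is None:
        return "track_shipment"
    if issue.received_item == issue.ordered_item:
        return "no_action_needed"
    if issue.days_since_delivery <= RETURN_WINDOW_DAYS:
        return "send_replacement"
    # Outside the assumed policy window: let a person decide.
    return "handoff_to_agent"
```

For the example in the conversation, `next_step(OrderIssue("black pants", "blue pants", 3))` would follow the replacement branch.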
For example, let’s take food service: ordering a burger. The way you order a burger may be different from the way I order the same burger. You may say, “Can you order a burger with cheese?” and I can say, “Can you order a burger without cheese?” In the workflow itself, there is only a limited set of attributes, a limited set of toppings, that the burger can have. It’s not too many. Now, the customization is not that complicated here.
There are only so many burgers and so many toppings that can go on a burger. Yes, granted, the way you order it, the way you render the same sentence, the same message, can take multiple different forms, but at the end of the day the toppings that go on the burger, the quantity of burgers, whether you want a Pepsi or a Coke along with that, it doesn’t change.
And we don’t have to be very, very customized to that particular customer’s order, like personalized. We can still understand the natural language, understand the dependencies. Whether you say with cheese, without cheese, cheese on the side. These are all just different ways of rendering the same thing. We just need to identify using natural language, how do I understand what the user is asking, comprehend the request, understand the intent and map it to a workflow. And we should be able to handle a lot of these sorts of scenarios.
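The idea of mapping different phrasings onto one workflow can be sketched in a few lines of Python. This is a toy, rule-based stand-in for the natural language understanding described above, not the approach Passage AI uses. The `TOPPINGS` list, the `order_burger` intent name and the regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical menu: the limited set of toppings this toy bot knows about.
TOPPINGS = ("cheese", "bacon", "onions", "pickles")


def parse_order(message: str) -> dict:
    """Map a free-form message to an intent plus normalized topping choices."""
    text = message.lower()
    order = {"intent": None, "toppings": {}}
    if re.search(r"\b(order|get|want|have|like)\b.*\bburger\b", text):
        order["intent"] = "order_burger"
    for topping in TOPPINGS:
        # Check exclusions first so "without cheese" is never read as "with cheese".
        if re.search(rf"\b(without|no|hold the)\s+{topping}\b", text):
            order["toppings"][topping] = "exclude"
        elif re.search(rf"\b{topping}\s+on the side\b", text):
            order["toppings"][topping] = "side"
        elif re.search(rf"\b(with|add|extra)\s+{topping}\b", text):
            order["toppings"][topping] = "include"
    return order
```

The three phrasings from the conversation ("with cheese", "without cheese", "cheese on the side") land in the same `toppings` structure with different values, which is what lets a single workflow serve all of them. A production system would replace the regular expressions with a trained model.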
(10:00) What are some of the newer capabilities that you think will become increasingly possible in the coming few years within this domain of conversational interfaces, and that aren’t really in use all that much today?
MM: One of the things that we’re actually super excited about and working towards is this concept of dialogs, right? A dialog, simply put, is a multi-state conversation, right? So, in the case of a question and answer it’s very straightforward. I can ask you, “Hey, where is my order?” and you can tell me where my order is. That sort of exchange we can call dialog state version 1.
Now, the minute you start doing multi-state conversations, for example, I can ask you, “Do you have a preferred restaurant?” You may not necessarily have a preferred restaurant but you may have a preferred cuisine in a particular location. You can come up and say, “Hey, I’m looking for something Asian in San Francisco and I’m looking for a table for 2 between 7 and 8 pm.”
Now, what if we do not have that restaurant available? What if we don’t have the particular requirement satisfied? You and the bot should be able to negotiate and land on a particular restaurant of your choice that’s available.
Now, as you start working towards enabling something like this, it starts getting very complicated, because it’s a combination of what your preferences are, what is available and how the bot is able to handle all of these different things. There are confirmations that you can give, there are negations that you can give, you can ask the bot to hold off, you can … There are several different things that can happen, and things start getting very complicated very, very fast unless you actually have a solid dialog system in place to handle some of this stuff. While configuration is great, the reality is there are millions of ways a dialog can go. So how do you build out the system in the easiest way, while leveraging a lot of the data and using some sort of machine learning technique to identify these different dialogs and the nuances between them?
(13:30) We have a lot of opportunity for that back and forth that machines can’t handle and then we’d wanna kind of knuckle down to those biggest existing, maybe simple dialog spaces.
MM: Exactly, exactly. One of the ways we can actually start thinking about it is that for every message that comes in from the user, you can map it to a certain intent, you can map it to a certain label, a dialog act is what we call it, and you can extract entities out of it.
Now, when you have these three, and the historical state of course, how much data do you have from a machine learning perspective for each one of these states? So, you can just take a look at some of the customer service interactions, simple stuff like booking air travel: “I’m looking for a flight to get from San Francisco to Seattle starting Monday, can you give me the best options?” Now, there are a few things that you have missed in this. Do you want a morning flight or an evening flight? Is it a one-way or a two-way trip?
Now, the system has to identify what the missing components are and ask the user to fill them in, and the user can come back and say, “I don’t care whether it’s morning or evening.” The bot has to understand that: did the user confirm a time, or did the user just inform it about a preference? It has to recognize the subtle difference between confirmation, affirmation and information, and handle it accordingly.
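The missing-component step in the flight example can be sketched as simple slot filling. The Python below is a hand-configured toy, which is exactly the kind of approach the speaker goes on to say does not scale. The slot names, prompts and "no preference" phrases are all hypothetical.

```python
# Hypothetical slots a flight search needs before it can run.
REQUIRED_SLOTS = ("origin", "destination", "date", "time_of_day", "trip_type")

ANY = "any"  # marker meaning "the user stated no preference"

PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "date": "What day do you want to leave?",
    "time_of_day": "Do you want a morning flight or an evening flight?",
    "trip_type": "Is it a one-way or a two-way trip?",
}

NO_PREFERENCE_PHRASES = ("don't care", "no preference", "doesn't matter")


def missing_slots(state: dict) -> list:
    """Return the required slots that have not been filled yet, in order."""
    return [slot for slot in REQUIRED_SLOTS if slot not in state]


def next_prompt(state: dict):
    """Return the question for the first missing slot, or None when complete."""
    missing = missing_slots(state)
    return PROMPTS[missing[0]] if missing else None


def apply_reply(state: dict, slot: str, reply: str) -> dict:
    """Fill one slot, treating 'I don't care' as information, not a value."""
    text = reply.lower().replace("\u2019", "'")  # normalize curly apostrophes
    new_state = dict(state)
    if any(phrase in text for phrase in NO_PREFERENCE_PHRASES):
        new_state[slot] = ANY
    else:
        new_state[slot] = reply.strip()
    return new_state
```

Starting from `{"origin": "San Francisco", "destination": "Seattle", "date": "Monday"}`, the sketch asks about the time of day first, records "I don't care" as `ANY` rather than as a confirmed time, and then moves on to the trip type.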
And none of these can actually be configured, because there are just too many ways of doing the same thing. You need to have some data, using some customer service interaction dialogs, the historical stuff, to look at what sort of patterns are there today that we can actually leverage, and then identify which dialogs we can actually build out. And as important as it is to find the dialogs that can be supported, it’s equally important to find out what cannot be supported.
So, the bot cannot handle everything. The technology, the systems, are just not there today to identify every single person’s use case and the nuances, so we really need to know where we have the most confidence and where we do not. Where the bot cannot handle something with enough confidence, it has to have an escape valve and figure out how to transfer the customer to a live agent who can actually help them, because the last thing the bot should do is annoy the customer.
And we’ve all been in places like that, where we have worked with an IVR system that just keeps saying, “Press 1 for this, press 2 for this,” and I keep pressing 0 because I want to talk to a customer agent, since none of these satisfy what I’m looking for. We need to have those escape hatches so that where we have the confidence, we try to promote that to the customer, and where we do not have the confidence, we figure out a way to hand them off to the customer service agents before the customer gets frustrated.
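One common way to build the escape hatch described above is a confidence threshold with a cap on how many unclear turns the bot tolerates. The Python sketch below assumes a hypothetical intent classifier that returns a confidence score. The `0.75` threshold and the two-turn limit are illustrative values, not figures from the interview.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    intent: str
    confidence: float  # 0.0 to 1.0, from a hypothetical intent classifier


CONFIDENCE_THRESHOLD = 0.75   # assumed value; in practice tuned from data
MAX_LOW_CONFIDENCE_TURNS = 2  # assumed value; hand off before frustration builds


def route(prediction: Prediction, low_confidence_turns: int):
    """Decide who handles this turn.

    Returns (handler, updated_low_confidence_turns), where handler is one of
    "bot", "clarify" (ask the user to rephrase) or "live_agent".
    """
    if prediction.confidence >= CONFIDENCE_THRESHOLD:
        return "bot", 0  # confident answer resets the counter
    low_confidence_turns += 1
    if low_confidence_turns >= MAX_LOW_CONFIDENCE_TURNS:
        return "live_agent", low_confidence_turns
    return "clarify", low_confidence_turns
```

Because both constants are plain numbers, they can be adjusted as more interaction data comes in, which is one way to read the speaker's later point that the hand-off behavior can be tuned over time.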
And one thing that we always tell our customers is that none of this happens on day 1. There is no magic on day 1. It is the amount of data that we keep capturing and the learning that happens over a period of time that enables us.
So, for one of our big customers, on day 1, we were handling only around 30% of all the messages, only 30%. 70% of them were still going off to the live agent and we kept evolving the bot, we kept adding the new use cases, we kept adding the new dialogs and today we are handling around 85% of all the messages and the call volume has gone up, right?
So, it’s 85% of a much bigger call volume versus 30% of a much smaller volume. The only difference between these two is looking at the data, getting the learning going, the self-learning aspects of it, and continuing to optimize the bot.
It takes time to build up this stuff, and the beauty of it is that because we always have the escape hatch, it’s tunable. We can always determine where the customer is not happy and keep pushing them off to the live agent, while learning from the live agent interaction and bringing it back to the bot.
(21:00) What are some of those cool things that you see looking forward that you consider being pretty viable that maybe people should understand?
MM: So, one of the things, evolutionary wise: most of the car companies have committed to integrating either Alexa, Siri or Google Voice by 2020, within the next 2 to 3 years. Now, that opens up a huge opportunity for most of us.
One of the things that we think is that the car is one of the most private places you are in. Most of the time, you’re traveling alone, you don’t mind interacting with the bot when you’re in a car, you don’t mind having voice-based transactions, so you can say, “Hey, read me my emails,” or, “Can you set up an appointment with this person at 2 pm today,” or even do some banking transactions. You don’t mind doing it because, for one, it’s a private environment and it’s safe. It’s actually safer than your house, because in the house you may have all sorts of people, like your own family, listening in on some of the stuff, which you may not be comfortable with, but in a car, when you’re driving, you don’t mind doing it.
So, that is one of the things that we’re seeing and the amount of possibilities, tracking a package, you don’t mind doing it in a car, you can basically call up FedEx and say, “Hey, can you delay my delivery from 2 pm today to 5 when I’m back at home?” You don’t mind doing some stuff like this, or ordering a latte on your way to work or ordering a burger.
All of these scenarios are things that we can actually automate and we totally imagine a few years from now evolutionary wise you’ll be doing this stuff. We are already doing that stuff with our phone and apps today, we can start doing this stuff in the car using voice and both of our hands are still on the steering wheel and we’re still not distracted and it is safe for all of us to enable that.
Now, it is important to the businesses that their employees who are driving around these trucks keep both of their hands on the steering wheel. It is super important for them, and they’ve actually taken a lot of steps to ensure that. Their voice is free, the voice is available, and these drivers, the delivery guys, can still keep interacting with the businesses about rerouting, about a delivery or a missed delivery, or about any changes that happen to their routing. They can keep interacting with the business and the bot to enable some of this stuff.
New workflows will actually start getting created, because now all of us, our own voice, is enabled in all of these cars and automobiles. Now, in addition to that, there is a revolutionary piece. What more can you do with voice? We have already seen the rise of personal assistants, the Alexas, the Google Homes; it’s so much easier to enable a voice-based system in your house today.
So we’re also seeing a lot of the refrigerators, kitchen appliances come in which can actually start enabling some workflows that you would not have imagined today. The refrigerator can come in tomorrow and say, “Hey, you’re running out of milk. Should I just reorder it?” These are systems that we can imagine a few years from now and you are actually interacting with these appliances or the bots to start making your life simpler, so all the mundane things that you’re doing in your house today, those are all areas that we can easily automate.
So, there is a lot of evolutionary and revolutionary stuff coming in. We are super excited about the evolutionary parts. I think three to five years from now we will start looking at voice as a key enabler of a lot of these workflows.
Header Image Credit: The Verge