Episode Summary: This week’s in-person interview is with Dr. Adam Coates, who spent 12 years at Stanford studying artificial intelligence before accepting his current position as Director of Baidu’s Silicon Valley-based artificial intelligence lab. We speak about his ideas on consumer artificial intelligence applications and their impact, what he’s excited about, and what he thinks may be more ‘hype’ than reality. He gives an idea of the applications Baidu is working on, which could potentially influence billions of mobile and computer users worldwide. If you’re interested in the developments of speech recognition and natural language processing, this is an episode you won’t want to miss.
Guest: Adam Coates
Expertise: Deep learning and feature learning; reinforcement learning
Recognition in Brief: Adam Coates received his BS, MS, and PhD in Computer Science from Stanford University before he was hired as director of the Silicon Valley AI lab (SVAIL) at Baidu, where he’s now been for two years. Adam has authored and co-authored numerous publications in the areas of speech recognition, deep learning, feature learning, and others, some of which have been featured in publications like Wired and CNet. While at Stanford, Adam worked on a number of projects, including the Stanford AI Robot (STAIR) and the Stanford Autonomous Helicopter.
Current Affiliations: Baidu USA
Adam Coates on AI Virtual Assistants and Speech Recognition
When you want to discern what’s real from what’s hype, you go to the source. Right in the middle of Silicon Valley, there are plenty of sources in the world of artificial intelligence (AI) research and development, though it’s not always as simple as walking through the front door. This week, I was lucky enough to walk through Baidu’s doors and sit down with Adam Coates, the director of Baidu’s Silicon Valley-based AI Lab. I first wanted to know what he thought was more hype than real news in the AI media.
“Based on a lot of the genuine progress that’s happening in AI right now, substantially because of big progress in deep learning and neural networks, many people are starting to feel that full artificial general intelligence (AGI) may be just around the corner…I think working with these technologies every day, it’s pretty clear that that’s just not where the progress is happening right now,” said Coates.
Adam illustrated his assessment for me with an analogy that helped separate the fear-hyped media tale from the actual, in-the-trenches story. If you look at automobiles or airplanes and the progress made up until today, said Coates, it’s obvious that we have much more efficient and safer cars; however, no one is looking at the latest Tesla and worrying about it turning into a transformer and doing battle on the freeway amidst tomorrow morning’s commute.
We don’t have these thoughts because it’s self-evident; it’s as simple as looking at the technology that’s available and knowing there’s no plausible way for this scenario to happen in any near future.
“I think deep learning is in this same space. It’s getting a lot better, we’re seeing things we didn’t think were possible a few years ago, but if you’re actually working with that technology, it’s very self-evident that we just don’t have the pieces to make full artificial intelligence at this point,” explained Coates.
Though we may not be able to talk philosophy or reason with our devices anytime soon, Adam is optimistic about what’s happening in the AI field today, particularly with applications in speech recognition and natural language. While Adam doesn’t pretend to have the answers for how to solve the issue of understanding artificial consciousness, he does think there’s one area that humans could reasonably solve over the next decade. The primary driver behind this potential solution is powered by the huge amount of labeled data that is now available to us.
“The speech recognition that we’ve been building in our AI lab works incredibly well because we can give it audio along with transcriptions and the neural network can learn from all those transcriptions to recognize speech; this is a kind of machine learning called supervised learning, but it’s very clear that this is not how humans learn to recognize speech,” said Coates.
True, we don’t play 10,000 hours of transcribed audio for our children, and their learning from mistakes is often done without adult supervision, which is not the case with machines: they require direct, correctly-labeled feedback before they can learn.
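The supervision signal Coates describes — audio paired with its transcription — can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical stand-in (fake 2-D “acoustic features”, a two-word vocabulary, a nearest-centroid model), not how Deep Speech or any real recognizer works; it only shows what “learning from labeled pairs” means.

```python
import numpy as np

# Illustrative sketch only: a toy "recognizer" trained with supervised
# learning, i.e. from (features, label) pairs. Real systems such as
# Baidu's Deep Speech use deep neural networks on real audio instead.
rng = np.random.default_rng(0)

def make_utterances(center, label, n=50):
    """Fake 2-D acoustic feature vectors clustered around `center`,
    each paired with its 'transcription' label (the supervision signal)."""
    return [(center + rng.normal(scale=0.1, size=2), label) for _ in range(n)]

train = (make_utterances(np.array([0.0, 0.0]), "yes")
         + make_utterances(np.array([1.0, 1.0]), "no"))

# "Training": average the labeled examples into one centroid per label.
centroids = {
    label: np.mean([x for x, y in train if y == label], axis=0)
    for label in ("yes", "no")
}

def recognize(features):
    """Predict the label whose centroid lies closest to the features."""
    return min(centroids, key=lambda lbl: np.linalg.norm(features - centroids[lbl]))

print(recognize(np.array([0.05, -0.02])))  # features near the "yes" cluster
```

The point of the sketch is the data, not the model: without the paired labels in `train`, the “training” step has nothing to average over, which is exactly the dependence on labeled data that Coates contrasts with human learning.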
“One of the things we know humans are somehow accomplishing is unsupervised learning; we know that we’re taking in audio and visual data and learning to make sense of it in a way that helps us very rapidly adapt to new tasks,” explained Adam.
Adam believes there’s a lot of cutting-edge research happening now in unsupervised learning, but it’s anyone’s guess as to when machines will really be able to take the wheel. “My sense is that we’re making real progress on the problem, but no one has really cracked it; we don’t know when the big watershed event is going to be,” Adam noted.
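To make the contrast with the previous example concrete, here is a minimal sketch of unsupervised learning, assuming nothing about the approaches Coates’s lab actually uses: plain k-means clustering, which finds structure in data with no labels supplied at all. The data and parameters are illustrative.

```python
import numpy as np

# Unsupervised learning sketch: k-means clustering groups UNLABELED data.
# No "transcriptions" are given; the algorithm discovers the two hidden
# groups on its own. All data here is synthetic and illustrative.
rng = np.random.default_rng(1)

# Unlabeled observations drawn from two hidden groups (labels withheld).
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2)),
    rng.normal(loc=[1.0, 1.0], scale=0.1, size=(50, 2)),
])

def kmeans(points, k=2, iters=20):
    """Plain k-means: alternate nearest-center assignment and mean update."""
    # Simple deterministic initialization: k points spread across the array.
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Distance from every point to every center, then nearest-center labels.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = points[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, assign

centers, assign = kmeans(data)
print(np.round(centers, 2))  # two centers near the hidden group means
```

Clustering is of course far simpler than the unsupervised learning humans perform, which is the gap Coates points to: the algorithms exist, but nothing yet learns from raw sensory data the way a child does.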
Speech Recognition, a Giant Leap for Machines
Looking ahead can be both a wise and a potentially dangerous strategy; predicting outcomes is risky, something Adam calls “throwing darts.” But based on Baidu’s search engine capabilities and their recent leaps in speech recognition with their Deep Speech engine, I wanted to know what the average consumer’s life might look like in 5 to 10 years.
Adam stated that in general, technology users are already comfortable with texting and text queries, but there are lots of cases where this doesn’t make sense – like when you’re driving or want to do a long transcription. “If we can have speech recognition that’s as good as a person, we’re going to start being able to interact with our devices in a new way,” said Coates. We heard similar notions in our in-person visit to talk chatbots with the AI Lab director at Nuance Communications (arguably the largest and best known transcription and speech recognition firm at the time of this writing).
This technology opens pathways for more complex and connected domains, such as speech-enabled homes and cars. Coates noted that he is most excited about this next step, when we are interacting and connected with our devices and our appliances using natural speech. He stated that in largely “mobile societies”, which at present include China, Japan, and to a lesser extent the U.S. (largely the millennial generation), those who aren’t used to having a laptop or PC will have a totally different way of accessing the Internet and connecting with the world.
Another area that Baidu is focused on developing is virtual assistants, especially for tasks that consume a lot of visual attention (like driving a car) and therefore require speech-enabled commands. “If we can really make speech very low overhead, in the sense that you don’t even have to think about whether you would use it or not, we could take that away,” Adam explained.
“This is about really changing our relationship with the devices we use and the way we get things done, and making them much faster and taking away all the training that we have to do for ourselves.”
If we think back far enough, figuring out how to get what we wanted out of early Internet search engines took practice and a refined feel for the interface. Now, most of us perform keyword searches almost automatically. Baidu’s goals include cutting away similar barriers to make speech-enabled technologies just as seamless and useful. Perhaps in the not-too-distant future, we’ll swap typing and manually manipulating our devices for a spoken introduction, with machines listening to our desires and needs and responding intelligently.