Microsoft’s need to make its voice louder in conversational artificial intelligence (AI) has driven its acquisition of Semantic Machines, announced this week. Semantic Machines focuses on understanding conversations, not just phrases, to support full conversations in voice or text. It has speech recognition technology and natural language generation (NLG) technology to communicate with the user in the right context. The technology is language independent and uses deep learning and reinforcement learning, and the company had been building a large-scale training corpus for spoken and written dialogue.
In addition to no longer having its own mobile OS, Microsoft hasn’t pursued the consumer smart speaker model as Amazon, Google and, most recently, Apple have (although it does have partners, such as Xiaomi, making such devices using its software). All this leaves Microsoft in need of other ways to find an edge in the conversational AI market.
However, Microsoft has certainly been making plenty of research breakthroughs in the area (see photo top right). In March, a team at Microsoft Research Asia in Beijing reached the human parity milestone using the Stanford Question Answering Dataset, and in April the company claimed to have enabled full duplex conversation with XiaoIce, its AI-powered chatbot that is popular in China (see photo on lower right for a demo of XiaoIce at a Microsoft event in London I attended this week). Google got lots of publicity when it recently showed something similar at its I/O conference, in the limited domain of booking a hairdressing appointment.
The fact that both of those Microsoft research breakthroughs came in China might have had something to do, at least in part, with the decision to buy Semantic Machines, which is based in Berkeley, CA, where Microsoft will now establish a conversational AI center of excellence.
Another reason was the talent, including Larry Gillick, former chief speech scientist for Apple, where he worked on Siri; UC Berkeley professor Dan Klein; and Stanford University professor Percy Liang, who helped develop the core language AI technology behind Google Assistant. Semantic Machines CEO and co-founder Dan Roth also started VoiceSignal Technologies, which was acquired by Nuance Communications for $293m in May 2007. Gillick worked at Nuance, VoiceSignal and Dragon Systems for almost 25 years, so he and the others are steeped in how AI and machine learning can be used both to understand and to communicate with humans.