It was the kind of dinner party Ibrahim El Chami likes to throw: some steak and mashed potatoes topped off with an evening playing Settlers of Catan.
The one exception was that his apartment was decked out with cameras, microphones and even the Microsoft (Nasdaq:MSFT) Kinect device, all of which were monitoring everything he and his friends got up to for two and a half hours.
“We obviously noticed the equipment, so you kind of have to watch what you say,” said El Chami, who was working as a content co-ordinator for Globalme Localization Inc. when he signed up to host one of these parties for $400.
Those recording devices, in fact, came courtesy of Globalme, El Chami’s former employer.
Over the summer the Vancouver-based company arranged about two dozen similar dinner parties to collect mountains of data to help computers better understand human speech.
The initiative is part of the fifth annual CHiME challenge facilitated by the University of Sheffield in the U.K., where researchers are trying to recognize single distinct voices in large, busy settings such as a dinner party.
Speaking directly into a phone receiver while asking Apple’s (Nasdaq:AAPL) Siri or Amazon’s (Nasdaq:AMZN) Alexa to find a restaurant or dial a contact produces fairly consistent results.
A user speaks, and the computer compares those audio patterns to a large database and then converts the audio into text. From there, the software uses the text to guess what the user wants.
But Globalme is collecting data for more-complex algorithms that can dig through speech that is being muffled or obscured by multiple, simultaneous conversations.
“There are a lot of different ways of saying the same thing, which is why you need to collect that data,” said Emre Akkas, Globalme’s co-founder and chief technologist.
When the company launched in 2009 it was focused on helping companies get their products out in different languages to customers across the globe.
In the case of the dinner parties, a team from Globalme goes through the data and removes any sensitive information that may have been shared among the guests.
Akkas said he began recognizing the growing popularity of speech-recognition products in 2014. Since then, the company’s Vancouver head count has expanded to 25 people as it’s recruited more experts and developed more software focused on speech recognition.
“Voice is really still in its infancy,” said Novel Effect CEO Matt Hammersley. His Seattle-based company specializes in interactive entertainment that syncs special effects with a storyteller’s voice.
He said while some users may occasionally get frustrated if voice-recognition technology makes a mistake, the technology has already reached an inflection point for consumers.
“Accuracy is good enough for consumer adoption but software applications are just the low-hanging fruit,” he said.