When Eugene Fiume visits his elderly mother, she’ll often point to photographs and say the names of those who appear within the frames.
While dementia may have diminished her ability to communicate, it hasn’t diminished her need for human interaction, according to Simon Fraser University’s dean of the Faculty of Applied Science.
It’s Fiume’s hope that the technology he has helped develop to enhance facial expressions in video game characters can soon enrich personal experiences for people like his mother.
“Think about how a face that such folks could recognize — they can speak to them, even in the voice of other people — could become a companionship tool. Now these sorts of things have to be designed really carefully,” the professor at SFU’s School of Computing Science said, referring to the sensitive nature of such a tool and the need to recruit experts in social sciences and caregiving to navigate such issues.
“But I do see a lot of potential in those kinds of spaces where care workers can’t be there constantly, but perhaps a digital human could be available.”
For now, Fiume, the co-founder of Jali Research Inc., gets to see his company’s technology deployed in one of the biggest video game releases of 2020, Cyberpunk 2077.
The action role-playing game features hundreds of speaking characters, each rendered to look photorealistic as they speak.
Jali (as in Jaw And Lip Integration) specializes in technology that helps turn a vocal performance into a visual performance in animated form.
When a performer delivers dialogue for game developers, Jali’s tools can combine the recorded speech with its text transcript to create facial animations that lip-sync accurately to the voice.
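At a very high level, that kind of pipeline maps the timed sounds of speech onto mouth shapes (often called visemes). The sketch below is purely illustrative of the general idea, not Jali’s actual method: it assumes phoneme timings have already been extracted from the audio and transcript by a separate alignment step, and the phoneme-to-viseme table is a made-up miniature example.

```python
# Illustrative sketch only -- not Jali's actual pipeline. Assumes a prior
# alignment step has produced (phoneme, start_time, end_time) tuples from
# the recorded audio and its transcript.

# A toy phoneme-to-viseme table; visemes are the mouth shapes shown on screen.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "jaw_open", "ae": "jaw_open",
    "uw": "lips_rounded", "ow": "lips_rounded",
}

def phonemes_to_keyframes(timed_phonemes):
    """Turn (phoneme, start, end) tuples into a list of viseme keyframes."""
    keyframes = []
    for phoneme, start, end in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        keyframes.append({"time": start, "viseme": viseme})
    return keyframes

# Example: the word "boom" broken into phonemes b, uw, m with timings.
print(phonemes_to_keyframes([("b", 0.0, 0.08), ("uw", 0.08, 0.3), ("m", 0.3, 0.4)]))
```

A real system would, of course, also handle co-articulation (neighbouring sounds blending into each other) and the language-specific effects Fiume describes below.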
The complexity of Jali’s software becomes apparent when considering how different languages have different effects on faces.
German and Polish, for instance, have stronger consonant sounds compared with Japanese.
“We all have the same facial anatomy and yet the way in which the language gets expressed on the face is very different with respect to each language and with respect to the culture,” Fiume said.
“In less expressive languages like Japanese, the face will move a little bit less than the mouth because of the nature of the enunciation of those phonemes [distinct sounds in a language], and those phonemes turn into mouth movements. So we could train the system on a variety of languages and that's one of the things that we find very interesting.”
So far the Jali tool can be applied to 10 different languages.
Fiume, who describes his own background as specializing in the mathematical side of creating beautiful images, recalls that the technology originally came about as a science experiment conducted by him, a PhD student and two other colleagues in the mid-2010s.
The foursome were trying to figure out how to reduce the hundreds of variables that influence facial movement — otherwise known as the face’s degrees of freedom.
“We were able to reduce a good part of it to components in the jaw motion and the lip motion. It sounds kind of obvious, but it's quite remarkable how much content can be kind of encoded in the jaw and lip,” Fiume said.
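The idea in Fiume’s quote can be illustrated with a toy model: describe each mouth pose with just two numbers — jaw openness and lip rounding — instead of hundreds of facial controls. The pose values and blending below are invented for illustration; they are a drastic simplification, not Jali’s actual parameterization.

```python
# Toy illustration of reducing the face's degrees of freedom to two
# components: (jaw_openness, lip_rounding), each in the range 0..1.
# Pose values are made up for demonstration.
POSES = {
    "neutral": (0.1, 0.2),
    "aa":      (0.9, 0.1),  # wide-open jaw, unrounded lips
    "uw":      (0.3, 0.9),  # nearly closed jaw, rounded lips
    "m":       (0.0, 0.3),  # closed lips
}

def blend(pose_a, pose_b, t):
    """Linearly interpolate jaw/lip parameters between two poses (0 <= t <= 1)."""
    (ja, la), (jb, lb) = POSES[pose_a], POSES[pose_b]
    return (ja + (jb - ja) * t, la + (lb - la) * t)

# Halfway between an open "aa" and a rounded "uw":
jaw, lips = blend("aa", "uw", 0.5)
print(round(jaw, 2), round(lips, 2))  # prints 0.6 0.5
```

Animating along a sequence of such low-dimensional poses is what makes it plausible that so much speech content, as Fiume puts it, can be encoded in the jaw and lip.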
The findings were presented at the SIGGRAPH conference in 2016, catching the eye of the Polish game developers who were trying to enhance the lip syncing in their title The Witcher 3: Wild Hunt.
From there, those same developers deployed the Jali tools for their work on Cyberpunk 2077, which stars a lifelike rendering of actor Keanu Reeves, voice and all.
“There’s this big ‘whoop’ when you first see the results and it looks pretty good,” Fiume said.
“So it’s very exciting to see. We want to see applications in a much wider set of spaces beyond games.”
Beyond digital avatars that could keep the company of residents in care homes, he also envisions applications in language training and assistance for those who struggle with recognizing faces.
One question often facing Fiume is whether such an advancement in technology is meant to replace real actors.
“I see this as just enhancing very much what we already have, extending the range of expression and expressiveness of updatable content. And I really am an advocate. I've seen these kinds of more humanistic, performance-based technologies moving into things that provide social impact,” he said.
“I guess the punchline is: fear not.”