Video: A Robot Learns to Lip Sync (YouTube)

Can you be absolutely certain the person talking to you isn't a robot? Soon, you might not be so sure.

For the first time, scientists have built a robot that can move its mouth exactly like a human. This means it avoids the so-called “uncanny valley” effect, where a bot’s actions appear unsettling because they are uncomfortably close to natural — but don’t quite meet that threshold.

The Columbia University researchers achieved the feat by allowing their robot, EMO, to study itself in a mirror. It learned how its flexible face and silicone lips would move in response to the precise actions of its 26 facial motors, each capable of moving in up to 10 degrees of freedom.

They outlined their methods in a study published Jan. 14 in the journal Science Robotics.

How EMO learned to move its face like a human

EMO uses an artificial intelligence (AI) system called a vision-language-action (VLA) model, meaning it can learn to translate what it sees into coordinated physical movements without predefined rules. During training, the humanoid robot made thousands of seemingly random expressions and lip movements while it stared at its own reflection in the mirror.
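
In rough terms, the mirror stage is a form of "motor babbling": the robot issues random commands, watches what its face does, and fits a model that predicts the resulting face shape from those commands. The sketch below illustrates how such a forward model might be trained; the network size, the landmark representation and the data collection are assumptions for illustration, not the study's actual code.

```python
# Hypothetical sketch of the mirror self-modeling stage: issue random motor
# commands, record the facial landmarks seen in the mirror, and fit a
# forward model (motors -> landmarks). Dimensions and data are placeholders.
import torch
import torch.nn as nn

NUM_MOTORS = 26         # actuator count reported for EMO
NUM_LANDMARKS = 68 * 2  # assumed 2-D facial landmarks from the camera

class ForwardFaceModel(nn.Module):
    """Predicts facial landmark positions from a vector of motor commands."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_MOTORS, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_LANDMARKS),
        )

    def forward(self, motor_cmds):
        return self.net(motor_cmds)

def collect_babbling_data(num_samples=5000):
    """Stand-in for the robot making random expressions at the mirror.
    On hardware these would come from the motors and a landmark detector
    run on the camera feed; here they are fabricated placeholders."""
    motors = torch.rand(num_samples, NUM_MOTORS)         # random commands in [0, 1]
    landmarks = torch.randn(num_samples, NUM_LANDMARKS)  # placeholder observations
    return motors, landmarks

model = ForwardFaceModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

motors, landmarks = collect_babbling_data()
for epoch in range(10):
    pred = model(motors)
    loss = loss_fn(pred, landmarks)   # how far predicted face shapes are from observed ones
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```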

Next, the scientists sat EMO in front of hours of YouTube videos showing humans talking in different languages and singing. This allowed it to connect its knowledge of how its motors produced facial movements to the corresponding sounds, all without any understanding of what was being said. Eventually, EMO was able to take spoken audio in 10 different languages and synchronize its lips near-perfectly.
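
Once a robot knows how its motors shape its face, lip syncing reduces to mapping audio features to motor commands over time. Below is a minimal sketch of that step, assuming mel-spectrogram inputs and a small recurrent network; both choices are illustrative and not the architecture reported in the paper.

```python
# Hypothetical sketch of the inference step after video training: per-frame
# audio features go in, per-frame motor commands come out. The feature choice
# (mel spectrogram) and network shape are assumptions for illustration only.
import torch
import torch.nn as nn

NUM_MOTORS = 26
N_MELS = 80  # assumed mel-spectrogram resolution

class AudioToMotors(nn.Module):
    """Maps a sequence of audio feature frames to motor commands."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, 128, batch_first=True)
        self.head = nn.Linear(128, NUM_MOTORS)

    def forward(self, mel_frames):                 # (batch, time, N_MELS)
        hidden, _ = self.rnn(mel_frames)
        return torch.sigmoid(self.head(hidden))    # motor targets in [0, 1]

# Usage: placeholder mel features for 2 seconds of speech at ~100 frames/s
mel = torch.randn(1, 200, N_MELS)
policy = AudioToMotors()
motor_trajectory = policy(mel)   # (1, 200, 26) commands streamed to the face
```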

“We had particular difficulties with hard sounds like ‘B’ and with sounds involving lip puckering, such as ‘W’,” Hod Lipson, an engineering professor and the director of Columbia’s Creative Machines Lab, said in a statement. “But these abilities will likely improve with time and practice.”

Many a roboticist has tried and failed to create a convincing humanoid, so before unveiling EMO to the world, the team put it to the test in front of real people. The scientists showed 1,300 volunteers videos of the robot speaking using the VLA model and two other approaches for controlling its mouth, alongside a reference video demonstrating ideal lip motion.

The two other approaches were an amplitude baseline, in which EMO moved its lips based on the loudness of the audio, and a nearest-neighbor landmarks baseline, in which it mimicked facial movements it had seen others make that produced similar sounds. The volunteers were instructed to choose the clip that best matched the ideal lip motion, and they chose VLA for 62.46% of cases — compared to 23.15% and 14.38% for the amplitude and nearest-neighbor landmarks baselines, respectively.
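
For context, the amplitude baseline is simple enough to sketch in a few lines: mouth opening tracks the loudness envelope of the audio and nothing else, which helps explain why volunteers so rarely preferred it. The frame rate, sample rate and normalization below are assumed values, not parameters from the study.

```python
# Minimal sketch of an amplitude baseline: mouth opening is driven purely by
# the loudness of the audio, with no knowledge of which sounds are being made.
import numpy as np

def amplitude_baseline(audio, sample_rate=16000, fps=30):
    """Return a per-video-frame mouth-opening value in [0, 1] from raw audio."""
    samples_per_frame = sample_rate // fps
    n_frames = len(audio) // samples_per_frame
    frames = audio[: n_frames * samples_per_frame].reshape(n_frames, samples_per_frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))   # loudness per video frame
    return rms / (rms.max() + 1e-8)             # loudest frame fully opens the mouth

# Example with one second of synthetic audio
audio = np.random.randn(16000).astype(np.float32)
mouth_opening = amplitude_baseline(audio)  # 30 values, one per video frame
```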

Robot carers will require friendly faces

While there are differences across genders and cultures in how people distribute their gaze, humans in general rely heavily on facial cues when interacting with each other. A 2021 eye-tracking study found that we look at the face of our conversation partners 87% of the time, with roughly 10 to 15% of that time focused specifically on the mouth. Other research has shown that mouth movements are so important that they even affect what we hear.

The researchers believe that overlooking the face’s significance is part of the reason other projects have failed to create convincing robots.

“Much of humanoid robotics today is focused on leg and hand motion, for activities like walking and grasping,” Lipson said. “But facial affection is equally important for any robotic application involving human interaction.”

As AI technology continues to advance at a breakneck pace, robots are expected to take on an increasing number of roles that require direct interaction with humans, including in education, medicine and elderly care. That means their effectiveness will depend in part on how well they can match human facial expressions.

“Robots with this ability will clearly have a much better ability to connect with humans because such a significant portion of our communication involves facial body language, and that entire channel is still untapped,” said lead author of the study, Yuhang Hu, in the press release.

But his team is not the only one working on making humanoid robots more lifelike. In October 2025, a Chinese company released a video of an eerily realistic robot head, created as part of their effort to make interactions between people and robots feel more natural. The year before that, a Japanese team unveiled an artificial self-healing skin that could make robot faces look human.
