What question did this study set out to answer?

The research aims to develop a system that synchronizes AI-generated voice and facial simulations with presentation slides.

March 13, 2026

Video Synchronization Method for Presentations in AI-Generated Speaker Voice and Facial Reconstruction

Key Points

Proposes a presentation system based on pre-trained AI models
Uses special characters in the input text to apply effects to slide text at the moment a specific word is spoken
Synchronizes the synthesized speaker video with the presentation slides in real time
Achieved an average synchronization delay of 282.17 ms in Experiment 1
Achieved an average synchronization delay of 316.39 ms in Experiment 2
Demonstrated effective integration of audio-visual components in presentations

Abstract

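Artificial intelligence (AI) technology is being applied not only to imitating a speaker's voice and face but also to automated presentation generation. Previous studies have focused on automatically generating slides from text and images, but little work has addressed synchronizing a presenter's imitated voice and face with presentation slides in real time. This paper proposes a presentation system that generates a synthesized video carrying the voice and face of a pre-trained speaker and synchronizes it with presentation slides. The system uses special characters in the input text to apply effects to slide text at the moment specific words are spoken, and it achieved synchronization with an average delay of 282.17 ms in Experiment 1 and 316.39 ms in Experiment 2.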
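The marker-based triggering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The marker character `#`, the names `parse_script`, `schedule`, and `mean_delay_ms`, and the assumption that the speech synthesizer reports a per-word onset time are all hypothetical choices made for illustration; the summary does not specify any of them.

```python
from dataclasses import dataclass

MARKER = "#"  # hypothetical marker; the source does not say which character is used


@dataclass
class Cue:
    word_index: int  # position of the marked word in the spoken text
    word: str        # the word whose utterance should trigger a slide effect


def parse_script(script: str) -> tuple[str, list[Cue]]:
    """Split a marked-up script into clean speech text and slide-effect cues."""
    spoken, cues = [], []
    for token in script.split():
        if token.startswith(MARKER) and len(token) > len(MARKER):
            token = token[len(MARKER):]  # strip the marker so it is never spoken
            cues.append(Cue(word_index=len(spoken), word=token))
        spoken.append(token)
    return " ".join(spoken), cues


def schedule(cues: list[Cue], word_onsets_ms: list[float]) -> list[tuple[float, str]]:
    """Pair each cue with the onset time of its word in the synthesized audio.

    word_onsets_ms is assumed to hold one onset per spoken word, in order.
    """
    return [(word_onsets_ms[c.word_index], c.word) for c in cues]


def mean_delay_ms(expected_ms: list[float], observed_ms: list[float]) -> float:
    """Average gap between when an effect should fire and when it was observed."""
    gaps = [obs - exp for exp, obs in zip(expected_ms, observed_ms)]
    return sum(gaps) / len(gaps)
```

Under these assumptions, `parse_script` yields the text to send to speech synthesis, `schedule` gives the times at which slide effects should fire, and `mean_delay_ms` corresponds to the kind of average delay the experiments report.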
