English Abstract:
With the rapid advancement of VR and AR technologies, demand for immersive experiences continues to grow, and virtual face technology is maturing in parallel. Against this background, this paper explores the integration of highly realistic virtual faces into VR/AR to enhance the naturalness and immersion of the user experience. However, in the field of virtual digital humans, image generation and face-swapping techniques still face many challenges in VR/AR environments; in particular, lip-synthesis models require further optimisation for dynamic scenes and multi-language settings. To address these problems, this paper proposes VR/AR-AdaptFace, an adaptive multimodal face replacement scheme for virtual reality and augmented reality. The model consists of two main modules. The "text-to-image" module uses advanced text-to-image conversion techniques and a category-specific prior-preservation strategy to optimise virtual face generation, and significantly improves image quality through an attention mechanism. The "speech-to-lip" module relies on a powerful generator, a lip-synchronisation discriminator, and a visual-quality discriminator to achieve accurate synchronisation between speech and lip shape, delivering a more realistic experience for dynamic interaction in VR/AR scenes.
Keywords: face synthesis; detail-enhanced modelling; motion video lip synthesis; virtual reality; augmented reality