English Abstract:
With the rapid advancement of VR and AR technologies, demand for immersive experiences continues to grow, and virtual face technology is maturing in parallel. Against this background, this paper explores the integration of highly realistic virtual faces into VR/AR to enhance the naturalness and immersion of the user experience. However, in the field of virtual digital humans, image generation and face-swapping techniques still face many challenges in VR/AR environments; in particular, lip-synthesis models require further optimisation for dynamic scenes and multi-language settings. To address these problems, this paper proposes VR/AR-AdaptFace, an adaptive multimodal face replacement scheme for virtual reality and augmented reality. The model consists of two main modules. The "text-to-image" module uses advanced text-to-image conversion techniques and a category-specific prior-preservation strategy to optimise virtual face generation, and significantly improves image quality through an attention mechanism. The "speech-to-lip" module relies on a powerful generator, a lip-synchronisation discriminator, and a visual-quality discriminator to achieve accurate synchronisation between speech and lip shape, delivering a more realistic experience for dynamic interaction in VR/AR scenes.
Keywords: face synthesis; detail-enhanced modelling; motion video lip synthesis; virtual reality; augmented reality