English Abstract:
With the rapid development of generative artificial intelligence, social media platforms have been flooded with deepfake audio synthesized using techniques such as speech synthesis and voice conversion. Because these techniques can produce highly natural and realistic voices, deepfake audio poses significant threats. To address this issue, numerous deepfake audio detection challenges have been organized worldwide to foster progress in the audio anti-spoofing field. Unlike existing surveys, which are limited to binary classification of the authenticity of whole audio clips, this paper goes beyond binary classification and provides a comprehensive summary of audio deepfake detection. Specifically, it divides the field into three sub-domains: global deepfake audio detection, local deepfake audio localization, and deepfake audio source tracing. For each sub-domain, it systematically reviews the existing datasets, domain-specific problems, and solution approaches. Finally, the paper outlines the potential challenges facing deepfake audio detection and offers prospects for future research, aiming to provide a reliable reference for future researchers.