Journal of Communication University of China (Science and Technology)

Multimodal Fact-Checking Based on Large Language Models and Vision Language Models

Submission date:

2024/8/20

DOI:

Keywords:

deep learning; large language models (LLM); vision language models (VLM); multimodal; fact-checking

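Funding:

National Key Research and Development Program of China (2021YFC3320103); Open Project of the State Key Laboratory of Media Convergence and Communication, Communication University of China (SKLMCC2022KF002); National Natural Science Foundation of China (62272460); Beijing Natural Science Foundation (4232037)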

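Name	Affiliation
Zhang Pengpeng (张芃芃)	School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering
Peng Bo (彭勃)	Pattern Recognition Laboratory, Institute of Automation, Chinese Academy of Sciences
Dong Jing (董晶)	Pattern Recognition Laboratory, Institute of Automation, Chinese Academy of Sciences
Cheng Haonan (程皓楠)	State Key Laboratory of Media Convergence and Communication, Communication University of China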

Abstract:

Multimodal fact-checking combines media content from multiple modalities and extracts the relevant information to detect false information on social media. Existing studies rely too heavily on datasets built specifically for fact-checking and offer weak interpretability in image understanding and semantic similarity computation. To address these problems, this paper proposes a novel automated multimodal fact-checking method based on pre-trained large models and evaluates it on the public COSMOS dataset. The method achieves an accuracy of 0.859 and provides a clear justification for every verification, giving higher accuracy and stronger interpretability than traditional baseline methods. The paper also analyzes different variants of the method and the various false-information scenarios in the dataset, verifying that the method's strong semantic understanding of multimodal information lets it handle out-of-context (OOC) detection flexibly across different situations. As large-model technology continues to advance, the method is expected to perform better still. It offers strong technical support and a new direction for fact-checking multimodal media content on social networks.

References: