Journal of Communication University of China (Science and Technology)

Image-text annotation for revolutionary cultural relics based on semantic fusion

Submitted:

2022/8/20

Keywords:

revolutionary cultural relics; multimodal semantic fusion; keyword extraction; named entity recognition

Funding:

Key R&D project under the "Jiebang Guashuai" (open-competition) mechanism (Project No. 2021YFF0901701)

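Name	Affiliation
Guo Xuan (郭轩)	School of Computer Science, Beijing University of Posts and Telecommunications
Peng Hong (彭宏)	Ethnic and Folk Culture Center, Ministry of Culture and Tourism
Wei Lai (魏莱)	School of Computer Science, Beijing University of Posts and Telecommunications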

Abstract:

Revolutionary cultural relics embody a rich red culture and a glorious history, and they hold significant value for research and for cultural inheritance. However, digital methods for organizing and interpreting these relics are still lacking. Building on the multimodal form in which data on revolutionary cultural relics are organized, we propose a new method for the domain-oriented tagging and structuring of their image-text data, called "image-text annotation" (以文标图, literally "annotating images with text"), to annotate the relics digitally. For image tagging, we construct a multimodal semantic fusion model to extract image tags and use a multi-feature TF-IWF method to extract text tags; finally, the tags are re-ranked by their semantic similarity, yielding image tags that are highly relevant to the image and fine-grained in the information they carry. For structuring the image-text data, we construct an image-text modal fusion model and recast conventional sequence-labeling-based named entity recognition as two parts: attribute-name prediction and attribute-value prediction. The proposed algorithm lets the semantic information of the image and text modalities complement each other, improves the tagging and structuring of image-text data, and provides a technical path for annotating and interpreting information on revolutionary cultural relics.
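The abstract names a multi-feature TF-IWF method for extracting text tags but gives neither its formula nor its extra features. As a rough illustration only, the sketch below implements the commonly used plain TF-IWF weighting: the term frequency within one document multiplied by the log of (total word count in the corpus / the term's count in the corpus). It then keeps the top-scoring terms as candidate text tags. The function names `tf_iwf` and `top_k_tags`, the toy corpus, and the alphabetical tie-breaking are hypothetical. The paper's additional features and its semantic-similarity re-ranking step are not modeled here.

```python
import math
from collections import Counter


def tf_iwf(corpus, doc_index):
    """Return {term: TF-IWF score} for corpus[doc_index] (plain TF-IWF, no extra features).

    corpus is a list of pre-tokenized documents (lists of strings).
    TF(t)  = occurrences of t in the document / tokens in the document
    IWF(t) = log(tokens in the whole corpus / occurrences of t in the corpus)
    """
    doc = corpus[doc_index]
    doc_counts = Counter(doc)
    corpus_counts = Counter(tok for d in corpus for tok in d)
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for term, n in doc_counts.items():
        tf = n / len(doc)
        # The document belongs to the corpus, so corpus_counts[term] >= 1.
        iwf = math.log(corpus_total / corpus_counts[term])
        scores[term] = tf * iwf
    return scores


def top_k_tags(corpus, doc_index, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically (hypothetical rule)."""
    scores = tf_iwf(corpus, doc_index)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy, hypothetical relic descriptions, already tokenized.
    corpus = [
        ["letter", "written", "by", "soldier", "letter", "1938"],
        ["flag", "of", "the", "army", "by", "soldier"],
        ["photo", "of", "the", "soldier"],
    ]
    print(top_k_tags(corpus, 0))  # ['letter', '1938', 'written']
```

In this toy run, "soldier" appears in every description and so receives a low IWF weight. "letter" is repeated within a single description and rises to the top. Chinese descriptive text would first need word segmentation before this weighting could be applied.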
