A Lexicon-Enhanced Named Entity Recognition Algorithm for Typical Cultural Relics
Submitted: 2023/4/20
Keywords: lexicon enhancement; domain lexicon; named entity recognition
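Funding: National Key R&D Program of China project "Research on Engineering Methods and Data Processing Technology for Cultural Resource Big Data Services" (2021TFF0901701)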
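Name Affiliation
Cui Xin (崔鑫) School of Computer Science, Beijing University of Posts and Telecommunications
Wang Yan (王琰) School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Hou Xiaogang (侯小刚) School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Zhou Yue (周月) School of Electronic Engineering, Beijing University of Posts and Telecommunications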
Abstract:

Named entity recognition (NER) for typical cultural relics extracts entities such as relic name, dynasty, excavation site, and holding institution from sentences. Text about typical cultural relics follows distinctive word-formation patterns, so existing NER methods applied to such datasets run into problems such as incorrect word-boundary detection. This paper proposes a lexicon-enhanced NER algorithm for typical cultural relics. The algorithm introduces lexical information at both the input representation layer and the contextual encoding layer, which makes the word-level information more domain-specific. A cultural-relic domain lexicon is constructed and used as the algorithm's dictionary, which largely resolves the word-boundary errors and yields good results on a typical cultural relics dataset.
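
The abstract does not spell out how lexical information enters the input representation layer. One common approach in lexicon-enhanced Chinese NER is to match every substring of a sentence against a dictionary and record, for each character, the matched words in which it is the beginning (B), middle (M), end (E), or a single-character word (S). The sketch below illustrates that idea only; it is an assumption, not necessarily the authors' method. The function names `match_lexicon` and `bmes_word_sets`, the toy lexicon, and the example sentence are all hypothetical.

```python
from collections import defaultdict


def match_lexicon(sentence, lexicon, max_len=None):
    """Return every (start, end, word) span of `sentence` found in `lexicon`.

    `end` is exclusive. Spans are ordered by start, then by length.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=0)
    spans = []
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                spans.append((start, end, word))
    return spans


def bmes_word_sets(sentence, lexicon):
    """For each character, group matched words by the character's role in them.

    Roles: "B" (first character), "M" (interior), "E" (last character),
    "S" (the word is that single character). Roles with no match are omitted.
    """
    sets = [defaultdict(set) for _ in sentence]
    for start, end, word in match_lexicon(sentence, lexicon):
        if end - start == 1:
            sets[start]["S"].add(word)
            continue
        sets[start]["B"].add(word)
        sets[end - 1]["E"].add(word)
        for i in range(start + 1, end - 1):
            sets[i]["M"].add(word)
    return [dict(s) for s in sets]


# Hypothetical toy lexicon standing in for a cultural-relic domain lexicon.
toy_lexicon = {"后母戊鼎", "河南", "安阳", "河南安阳"}
sentence = "后母戊鼎出土于河南安阳"

spans = match_lexicon(sentence, toy_lexicon)
features = bmes_word_sets(sentence, toy_lexicon)
```

In a full model, each word set would typically be turned into a pooled word embedding and concatenated with the character embedding, so that the encoder sees candidate word boundaries such as the four-character relic name as a unit. That step depends on the specific architecture and is left out here.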
