Journal of Communication University of China (Science and Technology)

A Cross-Domain Chinese Word Segmentation Model Based on Feature Transfer

Submission date:

2021-06-20

DOI:

Keywords:

transfer learning; adversarial learning; orthogonal constraints; Chinese word segmentation

Funding:

Supported by the Fundamental Research Funds for the Central Universities, Communication University of China (3132018XNG1829)

Name	Email
Zhang Taozheng (张韬政)	zhangtaozheng@cuc.edu.cn
Zhang Jiajian (张家健)	zhangjiajian@cuc.edu.cn

Abstract:

Chinese word segmentation is a common task in natural language processing. In cross-domain segmentation, the target domain's different data distribution and its shortage of training data often cause segmentation performance to drop sharply. To address this problem, we propose a cross-domain Chinese word segmentation model based on feature transfer, which introduces transfer learning, adversarial learning, and orthogonal constraints to reduce the interference between shared and private features. Under cross-domain conditions with little target-domain data, the model can draw on knowledge from a source domain with a large amount of data. Experimental results show that the model achieves excellent performance.

References: