Medical Research and Education, 2024, Vol. 41, Issue (4): 30-37. DOI: 10.3969/j.issn.1674-490X.2024.04.005

• Yishui School Research •

Text Mining and Syntactic Analysis Graph Construction for the Yishui School Based on Natural Language Processing

Zhao Hanqing (赵汉青), Li Yuehan (李玥函), Zou Xinyan (邹欣妍)

  1. College of Traditional Chinese Medicine, Hebei University, Baoding, Hebei 071000
  • Received: 2024-06-04 Published: 2024-08-25 Online: 2024-08-25
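  • About the author: Zhao Hanqing (born 1990), male, from Zaozhuang, Shandong; lecturer, PhD, master's supervisor. His research focuses on big-data processing and artificial intelligence applications in traditional Chinese medicine. E-mail: zhaohq@hbu.edu.cn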
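  • Funding:
    National Natural Science Foundation of China (82004503); Science and Technology Research Project of Colleges and Universities in Hebei Province (BJK2024108)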


Abstract: Entity and relation extraction is an indispensable step in natural language processing tasks such as knowledge graph construction, question answering system design, and semantic analysis. Most information about the Yishui school of traditional Chinese medicine (TCM) is stored as unstructured classical Chinese text, so extracting key information from TCM texts is important for mining and studying TCM academic schools. To address this more efficiently, this study applies artificial intelligence methods. Within a natural language processing framework, it builds a word segmentation and entity relation extraction model based on conditional random fields (CRF) to identify and extract entity relations from TCM texts. It uses term frequency-inverse document frequency (TF-IDF), a common weighting technique in information retrieval and data mining, to extract the key entities in different ancient texts. It then applies neural-network-based dependency parsing to the individual passages of these texts to reveal the grammatical relations between entities, and represents those relations as tree structures for visualization. This work lays the foundation for building a Yishui school knowledge graph and for using artificial intelligence methods to study TCM academic schools.

Key words: natural language processing, knowledge graph, Yishui school, syntactic analysis
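
The TF-IDF weighting named in the abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant or toolkit the authors used, so the formula here (term frequency as count divided by document length, inverse document frequency as ln(N/df)), the function names `tf_idf` and `top_terms`, and the toy token lists are all assumptions made for illustration. They are not the paper's implementation or data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=2):
    """Return the k highest-weighted terms of one document."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical, already-segmented toy documents (not from the paper's corpus).
docs = [
    ["spleen", "stomach", "qi", "spleen"],
    ["kidney", "qi", "yin"],
    ["stomach", "qi", "fire"],
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document (here `"qi"`) receives a weight of zero, so it drops out of the key-term ranking. Terms concentrated in a single text (here `"spleen"` in the first document) rise to the top.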

CLC number: