Supervised by: Ministry of Emergency Management of the People's Republic of China
Sponsored by: Tianjin Fire Science and Technology Research Institute of MEM
ISSN 1009-0029  CN 12-1311/TU

Fire Science and Technology ›› 2023, Vol. 42 ›› Issue (11): 1529-1534.


Fire accident case named entity recognition based on BERT-CRF model

Guan Siqi1,2,3, Dong Tingting1,2,3, Wan Zijing1,2,3, He Yuansheng1,2,3

  1. Tianjin Fire Science and Technology Research Institute of MEM, Tianjin 300381, China; 2. Laboratory of Fire Protection Technology for Industry and Public Building, Ministry of Emergency Management, Tianjin 300381, China; 3. Tianjin Key Laboratory of Fire Safety Technology, Tianjin 300381, China
  • Online: 2023-11-15  Published: 2023-11-15
  • About the author: Guan Siqi (1994- ), female, born in Weinan, Shaanxi; assistant researcher at the Tianjin Fire Science and Technology Research Institute of MEM, mainly engaged in research on computer technology. Address: 110 Weijin South Road, Nankai District, Tianjin 300381, China.
  • Supported by:
    Basic Scientific Research Funds of the Tianjin Fire Science and Technology Research Institute of MEM (2022SJ22, 2023SJ08)

Abstract: To extract key information from fire accident investigation files, we propose a named entity recognition method based on the BERT-CRF model that captures entities such as the accident location, cause and effect, and safety measures. We first construct a fire accident text corpus by annotating 161 accident reports and applying data augmentation to the labeled data. A BERT pre-trained model then performs bidirectional feature extraction on the sentence sequences in the corpus, mining the contextual semantic information of the accident text, and a CRF layer that accounts for entity label transition rules predicts the key entities. Experiments show that the precision, recall and F1 score of the BERT-CRF model on the fire accident case named entity recognition task are 76.36%, 86.19% and 80.97%, respectively, outperforming the BERT and BERT-BiLSTM-CRF models, with a training time 61 s shorter than that of BERT-BiLSTM-CRF. The proposed method can provide accurate entity construction services for downstream systems such as a fire investigation knowledge base and case file compilation.
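
As a concrete illustration of the architecture summarized above, the following is a minimal sketch of a BERT-CRF tagger in PyTorch. It assumes the HuggingFace transformers and pytorch-crf packages, the bert-base-chinese checkpoint, and a toy three-tag BIO label scheme; these names and hyperparameters are illustrative assumptions, not the authors' published configuration.

# Minimal BERT-CRF sequence tagger (sketch). Assumed dependencies:
#   pip install torch transformers pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF
from transformers import BertModel, BertTokenizerFast

class BertCrfTagger(nn.Module):
    def __init__(self, num_tags, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)        # bidirectional context encoder
        self.dropout = nn.Dropout(0.1)
        self.emission = nn.Linear(self.bert.config.hidden_size, num_tags)  # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)               # models tag-transition rules

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(self.dropout(hidden))
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the most likely tag sequence for each sentence
        return self.crf.decode(emissions, mask=mask)

# Toy usage with a hypothetical 3-tag BIO scheme, e.g. {0: "O", 1: "B-CAUSE", 2: "I-CAUSE"}
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertCrfTagger(num_tags=3).eval()
batch = tokenizer(["电气线路短路引发火灾"], return_tensors="pt")
with torch.no_grad():
    predicted = model(batch["input_ids"], batch["attention_mask"])
print(predicted)  # one list of tag indices per sentence, aligned to the wordpiece tokens

In a full pipeline, the decoded wordpiece tags would be mapped back to character spans and scored with entity-level precision, recall and F1, the metrics reported above.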

Key words: named entity recognition, BERT-CRF, fire accident, fire information, fire investigation file, text corpus, fire accident text