CAT: A Collaborative Annotation Tool for Chinese Genealogy Textual Documents

Abstract: Annotating Chinese genealogy textual documents is helpful for constructing genealogy knowledge graphs and for training effective machine learning models for knowledge extraction, among other uses. However, such documents are difficult to annotate. The primary reason is that the texts are written in a mix of classical and vernacular Chinese; they also contain numerous ancient characters and usually lack punctuation, so understanding them requires considerable expertise. In addition, when multiple users label the same text, conflicts may occur. Existing annotation tools are ill-suited to this work. In this paper, we propose a novel interactive labeling tool that provides text segmentation, entity tagging, relationship tagging, and related functions. The annotated information makes it convenient to construct knowledge graphs from textual documents, which can then be used to analyze Chinese genealogy texts. Furthermore, we introduce a weakly supervised mechanism based on a Hidden Markov Model for collaborative, crowdsourced annotation. In practice, our approach proves effective for collaborative annotation; it also facilitates knowledge graph construction and yields more high-quality data sets. The annotation tool is currently in service.