Show simple item record

dc.contributor.advisor  Grant, Christan
dc.contributor.author  Liang, Yan
dc.date.accessioned  2022-05-06T19:27:20Z
dc.date.available  2022-05-06T19:27:20Z
dc.date.issued  2022-05
dc.identifier.uri  https://hdl.handle.net/11244/335586
dc.description.abstract  Numerous important events happen every day and are reported in different media sources with varying narrative styles across knowledge domains and languages. Detecting the real-world events reported in online articles and posts is one of the main tasks in event extraction; others include identifying event triggers and trigger types, identifying event arguments and argument types, clustering and tracking similar events across texts, event prediction, and event evolution. As one of the most important research themes in natural language processing and understanding, event extraction has wide applications in diverse domains and has been intensively researched for decades. This work scales up the end-to-end event extraction task in three ways. First, we scale up the event labeling process across languages and domains. We designed and implemented four approaches to produce multilingual event labels accurately and efficiently. Using these approaches, we completed Arabic actor and verb dictionaries with coverage equivalent to English in less than two years of work, compared to the two decades spent on English dictionary development. Second, we scale up event extraction by using document topic information in a topic-aware deep learning framework. We propose a domain-aware event extraction method that uses topic-name embeddings to enrich the sentences' contextual representations, together with a multi-task setup combining event extraction and topic classification. With this topic-aware model, we improved F1 by 1.8% across all event types and by 13.34% on few-shot event types. Third, we scale up event extraction by designing efficient, containerized pipelines that researchers can comfortably adopt. The pipeline has a container-based architecture that adapts to the available systems and load to process text. With Kalman-filter-based batch-size optimization, we achieved a 20.33% improvement in processing time compared to a static batch size. Using the pipeline we developed, we published the largest machine-coded political event dataset, covering 1979 to 2016 (2 TB, 300 million documents).  en_US
dc.language  en_US  en_US
dc.subject  Machine Learning  en_US
dc.subject  Information Extraction  en_US
dc.subject  Natural Language Processing  en_US
dc.subject  Text Mining  en_US
dc.title  Scaling up Labeling, Mining, and Inferencing on Event Extraction  en_US
dc.contributor.committeeMember  Fagg, Andrew
dc.contributor.committeeMember  Lu, Kun
dc.contributor.committeeMember  Hougen, Dean
dc.contributor.committeeMember  Cheng, Qi
dc.date.manuscript  2022-05
dc.thesis.degree  Ph.D.  en_US
ou.group  Gallogly College of Engineering::School of Computer Science  en_US
shareok.orcid  0000-0002-1192-7288  en_US
shareok.nativefileaccess  restricted  en_US
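The abstract mentions a Kalman-filter-based batch-size optimization in the extraction pipeline. As a rough illustration only (the dissertation's actual implementation is not shown here; the class name, parameters, and latency budget below are all invented for this sketch), a one-dimensional Kalman filter can track the per-document processing time from observed batch latencies and size the next batch to hit a fixed latency target:

```python
# Hypothetical sketch of Kalman-filter-driven batch sizing.
# A 1-D Kalman filter estimates seconds-per-document from noisy
# batch latency measurements; the next batch size is chosen so the
# predicted batch latency matches a target budget.

class BatchSizeController:
    def __init__(self, target_latency_s=2.0, init_estimate_s=0.01,
                 process_var=1e-6, measure_var=1e-4):
        self.target = target_latency_s  # latency budget per batch
        self.x = init_estimate_s        # estimated seconds per document
        self.p = 1.0                    # variance of the estimate
        self.q = process_var            # process noise (drift in doc cost)
        self.r = measure_var            # measurement noise (timing jitter)

    def update(self, batch_size, batch_latency_s):
        # Measurement: observed seconds per document in the last batch.
        z = batch_latency_s / batch_size
        # Predict step: estimate carries over, uncertainty grows.
        self.p += self.q
        # Update step: blend prediction with the new measurement.
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)

    def next_batch_size(self):
        # Batch size whose predicted latency equals the target budget.
        return max(1, int(self.target / self.x))
```

After a few batches the per-document estimate converges, so the controller settles on a batch size near `target_latency_s / true_cost` and re-adapts if document cost drifts (e.g., longer articles), which is the kind of adaptation to system load the abstract describes.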

