Show simple item record

dc.contributor.advisor  Grant, Christan
dc.contributor.author  Liang, Yan
dc.date.accessioned  2022-05-06T19:27:20Z
dc.date.available  2022-05-06T19:27:20Z
dc.date.issued  2022-05
dc.identifier.uri  https://hdl.handle.net/11244/335586
dc.description.abstract  Numerous important events happen every day and are reported in different media sources with varying narrative styles across knowledge domains and languages. Detecting the real-world events reported in online articles and posts is one of the main tasks in event extraction; others include identifying event triggers and trigger types, identifying event arguments and argument types, clustering and tracking similar events across texts, event prediction, and event evolution. As one of the most important research themes in natural language processing and understanding, event extraction has wide applications in diverse domains and has been intensively researched for decades. This work scales up the end-to-end event extraction task in three ways. First, we scale up the event labeling process across languages and domains. We designed and implemented four approaches to produce multilingual event labels accurately and efficiently. Using these approaches, we completed Arabic actor and verb dictionaries with coverage equivalent to English in less than two years of work, compared to the two decades spent on English dictionary development. Second, we scale up event extraction by using document topic information in a topic-aware deep learning framework. We propose a domain-aware event extraction method that uses topic-name embeddings to enrich the sentences' contextual representations, together with a multi-task setup combining event extraction and topic classification. With this topic-aware model, we improved F1 by 1.8% across all event types and by 13.34% on few-shot event types. Third, we scale up event extraction by designing efficient, containerized pipelines that researchers can comfortably adopt. The pipeline has a container-based architecture that adapts to the available systems and load to process text. With Kalman-filter-based batch-size optimization, we achieved a 20.33% improvement in processing time compared to a static batch size. Using the pipeline we developed, we published the largest machine-coded political event dataset, covering 1979 to 2016 (2 TB, 300 million documents).  en_US
dc.language  en_US  en_US
dc.subject  Machine Learning  en_US
dc.subject  Information Extraction  en_US
dc.subject  Natural Language Processing  en_US
dc.subject  Text Mining  en_US
dc.title  Scaling up Labeling, Mining, and Inferencing on Event Extraction  en_US
dc.contributor.committeeMember  Fagg, Andrew
dc.contributor.committeeMember  Lu, Kun
dc.contributor.committeeMember  Hougen, Dean
dc.contributor.committeeMember  Cheng, Qi
dc.date.manuscript  2022-05
dc.thesis.degree  Ph.D.  en_US
ou.group  Gallogly College of Engineering::School of Computer Science  en_US
shareok.orcid  0000-0002-1192-7288  en_US
shareok.nativefileaccess  restricted  en_US
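The abstract mentions a Kalman-filter-based batch-size optimization in the extraction pipeline. As a rough illustration only (the dissertation's actual implementation is not shown here; the class name, parameters, and latency budget below are all invented for this sketch), a one-dimensional Kalman filter can track the per-document processing time from observed batch latencies and size the next batch to hit a fixed latency target:

```python
# Hypothetical sketch of Kalman-filter-driven batch sizing.
# A 1-D Kalman filter estimates seconds-per-document from noisy
# batch latency measurements; the next batch size is chosen so the
# predicted batch latency matches a target budget.

class BatchSizeController:
    def __init__(self, target_latency_s=2.0, init_estimate_s=0.01,
                 process_var=1e-6, measure_var=1e-4):
        self.target = target_latency_s  # latency budget per batch
        self.x = init_estimate_s        # estimated seconds per document
        self.p = 1.0                    # variance of the estimate
        self.q = process_var            # process noise (drift in doc cost)
        self.r = measure_var            # measurement noise (timing jitter)

    def update(self, batch_size, batch_latency_s):
        # Measurement: observed seconds per document in the last batch.
        z = batch_latency_s / batch_size
        # Predict step: estimate carries over, uncertainty grows.
        self.p += self.q
        # Update step: blend prediction with the new measurement.
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)

    def next_batch_size(self):
        # Batch size whose predicted latency equals the target budget.
        return max(1, int(self.target / self.x))
```

After a few batches the per-document estimate converges, so the controller settles on a batch size near `target_latency_s / true_cost` and re-adapts if document cost drifts (e.g., longer articles), which is the kind of adaptation to system load the abstract describes.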

