dc.contributor.advisor: Grant, Christan
dc.contributor.author: Elaryan, Andrew
dc.date.accessioned: 2022-08-01T15:52:26Z
dc.date.available: 2022-08-01T15:52:26Z
dc.date.issued: 2022-08
dc.identifier.uri: https://hdl.handle.net/11244/336450
dc.description.abstract: Podcasting has rapidly ascended as one of the primary forms of spoken-word media in the 21st century. The Spotify Podcast Dataset has compiled transcripts of over 100,000 podcast episodes, making it one of the largest repositories of spoken word data. The segment retrieval task aims to find the most relevant segments to a given query from the set of episode transcripts. This thesis presents a two-stage approach to segment retrieval using an end-to-end question-answering (QA) deep learning architecture with an additional step to expand answers to segments. Standard BM25 retrieval on an index of predetermined segments from each episode serves as a baseline retrieval system. Experiments for both approaches involved producing and evaluating a ranked list of 20 relevant segments for 50 test topics. Comparison between the two retrieval methods shows that the QA retriever trails the baseline in nDCG@10 by 0.128, precision@10 by 0.184, and average segment relevance score by 0.461. QA retrieval slightly outperforms the baseline by 0.024 in recall@10 while slightly underperforming it by 0.102 in average segment relevance score when discounting irrelevant segments. The results suggest that the QA retrieval approach in this thesis can adequately identify and rank relevant segments within a relevant input text. However, for some queries, it may struggle to find enough relevant candidate documents during the first stage of retrieval. QA retrieval shows promise in handling informational queries for the user goal of answering a question. Future work includes improving processes such as candidate document retrieval, answer span expansion, and data annotation. [en_US]
dc.language: en_US [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: question-answering [en_US]
dc.subject: machine learning [en_US]
dc.subject: podcasts [en_US]
dc.subject: computer science [en_US]
dc.title: Question-Answering for Segment Retrieval on Podcast Transcripts [en_US]
dc.contributor.committeeMember: Hougen, Dean
dc.contributor.committeeMember: McGovern, Amy
dc.date.manuscript: 2022-07-28
dc.thesis.degree: Master of Science [en_US]
ou.group: Gallogly College of Engineering::School of Computer Science [en_US]
shareok.orcid: 0000-0003-0957-5655 [en_US]

Except where otherwise noted, this item's license is described as Attribution 4.0 International