DEVELOPING AN ALGORITHM INTEGRATING VOICE AND IMAGING ANALYSIS TO RECOGNIZE FACIAL FEATURES AND DEFICIENCIES AFTER ORAL SURGERY

Seifi, Erfan

Date

2024-05-10

Abstract

According to the National Institutes of Health (NIH), oral cancer is one of several major types of head and neck cancers (HNCs) and affects approximately 54,000 individuals in the United States each year. The primary recognized risk factors for HNCs are tobacco use, alcohol intake, and inadequate oral hygiene, the last of which is particularly significant for oral cavity cancer. As with other cancers, treatment for oral cancer usually includes surgery, radiotherapy, chemotherapy, or a combination thereof. Treatment can impair speech clarity because resecting parts of the vocal tract alters its shape and/or limits mouth movement. For this thesis, a software application was developed to evaluate a participant’s spoken communication by simultaneously analyzing facial features and voice recordings of the participant reading a scripted passage. The effect of vocal tract changes following oral surgery was investigated using the new application, which showed a measurable, quantifiable loss of speech. The goal of development and testing was to give medical doctors, speech therapists, and researchers data-driven algorithms to draw on when designing strategic rehabilitation treatment plans that improve patient recovery. Using machine learning techniques, a model was developed for analyzing speech patterns and for identifying and quantifying the emulated impact of oral surgery on speech production. This approach leverages acoustic analysis and offers a non-invasive, accessible means of assessment, especially when compared to other methods (e.g., high-speed video-stroboscopy) that are known to cause side effects such as swelling and pain and that exclude some cancer patients. By extracting and analyzing audio features from speech recordings, including formant frequencies, together with the spatial dynamics of the lips, investigators can discern subtle changes in motor speech task characteristics.
This information could indicate post-surgical complications or suggest improvements during recovery. The framework built in this thesis identifies a process for comparing speech samples and spatial facial dynamics before and after surgery. Detecting impairments, such as shifts in speech frequencies, offers valuable feedback for monitoring a patient’s motor speech tasks and rehabilitation progress. Such results can be used to demonstrate the effectiveness of therapeutic interventions after cancer treatment. Experimental analyses emulated possible post-surgical scenarios for two healthy participants. The first participant was a non-native English speaker, and the second was a native English speaker with an American accent. Various speech patterns were observed under both regular conditions and conditions produced by two types of oral obstruction. Preliminary results demonstrate the potential of the novel method detailed herein for objectively assessing speech loss and monitoring speech rehabilitation in patients with oral cancer. In short, this thesis presents a framework for non-invasive assessment of speech impairments following oral cancer treatment that bridges the gap between clinical speech therapy and computational speech analysis. The work is intended to enhance oral health and post-surgical rehabilitation. This study was conducted under University of Oklahoma IRB approval No. 17042, titled “AI For Facial Rehab Post Oral Surgery Speech Recovery.”
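
The comparison of speech frequencies before and after an obstruction, as described in the abstract, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`synth_tone`, `peak_frequency`, `frequency_shift`), the sample rate, the frame length, and the frequency band are all assumptions made for illustration, and a single spectral peak of a synthetic tone stands in for a formant. Formant tracking on real speech typically relies on more robust methods such as linear predictive coding.

```python
import cmath
import math


def synth_tone(freq_hz, sample_rate, n_samples):
    """Hypothetical stand-in for a voiced speech frame: one sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def peak_frequency(frame, sample_rate, f_min=200.0, f_max=3500.0):
    """Return the frequency (Hz) of the strongest DFT bin within [f_min, f_max].

    The band limits are illustrative assumptions, not values from the thesis.
    """
    n = len(frame)
    # A Hann window reduces spectral leakage from the frame edges.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    best_freq, best_mag = 0.0, -1.0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if not f_min <= freq <= f_max:
            continue
        # Naive DFT of bin k; adequate for one short frame.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)
    return best_freq


def frequency_shift(baseline, altered, sample_rate):
    """Peak frequency of the altered frame minus that of the baseline frame."""
    return (peak_frequency(altered, sample_rate)
            - peak_frequency(baseline, sample_rate))


# Usage with synthetic frames (not data from the study).
rate = 8000
baseline_frame = synth_tone(700.0, rate, 400)   # "regular" condition
altered_frame = synth_tone(600.0, rate, 400)    # emulated obstruction
print(frequency_shift(baseline_frame, altered_frame, rate))
```

A negative value indicates that the spectral peak moved to a lower frequency in the altered condition. With 400 samples at 8000 Hz, the resolution is limited to 20 Hz per bin, so smaller shifts would need longer frames or interpolation between bins.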

URI

https://hdl.handle.net/11244/340341

Collections

OU - Theses [2188]

SHAREOK™

advancing Oklahoma scholarship, research and institutional memory