Qiu, YuchenChen, Xuxin2022-05-042022-05-042022-05-13https://hdl.handle.net/11244/335507The computer-aided diagnosis (CAD) scheme is a powerful tool in assisting clinicians (e.g., radiologists) to interpret medical images more accurately and efficiently. In developing high-performing CAD schemes, classic machine learning (ML) and deep learning (DL) algorithms play an essential role because of their advantages in capturing meaningful patterns that are important for disease (e.g., cancer) diagnosis and prognosis from complex datasets. This dissertation, organized into four studies, investigates the feasibility of developing several novel ML-based and DL-based CAD schemes for different cancer research purposes. The first study aims to develop and test a unique radiomics-based CT image marker that can be used to detect lymph node (LN) metastasis for cervical cancer patients. A total of 1,763 radiomics features were first computed from the segmented primary cervical tumor depicted on one CT image with the maximal tumor region. Next, a principal component analysis algorithm was applied on the initial feature pool to determine an optimal feature cluster. Then, based on this optimal cluster, machine learning models (e.g., support vector machine (SVM)) were trained and optimized to generate an image marker to detect LN metastasis. The SVM based imaging marker achieved an AUC (area under the ROC curve) value of 0.841 ± 0.035. This study initially verifies the feasibility of combining CT images and the radiomics technology to develop a low-cost image marker for LN metastasis detection among cervical cancer patients. In the second study, the purpose is to develop and evaluate a unique global mammographic image feature analysis scheme to identify case malignancy for breast cancer. From the entire breast area depicted on the mammograms, 59 features were initially computed to characterize the breast tissue properties in both the spatial and frequency domain. Given that each case consists of two cranio-caudal and two medio-lateral oblique view images of left and right breasts, two feature pools were built, which contain the computed features from either two positive images of one breast or all the four images of two breasts. For each feature pool, a particle swarm optimization (PSO) method was applied to determine the optimal feature cluster followed by training an SVM classifier to generate a final score for predicting likelihood of the case being malignant. The classification performances measured by AUC were 0.79±0.07 and 0.75±0.08 when applying the SVM classifiers trained using image features computed from two-view and four-view images, respectively. This study demonstrates the potential of developing a global mammographic image feature analysis-based scheme to predict case malignancy without including an arduous segmentation of breast lesions. In the third study, given that the performance of DL-based models in the medical imaging field is generally bottlenecked by a lack of sufficient labeled images, we specifically investigate the effectiveness of applying the latest transferring generative adversarial networks (GAN) technology to augment limited data for performance boost in the task of breast mass classification. This transferring GAN model was first pre-trained on a dataset of 25,000 mammogram patches (without labels). Then its generator and the discriminator were fine-tuned on a much smaller dataset containing 1024 labeled breast mass images. A supervised loss was integrated with the discriminator, such that it can be used to directly classify the benign/malignant masses. Our proposed approach improved the classification accuracy by 6.002%, when compared with the classifiers trained without traditional data augmentation. This investigation may provide a new perspective for researchers to effectively train the GAN models on a medical imaging task with only limited datasets. Like the third study, our last study also aims to alleviate DL models’ reliance on large amounts of annotations but uses a totally different approach. We propose employing a semi-supervised method, i.e., virtual adversarial training (VAT), to learn and leverage useful information underlying in unlabeled data for better classification of breast masses. Accordingly, our VAT-based models have two types of losses, namely supervised and virtual adversarial losses. The former loss acts as in supervised classification, while the latter loss works towards enhancing the model’s robustness against virtual adversarial perturbation, thus improving model generalizability. A large CNN and a small CNN were used in this investigation, and both were trained with and without the adversarial loss. When the labeled ratios were 40% and 80%, VAT-based CNNs delivered the highest classification accuracy of 0.740±0.015 and 0.760±0.015, respectively. The experimental results suggest that the VAT-based CAD scheme can effectively utilize meaningful knowledge from unlabeled data to better classify mammographic breast mass images. In summary, several innovative approaches have been investigated and evaluated in this dissertation to develop ML-based and DL-based CAD schemes for the diagnosis of cervical cancer and breast cancer. The promising results demonstrate the potential of these CAD schemes in assisting radiologists to achieve a more accurate interpretation of radiological images.Computer-aided diagnosisMachine learningDeep learningBreast cancerCervical cancerNovel Computer-Aided Diagnosis Schemes for Radiological Image Analysis