
dc.contributor.advisor    Imran, Ali
dc.contributor.author    Shodamola, Joel
dc.date.accessioned    2021-07-12T19:42:51Z
dc.date.available    2021-07-12T19:42:51Z
dc.date.issued    2021
dc.identifier.uri    https://hdl.handle.net/11244/330132
dc.description.abstract    The importance of rich and relevant data cannot be overemphasized in the field of artificial intelligence. From machine learning to deep learning models, the performance of a model depends largely on the quantity and quality of the data used for training. However, in today’s cellular networks, such rich data is not easily accessible. For example, in Minimization of Drive Test (MDT) reports, manual operators capture only a fraction of the network's entire coverage map, which results in suboptimal network performance and ultimately in poor performance of anticipatory network automation. To enable seamless automation, synthetic data is leveraged to improve both the quality and the quantity of data for robust and intelligent networks. However, generating synthetic data whose attributes closely match the ground truth is not a trivial task. In this thesis, we present and evaluate a framework to address this challenge. We employ generative models, specifically Generative Adversarial Networks (GANs) and Variational Autoencoders, to augment the sparse multi-dimensional attributes that exist in real datasets. Unlike image data, where the quality of synthetic images produced by generative models can be evaluated visually, establishing the authenticity of tabular synthetic data is a more complex problem. We address this problem with a tripartite approach: 1) We use several statistical measures to quantify the resemblance of the synthetic data to the original data. 2) We compare the performance of an ensemble learning model trained on augmented data with that of one trained on original data only. 3) We benchmark the performance of the generative models against several classical ML models.
This analysis is carried out for varying levels of sparsity and reveals insights into the robustness of generative models against training data sparsity, as well as the suitability of various methods for evaluating the quality of the generated synthetic tabular data. Results show that the GAN performs considerably better than the other approaches. The presented solution can thus be used to overcome the sparsity problem, thereby enabling ML-based network automation use cases.    en_US
dc.language    en_US    en_US
dc.rights    Attribution-NonCommercial-NoDerivatives 4.0 International    *
dc.rights.uri    https://creativecommons.org/licenses/by-nc-nd/4.0/    *
dc.subject    GAN    en_US
dc.subject    Data Augmentation    en_US
dc.subject    Machine learning    en_US
dc.title    Investigating the potential of synthetic data for enabling AI-based zero-touch network automation    en_US
dc.contributor.committeeMember    Hazem, Refai
dc.contributor.committeeMember    Cheng, Samuel
dc.date.manuscript    2021-07
dc.thesis.degree    Master of Science    en_US
ou.group    Gallogly College of Engineering::School of Electrical and Computer Engineering    en_US
shareok.nativefileaccess    restricted    en_US




Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International