Investigating the potential of synthetic data for enabling AI-based zero-touch network automation

Shodamola, Joel

dc.contributor.advisor	Imran, Ali
dc.contributor.author	Shodamola, Joel
dc.date.accessioned	2021-07-12T19:42:51Z
dc.date.available	2021-07-12T19:42:51Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/11244/330132
dc.description.abstract	The importance of rich and relevant data cannot be overemphasized in the field of artificial intelligence. From machine learning to deep learning, the performance of a model depends largely on the quantity and quality of its training data. However, in today’s cellular networks, such rich data is not easily accessible. For example, with Minimization of Drive Test (MDT) reports, manual operators capture only a fraction of the network's entire coverage map; this results in suboptimal network performance and, ultimately, poor performance in anticipatory network automation. To enable seamless automation, synthetic data is leveraged to enrich the quality and quantity of data for robust and intelligent networks. However, generating synthetic data whose attributes closely match the ground truth is not a trivial task. In this thesis, we present and evaluate a framework to address this challenge. We employ generative models, specifically Generative Adversarial Networks (GANs) and Variational Autoencoders, to augment the sparse multi-dimensional attributes that exist in real datasets. Unlike image data, where the quality of synthetic images produced by generative models can be evaluated visually, establishing the authenticity of tabular synthetic data is a more complex problem. We address this problem with a tripartite approach: 1) We use several statistical measures to quantify the resemblance of the synthetic data to the original data. 2) We compare the performance of an ensemble learning model trained on augmented data with that of one trained on original data only. 3) We benchmark the performance of the generative models against several classical ML models. 
This analysis is carried out for varying levels of sparsity and reveals insights into the robustness of generative models to training data sparsity, as well as the suitability of various methods for evaluating the quality of the generated synthetic tabular data. Results show that the GAN performs considerably better than the other approaches. The presented solution can thus be used to overcome the sparsity problem, thereby enabling ML-based network automation use cases.	en_US
dc.language	en_US	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	GAN	en_US
dc.subject	Data Augmentation	en_US
dc.subject	Machine learning	en_US
dc.title	Investigating the potential of synthetic data for enabling AI-based zero-touch network automation	en_US
dc.contributor.committeeMember	Refai, Hazem
dc.contributor.committeeMember	Cheng, Samuel
dc.date.manuscript	2021-07
dc.thesis.degree	Master of Science	en_US
ou.group	Gallogly College of Engineering::School of Electrical and Computer Engineering	en_US
shareok.nativefileaccess	restricted	en_US