
Date

2021

Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International

The importance of rich and relevant data cannot be overstated in the field of artificial intelligence. From classical machine learning to deep learning models, a model's performance depends largely on the quantity and quality of the data used for training. However, in today's cellular networks, such rich data is not easily accessible. For example, with Minimization of Drive Tests (MDT) reports, operators obtain only a fraction of the network's entire coverage map. This results in suboptimal network performance and, ultimately, in poor performance of anticipatory network automation. To enable seamless automation, synthetic data is leveraged to enrich the available data and improve both its quality and its quantity, yielding robust and intelligent networks. However, generating synthetic data whose attributes closely match the ground truth is not a trivial task. In this thesis, we present and evaluate a framework to address this challenge. We employ generative models, specifically Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), to augment the sparse multi-dimensional attributes that exist in real datasets. Unlike image data, where the quality of synthetic images produced by generative models can be evaluated visually, establishing the authenticity of synthetic tabular data is a more complex problem.
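The sparsity-and-augmentation setup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: sparsity is emulated by keeping a random fraction of rows, and training a GAN or VAE is out of scope here, so the hypothetical `naive_gaussian_sampler` stands in for a trained generator. It fits an independent normal distribution per column and therefore ignores the cross-column correlations a real GAN or VAE would learn. All function names are illustrative.

```python
import random
import statistics
from typing import List, Sequence

Row = List[float]


def sparsify(rows: Sequence[Row], keep_fraction: float, seed: int = 0) -> List[Row]:
    """Keep a random subset of rows to emulate sparse measurement coverage."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(rows) * keep_fraction))
    return rng.sample(list(rows), k)


def naive_gaussian_sampler(rows: Sequence[Row], n: int, seed: int = 0) -> List[Row]:
    """HYPOTHETICAL stand-in for a trained GAN/VAE generator.

    Fits an independent normal distribution to each column and samples from it,
    so it ignores cross-column correlations that a real GAN or VAE would learn.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [[rng.gauss(mu, sd) for mu, sd in zip(means, stdevs)] for _ in range(n)]


def augment(real: Sequence[Row], synthetic: Sequence[Row]) -> List[Row]:
    """Augmented training set: the sparse real rows plus generated rows."""
    return list(real) + list(synthetic)


if __name__ == "__main__":
    full = [[float(i), 2.0 * i] for i in range(100)]
    sparse = sparsify(full, keep_fraction=0.1)  # 10 of 100 rows survive
    train = augment(sparse, naive_gaussian_sampler(sparse, n=90))
    print(len(sparse), len(train))  # 10 100
```

Repeating this for several values of `keep_fraction` gives the "varying levels of sparsity" against which a generator can be compared; swapping the stand-in sampler for a real GAN or VAE leaves the rest of the pipeline unchanged.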

We address this problem with a tripartite approach: 1) we use several statistical measures to quantify the resemblance of the synthetic data to the original data; 2) we compare the performance of an ensemble learning model trained on augmented data with that of the same model trained on original data only; and 3) we benchmark the performance of the generative models against several classical ML models. This analysis is carried out for varying levels of sparsity and reveals insights into the robustness of generative models against training-data sparsity, as well as into the suitability of various methods for evaluating the quality of generated synthetic tabular data. Results show that the GAN performs considerably better than the other approaches. The presented solution can thus be used to overcome the sparsity problem, thereby enabling ML-based network automation use cases.
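The abstract does not name the statistical measures used in step 1. As one commonly used example (an assumption, not necessarily one of the thesis's measures), the two-sample Kolmogorov-Smirnov (KS) statistic compares a real column with its synthetic counterpart via the largest gap between their empirical cumulative distribution functions; 0 means the two samples have identical empirical distributions and 1 means they do not overlap at all. The helper names below are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def ks_statistic(real: Sequence[float], synth: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the real and synthetic samples (0 = identical)."""
    a, b = sorted(real), sorted(synth)
    n, m = len(a), len(b)
    gap = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values,
    # so it suffices to evaluate both empirical CDFs at every sample point.
    for x in a + b:
        gap = max(gap, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    return gap


def column_resemblance(real_rows: Sequence[Sequence[float]],
                       synth_rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-column KS statistic between a real table and a synthetic table."""
    return [ks_statistic(r, s) for r, s in zip(zip(*real_rows), zip(*synth_rows))]


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Because the KS statistic is computed per column, it captures marginal resemblance only; correlations between attributes need a separate check, which is one reason a downstream test such as step 2 (training a model on augmented data) is a useful complement.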

Keywords

GAN, Data Augmentation, Machine learning
