This thesis presents a “Novel Small Cell Planning Solution using Machine Learning”. Telecom service providers are interested in estimating various trends so that they can plan future upgrades and deployments based on real data. The service provider landscape is changing fundamentally: the number of devices in the network, such as small cells, is increasing to cater to growing demand, and the increasing amount of data has caused a big data revolution that is having an impact on telecom. With advanced big data analytics solutions and fine-grained real-time analytics, the way bandwidth needs change from one place to another throughout the day, week, or month becomes predictable. Hence, big data analytics solutions can help in deciding the footprint of small cells and in deploying them efficiently. In this thesis, I have used the open big data published at https://dandelion.eu/datamine/open-big-data/ under the Open Data Commons Open Database License (ODbL). This dataset provides information about telecommunication activity over the city of Milano. It is the result of a computation over the Call Detail Records (CDRs) generated by the Telecom Italia cellular network over the city of Milano. Data mining is the technique of finding hidden and interesting patterns in a dataset, which can then be used in decision making and future prediction. In this thesis, data preprocessing was performed on the Hadoop framework using Hive with Cloudera's open source platform, CDH cloudera-quickstart-vm-5.3.0-0-vmware. The (Eps, MinPts) DBSCAN density-based spatial clustering algorithm is used for clustering the geospatial data. DBSCAN clusters a spatial dataset based on two parameters, namely the physical distance from each point (Eps) and a minimum cluster size (MinPts). This method is well suited to spatial latitude-longitude data.
In this thesis, the scikit-learn machine learning platform is used to implement the solution. scikit-learn is one of the most widely used machine learning platforms, and it provides a wide range of supervised and unsupervised learning algorithms via a consistent interface in Python. For validation of the clustering results, the data mining tool WEKA 3.6.11 is used. To benchmark the proposed solution, the DBSCAN algorithm's clustering result is compared with WEKA's clustering results. The final results show that the solution produces very promising results. Three findings stand out: the solution is able to reveal all the objects in the dataset on the basis of user-defined algorithm input parameters; the input parameters have a decisive impact on the clustering result; and the solution can extract spatially, temporally and semantically separated clusters. The detected clusters are visualized using the Matplotlib plotting library for Python, WEKA, and the geojson.io online tool.
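As an illustration of the (Eps, MinPts) clustering described above, the following is a minimal sketch of DBSCAN on latitude-longitude points using scikit-learn. It is not the thesis implementation: the coordinates are synthetic points placed near Milano, the values `eps_km=0.5` and `min_pts=3` are arbitrary, the helper name `cluster_points` is hypothetical, and the choice of the haversine metric (which expects coordinates in radians) is an assumption about how physical distance might be measured.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, used to convert km to radians


def cluster_points(coords_deg, eps_km=0.5, min_pts=3):
    """Cluster [lat, lon] points (in degrees) with DBSCAN.

    eps_km and min_pts are illustrative values, not those used in the thesis.
    Returns one label per point; -1 marks noise.
    """
    coords_rad = np.radians(coords_deg)  # haversine metric expects radians
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # Eps expressed as an angle in radians
        min_samples=min_pts,           # MinPts, counting the point itself
        algorithm="ball_tree",
        metric="haversine",
    )
    return model.fit_predict(coords_rad)


# Synthetic sample: two tight groups and one isolated point (assumed data).
coords = np.array([
    [45.4642, 9.1900],
    [45.4645, 9.1903],
    [45.4640, 9.1897],
    [45.4800, 9.2300],
    [45.4803, 9.2302],
    [45.4798, 9.2297],
    [45.5500, 9.0500],
])

labels = cluster_points(coords)
print(labels)
```

Changing `eps_km` or `min_pts` changes which points are grouped and which are treated as noise, which is consistent with the observation that the input parameters have a decisive impact on the clustering result.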