Loading...
Thumbnail Image

Date

2023-08-04

Journal Title

Journal ISSN

Volume Title

Publisher

A graph G = (V, E) is an ordered tuple where V is a non-empty set of elements called vertices (nodes), and E is a set of an unordered pair of elements called links (edges), and a time-evolving graph is a change in the states of the edges over time. With the growing popularity of social networks and the massive influx of users, it is becoming a challenging task to store the network/graph and process them as fast as possible before the property of the graph changes with the graph evolution.

Graphs or networks are a collection of entities (individuals in a social network) and their relationships (friends, followers); ways to represent a graph can help how the information could be extracted. The increase in the number of users increases the relationship the user has, which makes the graphs massive and nearly impossible to store them in friendly structures such as a matrix or an adjacency list. Therefore, an exciting area of research is storing these massive graphs with a smaller memory footprint and processing with very little extra memory.

But there is always a trade-off with time and space; to get a small memory footprint, one has to remove the redundancy rigorously, which consumes time. In the same way, when traversing these tight spaces, the time required to query also increases compared to a matrix or an adjacency list.

In this dissertation, we provide the encoding technique to store the arrays in the Compressed Sparse Row (CSR) data structure and extend the encoding to store time-evolving graphs in the form of a CSR. We also propose combinations of two structures (CSR + CBT) to store the time-evolving graphs and to improve the time and space trade-off. Encoding also enables one to access a node without decompressing the entire structure, which means that the data structure can be accessed.

We then provide four ways to store multi-dimensional data, which represents intricate relations within the social network. Once the data are stored in compressed format, it is important to provide algorithms that support the structures. One such computation which is the basis for any graph algorithm is matrix-multiplication. We now extend our work to perform value-based matrix multiplication on compressed structures. We test our algorithm on extremely large matrices, in the order of 100s of millions with various levels of sparsity. Using matrix-matrix multiplication and keeping the theme of storing the data in small spaces, we propose another way of compression is through the dimensionality reduction, which is referred to as Matrix Factorization.

Performing any of these operations on a compressed structure without decompressing would be time consuming. Therefore, in this dissertation, we introduce a parallel technique to construct the graph and also run a list of queries using the querying algorithms, such as fetching neighbor or edge existence in parallel. We also extend our work to propose parallel time-evolving differential compression of CSR using the prefix sum approach.

Description

Keywords

Data Compression, Tensors, Time-Evolving Graphs, Matrix Operations, Parallel Computation

Citation

DOI

Related file

Notes

Sponsorship