Compression techniques for extreme-scale graphs and matrices: sequential and parallel algorithms

Gopal Krishna, Sudhindra

dc.contributor.advisor	Radhakrishnan, Sridhar
dc.contributor.author	Gopal Krishna, Sudhindra
dc.date.accessioned	2023-08-14T18:53:28Z
dc.date.available	2023-08-14T18:53:28Z
dc.date.issued	2023-08-04
dc.identifier.uri	https://hdl.handle.net/11244/338855
dc.description.abstract	A graph G = (V, E) is an ordered tuple where V is a non-empty set of elements called vertices (nodes), and E is a set of an unordered pair of elements called links (edges), and a time-evolving graph is a change in the states of the edges over time. With the growing popularity of social networks and the massive influx of users, it is becoming a challenging task to store the network/graph and process them as fast as possible before the property of the graph changes with the graph evolution. Graphs or networks are a collection of entities (individuals in a social network) and their relationships (friends, followers); ways to represent a graph can help how the information could be extracted. The increase in the number of users increases the relationship the user has, which makes the graphs massive and nearly impossible to store them in friendly structures such as a matrix or an adjacency list. Therefore, an exciting area of research is storing these massive graphs with a smaller memory footprint and processing with very little extra memory. But there is always a trade-off with time and space; to get a small memory footprint, one has to remove the redundancy rigorously, which consumes time. In the same way, when traversing these tight spaces, the time required to query also increases compared to a matrix or an adjacency list. In this dissertation, we provide the encoding technique to store the arrays in the Compressed Sparse Row (CSR) data structure and extend the encoding to store time-evolving graphs in the form of a CSR. We also propose combinations of two structures (CSR + CBT) to store the time-evolving graphs and to improve the time and space trade-off. Encoding also enables one to access a node without decompressing the entire structure, which means that the data structure can be accessed. We then provide four ways to store multi-dimensional data, which represents intricate relations within the social network. Once the data are stored in compressed format, it is important to provide algorithms that support the structures. One such computation which is the basis for any graph algorithm is matrix-multiplication. We now extend our work to perform value-based matrix multiplication on compressed structures. We test our algorithm on extremely large matrices, in the order of 100s of millions with various levels of sparsity. Using matrix-matrix multiplication and keeping the theme of storing the data in small spaces, we propose another way of compression is through the dimensionality reduction, which is referred to as Matrix Factorization. Performing any of these operations on a compressed structure without decompressing would be time consuming. Therefore, in this dissertation, we introduce a parallel technique to construct the graph and also run a list of queries using the querying algorithms, such as fetching neighbor or edge existence in parallel. We also extend our work to propose parallel time-evolving differential compression of CSR using the prefix sum approach.	en_US
dc.language	en_US	en_US
dc.subject	Data Compression	en_US
dc.subject	Tensors	en_US
dc.subject	Time-Evolving Graphs	en_US
dc.subject	Matrix Operations	en_US
dc.subject	Parallel Computation	en_US
dc.title	Compression techniques for extreme-scale graphs and matrices: sequential and parallel algorithms	en_US
dc.contributor.committeeMember	Nicholson, Charles
dc.contributor.committeeMember	Antonio, John
dc.contributor.committeeMember	Pan, Chongle
dc.date.manuscript	2023-07-28
dc.thesis.degree	Ph.D.	en_US
ou.group	Gallogly College of Engineering::School of Computer Science	en_US
shareok.orcid	0000-0002-2433-5855	en_US

Files in this item

Name:: 2023_Gopal Krishna_Sudhindra_D ...
Size:: 1.882Mb
Format:: PDF

View/Open

Name:: 2023_Gopal Krishna_Sudhindra_D ...
Size:: 1.939Mb
Format:: Unknown

View/Open

This item appears in the following Collection(s)

OU - Dissertations [9426]

Show simple item record

SHAREOK^TM

advancing Oklahoma scholarship, research and institutional memory

Compression techniques for extreme-scale graphs and matrices: sequential and parallel algorithms

Files in this item

This item appears in the following Collection(s)