Bridging the Gap between Sparse Matrix Computation and Graph Models

dc.contributor.advisorVeras, Richard
dc.contributor.authorAbdelaal, Khaled
dc.contributor.committeeMemberAntonio, John
dc.contributor.committeeMemberCheng, Qi
dc.contributor.committeeMemberJo, Javier
dc.date.accessioned2024-04-25T15:29:17Z
dc.date.available2024-04-25T15:29:17Z
dc.date.issued2024-05-10
dc.date.manuscript2024-04-18
dc.description.abstractSparsity manifests itself in a multitude of modern High-Performance Computing applications including graph neural networks, network analytics, and scientific computing. In sparse matrices, the majority of values are zeros. Traditional methods of storing and processing dense data are unsuitable for the new nature of sparse data, as they end up wasting storage and compute on zeros. Hence, a variety of sparse data formats that store only the non-zero elements were proposed in literature to provide a compact representation of sparse data. Performance of operations on sparse data mainly depends on the sparse data format used for storing the data, as the algorithm needs to closely match the sparse data format. However, choosing the optimal sparse data format for the input sparse matrix is non-trivial, as the optimal format depends on the sparsity pattern of the input sparse matrix. For example, in sparse matrix-vector multiplication (SpMV), for the same input sparse matrix, using different sparse data formats can yield highly variant performance. The best format being the one that closely matches how the non-zeros are arranged within the matrix. Additionally, performance prediction for operations involving sparse matrices is not as straightforward as it used to be for the dense case. For dense computations, dimensions and strides suffice for performance predictions as they provide a sense of the number of floating point operations (FLOPs) to be performed, and how this number compares to the architecture properties (peak FLOPs, number of processing elements, etc.). On the other hand, sparse matrix dimensions do not directly convey useful information about the total number of operations to be performed, since the majority of elements are zeros and do not contribute to the total number of FLOPs. Moreover, existing work on sparse operations optimizations mainly depends on a discrete set of matrices, limiting the ability to generalize observations. To address these challenges, we identify the sparsity pattern as the main driving factor for performance. First, we propose an extensible classifier framework to automatically identify the sparsity pattern of the input sparse matrix. This framework uses graph neural networks (GNNs) by representing the input sparse matrix as a graph, and then learning the structural relationship between nodes in the matrix (graph). Our framework achieves up to 98% classification accuracy on full graphs, the same accuracy for scrambled matrices, and 92% accuracy for small random subsamples taken out of original graphs. Second, we use graph models as a proxy to generate large-scale synthetic sparse matrices. We propose another modular framework to study the correlation between the graph model parameters, and the structure of the resulting graph and its tolerance to noise during the generation step. Third, we also use graph models for a performance evaluation framework that can assist in finding the best sparse format for a given graph model on a given architecture, utilizing the graph model parameters as a representative set of features to predict performance, and providing a more robust way of visualizing sparse matrix operations performance. This framework also takes into consideration noise in the matrix generation step, and evaluates the extent of noise to which performance can still be predictable based on the graph model parameters. Our results show that sparse computations need richer models to categorize sparse matrices in terms of structure, study the sensitivity of such models even for one structure, and tie performance to a more descriptive set of parameters.en_US
dc.identifier.urihttps://hdl.handle.net/11244/340238
dc.languageen_USen_US
dc.subjectComputer Scienceen_US
dc.subjectHigh-Performance Computingen_US
dc.subjectSparse Linear Algebraen_US
dc.subjectPerformance Analysis and Optimizationen_US
dc.subjectGraph Workloadsen_US
dc.thesis.degreePh.D.en_US
dc.titleBridging the Gap between Sparse Matrix Computation and Graph Modelsen_US
ou.groupGallogly College of Engineering::School of Computer Scienceen_US
shareok.orcid0000-0001-8963-7523en_US

Files

Original bundle
Now showing 1 - 2 of 2
Loading...
Thumbnail Image
Name:
2024_Abdelaal_Khaled Mohammed Attia Elsayed_Dissertation.pdf
Size:
16.99 MB
Format:
Adobe Portable Document Format
Description:
No Thumbnail Available
Name:
2024_Abdelaal_Khaled Mohammed Attia Elsayed_Dissertation.zip
Size:
16.91 MB
Format:
Unknown data format
Description:
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: