Model re-training for dynamic graphs
Abstract
In machine learning, a critical assumption is that the training and test datasets follow similar distributions. A model is effective when new test data resembles the data on which it was trained; if the two differ substantially, the algorithm's predictions become inaccurate. In many applications the data is dynamic, changing over time, and as its distribution drifts the model must eventually be retrained. In this research I study the dynamic behavior of graph data: as the data changes, nodes and edges are added to or deleted from the graph. Because we deal with large graph datasets, we train and test on embedding vector spaces. The embedding space differs at each timestamp, and retraining the model every time the data changes is expensive. To address these challenges, we use the dfs_dynode2vec algorithm, in which the embedding vectors for the current timestamp are initialized from the previous timestamp's embeddings. At each timestamp the data may change significantly or insignificantly. We propose a statistical model, 'Significant testing', that determines whether the model should be retrained. If the change is insignificant, the model is not retrained and no embedding vectors are generated for that timestamp. We consider several aspects in determining the statistical significance of a change, including edge centrality, betweenness centrality, and norm calculations.
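As a rough illustration of one of the aspects mentioned above (norm calculations), the sketch below implements a hypothetical norm-based change detector: it compares consecutive embedding snapshots and flags retraining only when the relative change exceeds a cutoff. The function name `should_retrain` and the `threshold` value are assumptions for illustration, not the thesis's actual test, which also incorporates centrality measures.

```python
import math

def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of row vectors."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def should_retrain(prev_emb, curr_emb, threshold=0.1):
    """Decide whether the change between two embedding snapshots is
    large enough to warrant retraining.

    prev_emb / curr_emb: embedding matrices (one vector per node),
    restricted to nodes present in both timestamps.
    threshold: hypothetical relative-change cutoff.
    """
    diff = [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_emb, curr_emb)]
    rel_change = frobenius_norm(diff) / max(frobenius_norm(prev_emb), 1e-12)
    return rel_change > threshold

# A small perturbation of the embeddings should not trigger retraining.
prev = [[1.0, 0.0], [0.0, 1.0]]
curr = [[1.01, 0.0], [0.0, 0.99]]
print(should_retrain(prev, curr))  # → False
```

In a full pipeline this check would run once per timestamp: when it returns False, the previous embeddings are reused; when it returns True, dfs_dynode2vec is re-run, warm-started from the previous vectors.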
Collections
- OSU Theses [15752]