Document Fingerprinting Using Graph Grammar Induction
The purpose of this study was to detect similarity between documents while taking the relationships between textures into account. We focus on C-language documents as our domain. Our algorithm first converts a document into graph format. Next, a graph grammar is extracted from the graph by SubdueGL, a graph grammar induction algorithm. Finally, the similarity between documents is evaluated by comparing their graph grammars. We also study graph characteristics, graph grammars, and graph isomorphism. In the conversion module, documents are translated into a graph format, which can be defined differently for different domains. For C-language documents, we found that a conceptual graph, which is the most expressive representation because it captures the relationships between textures, gives the best performance in detecting similarity. Our algorithm therefore generates this conceptual graph. Our evaluation shows that the algorithm detects similarity between documents well. However, it cannot indicate whether a detected similarity is texture similarity or structure similarity, because the process combines the two in its final result. Nevertheless, compared to other algorithms, our approach works well when relationships between textures are considered.
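As a rough illustration of the convert-then-compare idea described above, the following Python sketch turns a C snippet into a set of labeled edges and scores the overlap between two such graphs. It is not the thesis's method: the grammar induction step performed by SubdueGL is omitted entirely, and a plain multiset Jaccard overlap stands in for the comparison of graph grammars. The tokenizer, the "adjacent tokens form an edge" graph definition, and the function names `to_graph` and `similarity` are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed tokenizer: identifiers/keywords, integer literals, single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def to_graph(source: str) -> Counter:
    """Return a multiset of directed edges (token, following token).

    This is a hypothetical, much simpler graph than the conceptual
    graph described in the abstract.
    """
    tokens = TOKEN_RE.findall(source)
    return Counter(zip(tokens, tokens[1:]))


def similarity(a: Counter, b: Counter) -> float:
    """Multiset Jaccard overlap of two edge collections, in [0, 1].

    Stand-in for comparing induced graph grammars; no induction is done here.
    """
    union = sum((a | b).values())
    if union == 0:
        return 1.0  # two empty graphs are treated as identical
    return sum((a & b).values()) / union


graph_a = to_graph("int x = a + b;")
graph_b = to_graph("int y = a + b;")
print(similarity(graph_a, graph_b))  # 0.5: 4 shared edges out of 8 distinct ones
```

Like the approach in the abstract, this score mixes token-level and structural overlap into a single number, so it cannot say which of the two caused a match.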
- OSU Theses