dc.contributor.advisor | Yen, Gary G. | |
dc.contributor.author | Wu, Zheng | |
dc.date.accessioned | 2013-12-10T18:05:53Z | |
dc.date.available | 2013-12-10T18:05:53Z | |
dc.date.issued | 2006-07 | |
dc.identifier.uri | https://hdl.handle.net/11244/7879 | |
dc.description.abstract | The Self-Organizing Map (SOM) is an unsupervised neural network model that provides a topology-preserving mapping from a high-dimensional input space onto an output space that is commonly two-dimensional. In this study, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text data mining. The proposed approach first transforms the document space into a multi-dimensional vector space by means of document encoding. A growing hierarchical SOM (GHSOM) is then trained and used as a baseline framework, which automatically produces maps with various levels of detail. Following the training of the GHSOM, a novel projection method, the Ranked Centroid Projection (RCP), is applied to project the input vectors onto a hierarchy of two-dimensional output maps. The projection of an input vector is treated as a vector interpolation into a two-dimensional regular map grid. A ranking scheme selects the R units nearest to the input vector in the original data space, and the positions of these units are taken into account in computing the projection coordinates. | |
dc.description.abstract | The proposed approach can be used both as a data analysis tool and as a direct interface to the data. Its applicability is demonstrated in this study using an illustrative data set and two real-world document clustering tasks, i.e., the SOM paper collection and the Anthrax paper collection. Based on the proposed approach, a software toolbox for analyzing and visualizing document collections is designed; it provides a user-friendly interface and several exploration and analysis functions. | |
dc.description.abstract | The presented SOM-based approach incorporates several distinctive features, such as an adaptive structure, hierarchical training, automatic parameter adjustment, and incremental clustering. Its advantages include the ability to convey a large amount of information in a limited space with a comparatively low computational load, the potential to reveal conceptual relationships among documents, and the facilitation of perceptual inferences on both inter-cluster and within-cluster relationships. | |
dc.format | application/pdf | |
dc.language | en_US | |
dc.rights | Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material. | |
dc.title | Ranked centroid projection: A data visualization approach based on self-organizing maps | |
dc.contributor.committeeMember | Teague, Keith | |
dc.contributor.committeeMember | Yarlagadda, Radha K. Rao | |
dc.contributor.committeeMember | DeYong, Camille F. | |
osu.filename | Wu_okstate_0664D_1941.pdf | |
osu.accesstype | Open Access | |
dc.type.genre | Dissertation | |
dc.type.material | Text | |
thesis.degree.discipline | Electrical and Computer Engineering | |
thesis.degree.grantor | Oklahoma State University | |
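
The projection step described in the abstract (rank the map units by distance to the input vector, keep the nearest R, and interpolate among their grid positions) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `ranked_centroid_projection`, its parameters, and in particular the inverse-distance weighting are assumptions chosen for illustration, since the abstract does not state how the selected units are weighted.

```python
import math
from typing import List, Sequence, Tuple


def ranked_centroid_projection(
    x: Sequence[float],
    unit_weights: List[Sequence[float]],
    grid_positions: List[Tuple[float, float]],
    r: int = 3,
    eps: float = 1e-12,
) -> Tuple[float, float]:
    """Project input vector x onto a 2-D map grid.

    unit_weights[i] is the weight vector of map unit i in the input space;
    grid_positions[i] is that unit's (column, row) position on the map.
    """
    if len(unit_weights) != len(grid_positions):
        raise ValueError("each unit needs a weight vector and a grid position")
    if not 1 <= r <= len(unit_weights):
        raise ValueError("r must be between 1 and the number of units")

    # Distance from x to every unit in the original data space.
    dists = [math.dist(x, w) for w in unit_weights]

    # Ranking step: indices of the r nearest units, closest first.
    ranked = sorted(range(len(unit_weights)), key=dists.__getitem__)[:r]

    # ASSUMPTION: inverse-distance weights. The abstract only says the
    # positions of the selected units are taken into account.
    coeffs = [1.0 / (dists[i] + eps) for i in ranked]
    total = sum(coeffs)

    # Interpolated position: weighted centroid of the selected grid positions.
    px = sum(c * grid_positions[i][0] for c, i in zip(coeffs, ranked)) / total
    py = sum(c * grid_positions[i][1] for c, i in zip(coeffs, ranked)) / total
    return px, py


# Hypothetical 2x2 map whose units sit at the corners of the unit square.
units = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
projected = ranked_centroid_projection((0.5, 0.5), units, grid, r=4)
```

With R = 1 the sketch reduces to placing the input at its best-matching unit; larger R lets inputs fall between grid nodes, which is what allows within-cluster structure to be seen on the map.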