dc.contributor.advisor: Gruenwald, Le
dc.creator: Sadik, Shiblee
dc.date.accessioned: 2019-06-03T20:35:27Z
dc.date.available: 2019-06-03T20:35:27Z
dc.date.issued: 2013
dc.identifier: 99174134802042
dc.identifier.uri: https://hdl.handle.net/11244/320199
dc.description.abstract: In applications such as Web clicks and environmental monitoring, data take the form of streams, each of which is an infinite sequence of data points with explicit or implicit timestamps and has special characteristics, such as transiency, uncertainty, dynamic data distribution, multi-dimensionality, asynchronous data arrival, dynamic relationships, and schema heterogeneity of data from different sources. In these applications, outliers arise for many reasons, including human error, instrument error, catastrophe, and malicious behavior. Detecting outliers effectively is critical to many data management and mining tasks. However, little research has been conducted on discovering outliers in data stream applications, especially those involving multi-dimensional, related, heterogeneous, and asynchronous streams.
dc.description.abstract: This dissertation presents two innovative outlier detection algorithms, Orion and Wadjet, which take all of these data stream characteristics into consideration. Orion is designed for applications where data come from a single stream. With the help of an evolutionary algorithm, it looks for a projected dimension that reveals the outlier nature of multi-dimensional data points, and it identifies a data point as an outlier if the point resides in a low-density region in that dimension. Wadjet is designed for applications where data come from multiple, heterogeneous, and asynchronous streams. It has two phases. In the first phase, it processes each stream independently, as Orion does. In the second phase, it captures and continuously evaluates the cross-correlation, if any, among the data points from multiple streams, and identifies a data point as an outlier if its value does not conform to the captured cross-correlation.
dc.description.abstract: Extensive theoretical and empirical analyses have been conducted to evaluate the performance of Orion and Wadjet using real and synthetic datasets. The evaluation results show that both algorithms have better accuracy and execution time than state-of-the-art techniques when applied to homogeneous data stream applications. The results also show that Wadjet is effective in detecting outliers in heterogeneous data streams, which existing algorithms cannot handle.
dc.format.extent: 289 pages
dc.format.medium: application.pdf
dc.language: en_US
dc.relation.requires: Adobe Acrobat Reader
dc.subject: Outliers (Statistics)
dc.subject: Algorithms
dc.subject: Data mining
dc.title: Online Detection of Outliers for Data Streams
dc.type: text
dc.type: document
dc.thesis.degree: Ph.D.
ou.group: College of Engineering::School of Computer Science
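
The abstract describes Orion as flagging a data point when it falls in a low-density region of a projected dimension. The sketch below illustrates only that general idea and is not the dissertation's algorithm: the projection direction is supplied by hand rather than found by an evolutionary search, the density is a plain Gaussian kernel estimate over a fixed window, and the names `project`, `density`, `is_outlier`, `bandwidth`, and `threshold` are all hypothetical.

```python
import math


def project(point, direction):
    """Scalar projection of a point onto a direction vector."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(p * d for p, d in zip(point, direction)) / norm


def density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from 1-D samples."""
    coeff = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return coeff * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def is_outlier(point, window, direction, bandwidth=0.5, threshold=0.05):
    """Flag a point whose projected value lies in a low-density region.

    Assumptions (not from the source): `window` is a list of recent
    points, `direction` is given rather than searched for, and
    `threshold` is an arbitrary density cutoff.
    """
    projected = [project(p, direction) for p in window]
    return density(project(point, direction), projected, bandwidth) < threshold


# Usage: a small window of 2-D points lying along the diagonal.
window = [(i * 0.1, i * 0.1) for i in range(11)]
print(is_outlier((0.5, 0.5), window, (1.0, 1.0)))
print(is_outlier((10.0, 10.0), window, (1.0, 1.0)))
```

A streaming version would additionally need to update the window and the density estimate incrementally as points arrive, which this sketch leaves out.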

