Statistical Anomaly Discovery Through Visualization
Abstract
Developing a deep understanding of data is a crucial part of decision-making processes.
It often takes substantial time and effort to develop an understanding solid enough to make well-informed
decisions. Data analysts often perform statistical analyses through visualization
to develop such understanding. However, deriving applicable insight can be difficult because of biases
and anomalies in data. An often overlooked phenomenon is mix effects, in which subgroups
of data exhibit patterns opposite to the data as a whole. This phenomenon is widespread
and often leads inexperienced analysts to draw contradictory conclusions. Discovering such
anomalies in data becomes challenging as data continue to grow in volume, dimensionality,
and cardinality. Effectively designed data visualizations empower data analysts to reveal
and understand patterns in data for studying such paradoxical anomalies.
This research explores several approaches for combining statistical analysis and visualization
to discover and examine anomalies in multidimensional data. It starts with an automatic
anomaly detection method based on correlation comparison and experiments to determine
the running time and complexity of the algorithm. Subsequently, the research investigates
the design, development, and implementation of a series of visualization techniques to fulfill
the needs of analysis through a variety of statistical methods. We create an interactive visual
analysis system, Wiggum, for revealing various forms of mix effects. A user study to evaluate
Wiggum strengthens understanding of the factors that contribute to the comprehension of
statistical concepts. Furthermore, a conceptual model, visual correspondence, is presented
to study how users can determine the identity of items between visual representations by
interpreting the relationships between their respective visual encodings. It is practical to
build visualizations with highly linked views informed by visual correspondence theory. We
present a hybrid tree visualization technique, PatternTree, which applies the visual
correspondence theory. PatternTree helps users more readily discover statistical anomalies
and explore their relationships. Overall, this dissertation contributes a merging of new visualization
theory and designs for analysis of statistical anomalies, thereby leading the way to
the creation of effective visualizations for statistical analysis.
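
To make the notion of mix effects concrete, the following minimal sketch compares the correlation in the pooled data with the correlation inside each subgroup and reports subgroups whose sign is reversed. It is an illustration under stated assumptions, not the detection method developed in the dissertation or the implementation used in Wiggum: the function names `pearson` and `find_reversals`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def find_reversals(rows, threshold=0.0):
    """Compare the pooled correlation with each subgroup's correlation.

    rows: iterable of (group, x, y) tuples.
    Returns (overall_r, {group: r}) where the dict holds only the groups
    whose correlation has the opposite sign to the pooled correlation
    and a magnitude above `threshold` (a hypothetical tuning knob).
    """
    rows = list(rows)
    overall = pearson([x for _, x, _ in rows], [y for _, _, y in rows])

    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    reversed_groups = {}
    for group, points in by_group.items():
        r = pearson([x for x, _ in points], [y for _, y in points])
        if r * overall < 0 and abs(r) > threshold:
            reversed_groups[group] = r
    return overall, reversed_groups


# Toy data, made up for illustration: y falls as x rises inside each group,
# but group "b" sits higher on both axes, so the pooled trend rises.
rows = [
    ("a", 1, 5), ("a", 2, 4), ("a", 3, 3),
    ("b", 6, 10), ("b", 7, 9), ("b", 8, 8),
]
overall, reversed_groups = find_reversals(rows)
print(f"overall r = {overall:.2f}")
for group, r in sorted(reversed_groups.items()):
    print(f"group {group}: r = {r:.2f}")
```

In this toy data the pooled correlation is positive while both subgroups trend negative, which is the kind of reversal the abstract calls a mix effect. A sign comparison like this covers only one simple form of the phenomenon; the dissertation describes several.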
Collections
- OU - Dissertations