Statistical Anomaly Discovery Through Visualization

dc.contributor.advisor: Weaver, Chris
dc.contributor.author: Xu, Chenguang
dc.contributor.committeeMember: Grant, Christan
dc.contributor.committeeMember: McGovern, Amy
dc.contributor.committeeMember: Hougen, Dean
dc.contributor.committeeMember: Endres, William
dc.date.accessioned: 2023-05-08T14:51:20Z
dc.date.available: 2023-05-08T14:51:20Z
dc.date.issued: 2023-05-12
dc.date.manuscript: 2023-05-05
dc.description.abstract: Developing a deep understanding of data is a crucial part of decision-making processes. It often takes substantial time and effort to develop an understanding solid enough to support well-informed decisions. Data analysts often perform statistical analyses through visualization to develop such understanding. However, applicable insight can be difficult to obtain because of biases and anomalies in data. An often overlooked phenomenon is mix effects, in which subgroups of data exhibit patterns opposite to the data as a whole. This phenomenon is widespread and often leads inexperienced analysts to draw contradictory conclusions. Discovering such anomalies in data becomes challenging as data continue to grow in volume, dimensionality, and cardinality. Effectively designed data visualizations empower data analysts to reveal and understand patterns in data for studying such paradoxical anomalies. This research explores several approaches for combining statistical analysis and visualization to discover and examine anomalies in multidimensional data. It starts with an automatic anomaly detection method based on correlation comparison, together with experiments to determine the running time and complexity of the algorithm. Subsequently, the research investigates the design, development, and implementation of a series of visualization techniques that meet the needs of analysis through a variety of statistical methods. We create an interactive visual analysis system, Wiggum, for revealing various forms of mix effects. A user study evaluating Wiggum strengthens understanding of the factors that contribute to the comprehension of statistical concepts. Furthermore, a conceptual model, visual correspondence, is presented to study how users can determine the identity of items between visual representations by interpreting the relationships between their respective visual encodings. It is practical to build visualizations with highly linked views informed by visual correspondence theory. We present a hybrid tree visualization technique, PatternTree, which applies the visual correspondence theory. PatternTree helps users more readily discover statistical anomalies and explore their relationships. Overall, this dissertation contributes a merging of new visualization theory and designs for analysis of statistical anomalies, thereby leading the way to the creation of effective visualizations for statistical analysis. (en_US)
dc.identifier.uri: https://shareok.org/handle/11244/337584
dc.language: en_US (en_US)
dc.subject: mix effects (en_US)
dc.subject: Simpson’s paradox (en_US)
dc.subject: Visual analytics (en_US)
dc.subject: Information visualization (en_US)
dc.thesis.degree: Ph.D. (en_US)
dc.title: Statistical Anomaly Discovery Through Visualization (en_US)
ou.group: Gallogly College of Engineering::School of Computer Science (en_US)
shareok.orcid: 0000-0002-2305-9924 (en_US)
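
The abstract describes mix effects as cases in which subgroups of data exhibit patterns opposite to the data as a whole, and mentions a detection method based on correlation comparison. The following is a minimal sketch of that general idea only, not the dissertation's algorithm or Wiggum's implementation: it compares the sign of the Pearson correlation in the full data with the sign in each subgroup. The function names (`pearson`, `find_reversals`), the `(group, x, y)` row layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def find_reversals(rows):
    """Return (overall_r, {group: r}) for subgroups whose correlation
    has the opposite sign to the correlation over all rows.

    rows: iterable of (group, x, y) tuples (hypothetical layout).
    """
    rows = list(rows)
    overall = pearson([x for _, x, _ in rows], [y for _, _, y in rows])

    # Collect the (x, y) points belonging to each subgroup.
    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    reversed_groups = {}
    for group, points in by_group.items():
        r = pearson([x for x, _ in points], [y for _, y in points])
        if r * overall < 0:  # opposite signs: subgroup trend reverses
            reversed_groups[group] = r
    return overall, reversed_groups


# Toy data invented for illustration: y falls with x inside each group,
# but group "b" sits higher on both axes, so the pooled trend is upward.
rows = [
    ("a", 1, 10), ("a", 2, 9), ("a", 3, 8),
    ("b", 6, 15), ("b", 7, 14), ("b", 8, 13),
]
overall, reversed_groups = find_reversals(rows)
print(f"overall r = {overall:.3f}")
for group, r in sorted(reversed_groups.items()):
    print(f"group {group}: r = {r:.3f} (opposite sign)")
```

A sign comparison is the simplest possible criterion. A practical detector would also need to handle subgroups with constant values (where the correlation is undefined) and to weigh the strength of each correlation, which this sketch leaves out.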

Files

Original bundle
Name: 2023_Xu_Chenguang_Dissertation.pdf
Size: 10.2 MB
Format: Adobe Portable Document Format

Name: 2023_Xu_Chenguang_Dissertation.zip
Size: 12.19 MB
Format: Unknown data format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission