Impact of Temporal Order Selection on Clustering Intensive Longitudinal Data Based on VAR Models

Li, Yaqi

Date

2023-08

In real-world research, intensive longitudinal data (ILDs) are typically collected from a group of individuals of interest, which enables researchers to model not only the within-individual dynamics of the studied processes but also the between-individual differences in those dynamics. Among the statistical techniques proposed for modeling ILDs from multiple individuals, clustering provides a meaningful way to quantify sample heterogeneity in dynamic processes, assuming that such heterogeneity reflects the distinct nature of the studied processes. The aims of this dissertation are threefold: (a) to introduce a VAR-based clustering technique, (b) to examine the impact of temporal order selection on clustering accuracy and parameter estimation through a simulation study, and (c) to demonstrate the application of the clustering technique through an empirical analysis. Specifically, I investigated how two temporal order selection strategies affect the performance of a two-step model-based clustering procedure: (1) using the most complex structure, or highest order (HO), for all individual processes, and (2) using the most parsimonious structure, or lowest order (LO), for all individual processes. This procedure extracted dynamic coefficients from vector autoregressive (VAR) models and applied the Gaussian mixture model (GMM) and K-means clustering algorithms to the coefficients for cluster identification. I also examined whether the influence of the strategies varied across the two clustering algorithms. The simulation study showed that, regardless of the clustering algorithm used, the LO strategy consistently outperformed the HO strategy in recovering the number of clusters, cluster membership, and cluster-specific autoregressive (AR) and cross-regressive (CR) effects. GMM performed better than K-means when the LO strategy was applied; however, the performance of GMM decreased as the temporal order increased. Additionally, GMM was more vulnerable to small numbers of participants. The application of the two-step VAR-based method to affect data yielded a meaningful and informative clustering solution, which provided further insight into the use of the model-based clustering approach. Lastly, suggestions and recommendations were offered based on the results of the simulation and empirical analyses.
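The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the bivariate VAR(1) data-generating process, the coefficient matrices, the sample sizes, the choice of orders 1 (LO) and 3 (HO), and all function names are assumptions made for illustration. Only the K-means step is shown; the GMM step is omitted to keep the sketch free of external libraries.

```python
import random

def simulate_var1(A, n_obs, rng, burn_in=50):
    """Simulate a zero-mean VAR(1) process y_t = A y_{t-1} + e_t, e_t ~ N(0, I)."""
    k = len(A)
    y, series = [0.0] * k, []
    for _ in range(n_obs + burn_in):
        y = [sum(A[i][j] * y[j] for j in range(k)) + rng.gauss(0.0, 1.0)
             for i in range(k)]
        series.append(y)
    return series[burn_in:]

def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def fit_var(series, p):
    """Step 1: OLS fit of a VAR(p) without intercept; returns k*k*p coefficients."""
    k = len(series[0])
    # Predictors at time t: [y_{t-1}, ..., y_{t-p}], each lag contributing k values.
    X = [[series[t - lag][j] for lag in range(1, p + 1) for j in range(k)]
         for t in range(p, len(series))]
    Y = series[p:]
    m = k * p
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(m)] for a in range(m)]
    XtY = [[sum(x[a] * y[i] for x, y in zip(X, Y)) for i in range(k)]
           for a in range(m)]
    coef = solve(XtX, XtY)  # coef[a][i]: effect of predictor a on variable i
    return [coef[a][i] for i in range(k) for a in range(m)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, n_clusters, n_iter=25):
    """Step 2: plain K-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < n_clusters:
        centers.append(max(points, key=lambda pt: min(sq_dist(pt, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(n_clusters), key=lambda c: sq_dist(pt, centers[c]))
                  for pt in points]
        for c in range(n_clusters):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical example: two clusters of 10 individuals, 200 time points each.
rng = random.Random(0)
A_cluster0 = [[0.6, 0.0], [0.0, 0.6]]     # assumed dynamics for cluster 0
A_cluster1 = [[-0.5, 0.3], [0.3, -0.5]]   # assumed dynamics for cluster 1
truth = [0] * 10 + [1] * 10
data = [simulate_var1(A_cluster0 if g == 0 else A_cluster1, 200, rng) for g in truth]

features_lo = [fit_var(s, p=1) for s in data]  # LO: lowest order for everyone
features_ho = [fit_var(s, p=3) for s in data]  # HO: highest order for everyone
labels_lo = kmeans(features_lo, 2)
labels_ho = kmeans(features_ho, 2)
print(len(features_lo[0]), len(features_ho[0]))  # coefficients per person: 4 vs 12
```

The sketch makes one point from the abstract concrete: fitting a higher order to every individual enlarges the coefficient vector passed to the clustering step (here from 4 to 12 values per person), so the clustering algorithm must work in a higher-dimensional and noisier feature space. It does not attempt to reproduce the simulation results reported above.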

Keywords

Vector autoregressive model, Temporal order, Intensive longitudinal data, Clustering

URI

https://shareok.org/handle/11244/337940

Collections

OU - Dissertations
