Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure modeling. Extending the logic developed by Yuan and Bentler, Cai, and Cai and Lee, we propose an alternative method for conducting multiple imputation–based inference for mean and covariance structure modeling. In addition to computational simplicity, our method naturally leads to an asymptotically chi-square model fit test statistic. Using simulations, we show that our new method is well calibrated, and we illustrate it with analyses of three real data sets. A SAS macro implementing this method is also provided.


Introduction
Model-based multiple imputation is widely accepted as one of the most flexible methods for handling missing data in a variety of applied research settings (Allison, 2001;Little & Rubin, 1987;Schafer, 1997). The formulas that combine point estimates and standard errors across multiple imputations have become familiar sights to educational and behavior scientists. The concept of replacing unobserved values by random draws from their proper posterior predictive distributions has not only fundamentally changed how methodologists and statisticians deal with missing observations in data collection and study design but also exerted far-reaching influence on statistical modeling and computation involving latent variables. After all, latent variables and missing data are synonymous.
As an inferential framework, multiple model-based imputation is extremely general. This generality stems from the usefulness of a missing data formulation popularized by the seminal paper on expectation-maximization (EM) algorithm by Dempster, Laird, and Rubin (1977). Unlike full likelihood-based methods that often require significant amount of tailored development for each new specialized condition, with the availability of proper multiple imputations, the complete data modeling and estimation problems are often remarkably simple. This is most naturally illustrated by the multiple imputation methods for the treatment of survey nonresponse, but it is one of a number of contexts in which imputation is used as a critical device to obtain statistical adjustments that would otherwise be too complicated for routine application. We mention below two seemingly unrelated examples simply to highlight the fact that even though a bulk of our subsequent theoretical and empirical investigations will focus on missing observations in mean and covariance structure modeling, the approach we took can be equally palatable to other contexts in which imputations are required, even if no observation is apparently missing.
First, consider the plausible value methodology (see e.g., Mislevy, Beaton, Kaplan, & Sheehan, 1992), which is one of the backbones of the statistical framework employed in such large-scale educational assessment systems as the National Assessment of Educational Progress (NAEP) or Program for International Student Assessment (PISA). These assessments tend to focus on grouplevel (e.g., gender, ethnicity, country, etc.) inferences, but the individual students that make up the groups are usually administered few test items due to the complex study designs that are geared toward obtaining more efficient group-level inferences. This results in substantial uncertainty in the individual students' test scores. The simple aggregation of student scores yields statistically inconsistent group-level estimates. Glossing over a number of important details, at the end, the plausible values are multiple imputations based on a regression model that properly reflects the uncertainty in individual students' scores so that consistent group-level inferences can be drawn. Proper analysis of data sets containing plausible values requires the same tools as in multiple imputation for handling missing observations. Next, consider Stuart and Rubin's (2008) matching method for estimating causal effects from observational data. In their approach, they proposed that one may construct a matched control group from multiple sources of control units when the original control group does not provide enough overlap with the treated group on observed covariates. Having more than one source of control units may introduce bias in estimated average treatment effect. To combat this potential bias, Stuart and Rubin relied on a regression-based multiple imputation procedure. Again, the analysis of the multiple sets of data requires the same basic statistical tools for combining multiple imputations.
Rather independently of the multiple imputation literature (though exceptions do exist, e.g., Rubin & Thayer, 1982), mean and covariance structure modeling (Browne & Arminger, 1995;Yuan & Bentler, 2007) has become one of the most widely used statistical techniques in social and behavioral sciences. However, when mean and covariance structure models must be fitted to multiply imputed data sets, a number of difficulties arise. First, the standard multiple imputation Alternative Multiple Imputation Inference inferential procedure of analyzing each imputation data set separately and combining the point estimates and standard errors at the very end is cumbersome at best. In practice, the researcher usually must fit a series of models to explore specification, to compare their fit, and to examine their relative substantive interpretability. In each step, a variety of statistical and heuristic indices are examined to guide the next move. If there are 20 imputation data sets, the researcher must carry 20 replications in each step of the search for model specification, a daunting task especially when some of the indices consist of entire matrices of numbers (e.g., residual correlations). Automated procedures (e.g., PROC MIANALYZE in SAS) for combining multiple imputation results may only alleviate some of the burden. Second, even setting the cumbersomeness aside, the standard procedure does not provide an overall model fit statistic, which forms the basis of model fit assessment in mean and covariance structure modeling. Current suggestions (e.g., Allison, 2001;Meng & Rubin, 1992) are either not accurate enough (see e.g., Allison, 2003) or require computations that are cumbersome and nonstandard, at least insofar as mean and covariance structure modeling is concerned. Moreover, the performances of those suggestions have not been evaluated in the context of mean and covariance structure modeling. Indeed, we view, as a serious drawback of the standard multiple imputation inferential procedure, the lack of a principled way for computing popular fit indices such as root mean square error of approximation (RMSEA; Browne & Cudeck, 1993), or incremental indices such as Tucker-Lewis index (TLI; Tucker & Lewis, 1973). Third, the current multiple imputation procedures are focused on obtaining corrected final parameter estimates and standard errors and do not encourage publishing enough intermediate results (e.g., mean vector and covariance matrix), so that the other researchers can replicate or meta-analyze the findings more easily. Cai and Lee (2009) also make this point.
In response to the technical difficulties, we propose a new two-stage procedure for conducting multiple imputation inference for mean and covariance structure modeling. We note that this research is not on how to build proper imputation models to produce multiple imputations. In the subsequent theoretical derivations, we also purposively make no distinction between the kinds of imputations involved, for example, missing observations, plausible values, adjustments to potential outcomes, and so on. We will treat the imputations as given and focus on statistical inference with the multiple imputations.
The guiding insight of our new approach is that at least for standard mean and covariance structure modeling, the combination of multiple imputations can occur in the beginning, before any structural models are even fitted. The underlying statistical theory is a direct extension of the results obtained by Yuan and Bentler (2000), Cai (2008), and Cai and Lee (2009) for the deterministic EM algorithm. The new procedure is computationally simple because once the imputations are combined, the structural equation modeler is back in the familiar territory of working directly with summary statistics such as the means and Lee and Cai covariances. We develop an asymptotically chi-square distributed overall model fit statistic based on Browne's (1984) residual-based test (Proposition 4) that can also be used naturally as a basis for additional fit indices.
Sometimes the use of multiple imputations is unavoidable (e.g., dealing with plausible values), but even when there is a choice, we argue that the flexibility afforded by multiple imputation can be decidedly advantageous. Full-information maximum likelihood (FIML; Anderson, 1957;Arbuckle, 1996) is an often recommended alternative estimation strategy for mean and covariance structure modeling when some observations are missing. Under Rubin's (1976) classification system of the types of missing data, FIML enjoys desirable large sample optimal properties when data are missing at random and the missing data mechanism is ignorable. Despite the asymptotic optimality, we show that our new procedure performs practically as well as the asymptotically optimal FIML estimator in finite sample sizes. In contrast to FIML, one of the key benefits of multiple imputation lies in the relative ease of including into the imputation model variables that are not part of the structural model, but are related to the missing data mechanism. Therefore, when data are not missing at random and the missing data mechanism is non-ignorable, which is arguably more realistic, our new procedure can easily reap the benefit of alternative multiple imputation systems that directly model the missing data mechanism without requiring significant change to how mean and covariance structure analysis is conducted in practice.
The remainder of the article is organized as follows. We will begin with some technical development to clarify the details of the new estimator and the assumptions that we are making. We will then use simulation studies to illustrate the performance of the new procedure relative to the FIML estimator and the standard multiple imputation inferential procedure in the context of missing responses. We apply the new procedure to the analysis of several well-known real data sets. We will conclude by noting the limitations of our new approach.

A Mean and Covariance Structure Model
Throughout this article, we will assume that we are working with a multivariate normal data matrix Y with N independent rows and n variables. If the ith row of Y is y 0 i , then y i follows a n-dimensional multivariate normal distribution with mean μ and covariance matrix ⌺. Let ω ¼ [μ 0 ,vech(⌺) 0 ] 0 , where the operator vech(Á) stacks the nðn þ 1Þ=2 unique elements of a symmetric matrix. Clearly, the dimension of ω is d ¼ n þ nðn þ 1Þ=2. For these moments, consider a structural model: Alternative Multiple Imputation Inference where θ 2 Θ is a q-dimensional vector of free parameters in a subset of R q . A typical example of a mean and covariance structure model is the extended factor analytic simultaneous equation model employed in the LISREL framework (Jöreskog & Sörbom, 2001). In LISREL, one may represent the measurement model for the ith case as where η i is a p Â 1 vector of latent common factors that are orthogonal to the n unique factors in ε i . Let the unique factors have zero means and covariance matrix ⌽ θ . The parameter matrices τ θ and ⌳ θ contain the measurement intercepts and factor loadings, respectively. We use the subscript θ to explicitly denote the dependence of the parameter matrices on the vector of free parameters. The structural equations are defined as where ζ i is a p Â 1 vector of equation disturbance terms that have zero means and covariance matrix ⌿ θ , and is orthogonal to ε i by implication. A rearrangement of Equation 3 leads to One has to assume that the difference between the identity matrix and the regression coefficient matrix B is invertible. When substituted into Equation 2, the reduced form equation is Taking expectations, we see that the model implies a linear mean structure The implied covariance structure model is The joint mean and covariance structure model is obviously The exact form of the structural equation model is not essential here. Equation 7 is merely one of the many instantiations of the general mean and covariance structure model given in Equation 1. Other formulations (e.g., the Bentler-Weeks model) can be used. For our purposes, the model in Equation 7 is sufficient.

Estimation and Inference Under Standard Conditions
If there are no missing data in Y or if variables in Y do not involve complexities such as more than one set of imputations, we can use standard software programs Lee and Cai such as PROC CALIS in SAS to estimate θ from the sample summary statistics. Let the sample estimates of the mean vector and the covariance matrix bê respectively. We defineω ¼μ 0 ; vech⌺ 0 h i 0 .
In practice, estimation is typically accomplished by invoking the multivariate normality assumption of y i and minimizing the following maximum likelihood discrepancy function (see e.g., Yuan & Bentler, 2007): Fðθ;ωÞ ¼ trf⌺½⌺ðθÞ À1 g À log j⌺½⌺ðθÞ À1 j À n þ ½μ À μðθÞ 0 ½⌺ðθÞ À1 ½μ À μðθÞ: The minimizer of Fðθ;ωÞ is the maximum likelihood estimator of the structural parameters, and we denote it aŝ Under appropriate conditions, it is consistent, asymptotically normal, and asymptotically efficient. Furthermore, the statistic is distributed in large samples as a central chi-square variable with d À q degrees of freedom, if the model is specified correctly. The second derivative matrix of the discrepancy function can be used to construct a large sample covariance matrix of the estimates.

Conventional Multiple Imputation Estimation and Inference
If missing values are present, or if some or all variables in Y have more than one copy of imputations, estimation of θ can be carried out in the following manner. First, either the researcher would produce (M > 1) multiply imputed data sets, denoted as Y 1 ; . . . ; Y M , or the M data sets have already been preimputed, for example, in the case of plausible values in surveys. The mean and covariance structure model is then fitted to each imputation. The point estimates and standard errors are aggregated at the end.
In brief, consider the mth imputation (or plausible value) Y m , m ¼ 1; . . . ; M. Based on this complete data set, we can easily compute the sample mean vector and sample covariance matrix using Equations 8 and 9. Denote the sample mean Alternative Multiple Imputation Inference vector and covariance matrix based on the mth imputation asμ m and⌺ m , respectively. Then for each imputation, we can define a maximum likelihood discrepancy function for our mean and covariance structure model Fðθ;ω m Þ, wherê ω m ¼ ½μ 0 m 0 ; vechð⌺ m Þ 0 0 . Minimization of Fðθ;ω m Þ leads to the parameter estimate based on the mth imputation. Let us denote it asθ m . At the same time, we also obtain a covariance matrix ofθ m as a by-product. Let us denote it as U m ¼ varðθ m Þ.
Standard multiple imputation formulas can be used to combine the point estimates and variability information (Schafer, 1997). The final point estimate is the average of the multiple imputation estimates: " θ ¼ M À1 P M m¼1θ m , and the multiple imputation variance approximation is ð " m¼1 U m is the average of the within-imputation covariance matrices and V ¼ ðM À 1Þ À1 P M m¼1 ðθ m À θÞðθ m À θÞ 0 is the between-imputation component. This provides the basis for conducting Wald tests involving the structural parameters in θ as well as construction of confidence intervals.
For combining likelihood ratio goodness-of-fit test statistics, we may use the two statistics D 2 and D 3 proposed by Li, Meng, Raghunathan, and Rubin (1991) and Meng and Rubin (1992), respectively. In essence, D 2 is based on M À1 P M m¼1 Fðθ m ;ω m Þ, the average of the minimum fit-function value in each of the M imputed data sets, while D 3 requires additionally M À1 P M m¼1 Fð " θ;ω m Þ, the average of the fit-function values evaluated at " θ, and Approximations based on the F-distribution are constructed. The approximate degrees of freedom can vary as a function of the number of imputations, the model's degrees of freedom, and the fraction of missing information.

Conventional FIML Estimator
As mentioned earlier, full-information maximum likelihood (Arbuckle, 1996) is a popular estimation method for mean and covariance structure models under ignorable missing data. In normal theory FIML estimation, an individual case's contribution to the log likelihood can be defined as where D i is a selection matrix that depends on the missing data pattern in y i . In general, D i consists only of zeros and ones. It has n columns, and as many rows as there are observed variables y i . It can be obtained by removing rows of an n Â n identity matrix when the corresponding rows in y i are missing. For instance, consider three observed variables, y i1 , y i2 , and y i3 . If y i2 is missing, then the selection matrix is Lee and Cai This is obtained by removing the second row of a 3 Â 3 identity matrix. Premultiplication of y i by D i results in the subset of y i that is observed, and D i μðθÞ effectively selects the part of the mean vector for which observation i can contribute information. Similarly, D i ⌺ðθÞD 0 i is the subset of ⌺(θ) that corresponds to the observed portion of y i . When y i contains no missing values, D i is an identity matrix and Equation 13 becomes the usual multivariate normal log likelihood function. The estimator is called full-information because it utilizes all available observations in the data matrix Y. In the next step, the individual log likelihoods are summed up to obtain the log likelihood of the full sample The FIML estimator directly maximizes the log-likelihood function in Equation  14 to obtain the structural model parameter estimates. Let this maximum be denoted θ. Minus one times the inverse of the second derivative matrix of log LðθjYÞ, when evaluated at θ provides a large sample covariance matrix of the parameter estimates. FIML estimation is implemented in major structural equation modeling software packages. The FIML estimator also naturally leads to a chi-square distributed fit statistic based on likelihood ratio comparisons. The above FIML log likelihood can be used to obtain an unstructured/saturated maximum likelihood estimate of the mean vector and covariance matrix under missing data. Let the saturated estimate of the mean vector be denoted μ and the saturated covariance matrix be ⌺ . The following statistic is asymptotically distributed as a central chi-square variable with d À q degrees of freedom when the structural model is correctly specified.

The New Two-Stage Estimator
The conventional multiple imputation procedure outlined previously is ideal for such modeling frameworks as linear regression analysis where the focus is on estimating and testing parameters. For reasons discussed earlier in the introduction section, it can be ill-suited to the practice of mean and covariance structure modeling. We shall call our new estimator the Multiple Imputation Two Stage (MI2S) estimator. As opposed to the standard practice of conducting separate structural analysis before combining the inferences, in MI2S, we combine the multiple imputations first and then estimate the structural parameters. By Alternative Multiple Imputation Inference adapting a theorem (Proposition 4) proposed by Browne (1984), an asymptotically chi-square goodness-of-fit test statistic can be constructed.

Stage One: Combining Multiple Imputations
We assume the availability of M properly imputed data sets, Y 1 ; . . . ; Y M . By proper, we mean that the imputations are based on a well-constructed imputation model that includes key variables properly accounting for the missing data mechanism. In addition, if the Markov chain Monte Carlo (MCMC) method is required to produce the imputations, we assume that necessary numerical precautions have been exercised so that the imputations are drawn from an MCMC sampler after a sufficiently long burn-in period and that appropriate subsampling is taken so that the imputations are approximately uncorrelated. In other contexts, for example, plausible values, the imputations are already given, and we assume that they are proper. For the MI2S estimator in this article, we make an additional assumption that the imputations are produced by a multivariate normal imputation model (see e.g., Schafer, 1997). We note that this assumption is made when one uses the popular imputation software program SAS PROC MI.
Consider imputation m. From the complete data set Y m , we can easily compute estimates of the mean vectorμ m and the covariance matrix⌺ m as per Equations 8 and 9. Writing the estimate of ω from imputation m asω m ¼ ½μ 0 m 0 ; vechð⌺ m Þ 0 0 , the combined estimate of ω across the M imputations is the averagẽ The above equation is nothing more than a direct application of the standard multiple imputation combination rule to the unstructured/saturated first-and second-order moments of the complete data.
In essence, Equation 16 is a stochastic counterpart of the deterministic EM algorithm for handling missing data in the multivariate normal model (Dempster et al., 1977). EM produces maximum likelihood estimates of the unstructured/ saturated first and second order moments based on the observed data. Here, the estimation of ω eschews EM in favor of multiple random imputations. We believe that this is a more flexible method than the EM algorithm. For example, when multiple sets of plausible values are present, Equation 16 easily produces the combined estimates of means and covariances, whereas the EM algorithm would be entirely useless because the data sets containing the plausible values do not have missingness.
Under multivariate normality of the complete data model, we can obtain the following estimate of the complete data Fisher information matrix based on imputation m Lee and Cai where K n is the n 2 Â nðn þ 1Þ=2 duplication matrix as defined in Schott (1997).
The covariance matrix ofω m is equal to N À1 ½Fðω m Þ À1 . Again applying standard multiple imputation variance formula, we have an estimate of the covariance matrix ofω The form of N À1Ä has a natural interpretation. If there are no missing data, or equivalently stated, if there is no between-imputation variability due to missing information,Ä solely depends on the multivariate normal theory Fisher information matrix-the first component in square brackets. If there is missing information, the between-imputation component captures the added uncertainty due to missing data-the second set of square brackets. Note that Equations 17 and 18 are based on the multivariate normality assumption. Therefore,Ä is a model-based estimate of the covariance matrix ofω. This point will subsequently be important because having a model-based covariance matrix forω requires substantially smaller sample size to achieve stable estimation and inference for the structural parameters of interest. A basic tenet of the theory behind multiple imputation is that if the imputations are proper, and if M is large,ω provides a consistent estimate of ω (Rubin, 1996). Furthermore, when M is large, the limiting distribution of ffiffiffiffi N p ðω À ωÞ is normal, with zero means, and an asymptotic covariance matrix that can be consistently estimated byÄ. Thus, similar (in spirit) to Cai and Lee's (2009) two-stage estimator wherein the first stage is based on the EM algorithm, we can useω directly in discrepancy function based estimation of the structural parameters in θ.

Stage Two: Estimation and Inference
More formally, for the mean and covariance structure model ωθ, we define the MI2S estimator of θ asθ where Fðθ;ωÞ is the maximum likelihood discrepancy function as defined in Equation 10. Becauseω is a consistent estimate of ω as M tends to infinity, and the estimating equations defined by the maximum likelihood discrepancy function is unbiased, the resulting two-stage estimatorθ should be consistent and asymptotically normal, which can be shown using Yuan and Jennrich's (1998) asymptotic distribution theory for generalized estimating equations.

Alternative Multiple Imputation Inference
A comparison of Equations 11 and 19 clearly reveals the fact that the multiple imputation estimate of means and covariancesω serves essentially the same role as sample means and sample covariances (i.e.,ω) in standard complete data estimation. There is, however, one hidden problem. One would naturally hope that N À 1 times the minimum discrepancy function value is similarly distributed as a chi-square variable, but as Yuan and Bentler (2000) noted, in general this is not true. Due to the presence of multiple imputations, the asymptotic covariance matrix ofω is not of a standard form that is assumed in the maximum likelihood discrepancy function. Equation 18 makes this point amply clear. While the first part ofÄ (the within-imputation variance) comes from standard multivariate normal theory, the second part (the between-imputation variance) depends on the variability of the imputations, which is a function of the fraction of missing information. Thus, in general, the discrepancy function Fðθ;ωÞ is not correctly specified (in the sense of Browne, 1984). According to the theory developed in Yuan and Bentler (2000), when the discrepancy function is not correctly specified, though the estimator itself may still be consistent and asymptotically normal, the test statistic T M is at best distributed as a mixture of single degree of freedom chi-square variates.
A related problem is that the standard errors based on the second derivatives of Fðθ;ωÞ are also incorrect. They will in general be too small. An intuitive explanation is as follows. The standard errors are related to sample size. The larger N is, the smaller the standard errors are. However, when there is missing information, the sample size associated withω is no longer unequivocally equal to N . The means and covariances analyzed in the second stage of MI2S are based on less information than the number of cases N would suggest.
To remedy the situation, we apply Browne's (1984) Proposition 4 to the context of multiple imputation inference. Proposition 4 contains a residual-based test statistic that is asymptotically chi-square for any consistent and asymptotically normal estimator of θ. Let JðθÞ ¼ qωðθÞ qθ be the d Â q Jacobian of the mean and covariance structure model. Under Browne's (1984) regularity conditions, JðθÞ should be of full column rank, so there exists a d Â ðd À qÞ matrix J c ðθÞ that is an orthogonal complement of J(θ), such that ½J c ðθÞ 0 JðθÞ ¼ 0.
Under MI2S estimation, let ωðθÞ be the model-implied moments. The residual moments are simply e ¼ω À ωðθÞ. Furthermore, recall thatÄ as defined in Equation 18 is a d Â d symmetric matrix that consistently estimates the limiting covariance matrix of ffiffiffiffi N pω . Then, given H 0 : ω À ωðθÞ ¼ 0 for some θ 0 2 Θ versus H 1 : ω À ωðθÞ 6 ¼ 0 for any θ, the following goodness of fit (GOF) statistic ½J c ðθÞ 0 . We emphasize that though T BM can be viewed as an application of Browne's (1984) Proposition 4, it is not the same as the asymptotically distribution free (ADF) test statistic to which the article is more famously associated. In fact, T BM uses a weight matrix that is model-based, as opposed to the samplebased weight matrix in Browne's (1984) ADF statistic. As we will demonstrate using simulations, the statistic works well even when N is as small as 100. Following a result in Browne and Arminger (1995), the limiting covariance matrix of the MI2S estimator ffiffiffiffi N p ðθÞ can be consistently estimated as ½JðθÞ 0Ä À1 JðθÞ À1 , whose diagonal elements are the squared standard errors for the structural parameter estimates. Finally, note that because of the chisquaredness of T BM and the fact that e 0 Γe does not explicitly depend on N, we can compute the familiar RMSEA statistic (Browne & Cudeck, 1993), using e 0 Γe as a discrepancy function value. Other fit indices, for example, the TLI, can be computed similarly, given the availability of combined summary statistics inω.

Simulation Studies
The simulation studies involve a series of comparisons involving the FIML estimator, the standard multiple imputation inferential procedure, with the newly developed MI2S estimator in the context of missing observations. For MI2S, the multiple imputations are generated under the multivariate normal model with the MCMC method. Specifically, we implement a data augmented Gibbs sampler described by Schafer (1997). The starting values of the Gibbs sampler are obtained by running an EM algorithm that produces the maximum likelihood estimates of the unstructured means and covariances. The burn-in period of the Gibbs sampler is set to 1,000 iterations. Throughout this section, M ¼ 20 imputations are taken with a thinning interval of 200 iterations between each imputation. These numerical settings are quite conservative. For structural model estimation, we use the default sequential quadratic programming solver in GAUSS (Aptech Systems, Inc., 2003) to optimize the log likelihood or to minimize the discrepancy function. The code for data generation, EM estimation, MCMC imputation, FIML estimation, and discrepancy function-based model fitting are entirely programmed in GAUSS.

Type I Error Rates
We conduct this study to show that the proposed MI2S fit test statistic T BM is indeed chi-square distributed under the null hypothesis. The data generating process is a confirmatory factor analysis (CFA) model with nine manifest variables and three correlated factors. The generating factor loading matrix is Alternative Multiple Imputation Inference The unique factor variances are given by ⌽ ¼ diagð:7; :9; :5; :3; :4; :4; :6; :4; :5Þ. For this model, q ¼ 21 free parameters make up θ. They are the six nonzero free loadings, the six factor variances and covariances, and the nine unique variances. Single-group factor analysis models do not have an explicit mean structure, and the generating μ is taken as a null vector. The covariance structure model of Equation 6 also simplifies to ⌺ðθÞ ¼ ⌳⌿⌳ 0 þ ⌽. The degrees of freedom is equal to 24. Complete multivariate normal data having the generating factor analysis covariance structure are first simulated. To simulate missingness, the complete data sets are then subject to two kinds of missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). For the MCAR condition, for each row in the simulated data matrix, a fair dice is rolled to decide whether there should be any missing values. Next, for a case that is chosen to contain missing observations, the values for the last three indicators are set to missing. This leaves about 16% of all observations missing. This pattern may occur in practice when a subset of individuals do not complete the questionnaire by study design (planned missingness), for example, due to the high cost of measuring all variables for all individuals.
For the MAR condition, the probability of missingness of the last three manifest variables is set to be linearly related to the mean of the first six manifest variables, say, Z. Specifically, we divided the distribution of Z into quartiles and set the missingness probabilities for the four regions to (.50, .20, .075, .025). Under this MAR condition, cases with lower values of Z have higher probabilities of missingness on the last three variables, leaving about 20% of all cases contain missing observations. As a benchmark, a condition with no missing data (NOMIS) is also included.
The three missing data conditions (NOMIS, MCAR, and MAR) are crossed with three sample sizes: N ¼ 100; 300; 500, resulting in nine simulation conditions. In each condition, 1,000 replications are attempted. For each replication, both the MI2S estimator and the FIML estimator are used to fit the CFA model. Non-converged replications are discarded.
Because the fitted model is correctly specified, our hypothesis is that T BM should be distributed as a central chi-square variable with 24 degrees of freedom Lee and Cai regardless of the amount or kind of missing data. On the other hand, the naive fit statistic T M (Equation 20) should not be chi-square distributed under MAR or MCAR. Under MCAR and MAR, T FIML is also asymptotically chi-square distributed with the same degrees of freedom as T BM . Under NOMIS, both T FIML and T M reduce to T F (Equation 12), and all statistics are distributed as central chisquare variable with 24 degrees of freedom. Table 1 presents a summary of the results of this simulation study. First, under NOMIS, both T FIML and T BM are clearly behaving as they should. The means of the empirical distribution of these two test statistics are very close to 24, which is the theoretical expected value of a central chi-square variable with 24 degrees of freedom. The variances are also close to the theoretical value of 48. The empirical Type I error rates at the .01, .05, and .10 levels closely follow the nominal alpha level. While T FIML tends to be more liberal at a smaller N , T BM is more conservative. The performance of both T FIML and T BM appears to improve as sample size increases. Next, for the MCAR condition, the naive test statistic T M is obviously not chi-square distributed. Its means and variances are too large, and the empirical rejection rates are far above the nominal level. On the other hand, both T FIML and T BM maintain adequate control over Type I error rates, and their meanvariance relations much better approximate that of a central chi-square variable with 24 degrees of freedom. The type I error rates of D 2 and D 3 are very close to the nominal alpha level. We observe essentially the same phenomena under the MAR condition. Our conclusion is that as far as Type I error rates are concerned, the MI2S statistic T BM performs at least as well as T FIML , D 2 , and D 3 across the conditions examined. For relatively small sample sizes, T BM may even lead to a slightly better calibrated test than T FIML , D 2 , or D 3 when the data are MAR.

Power to Detect Model Misspecification
In this simulation, we deliberately mis-specify the fitted model so that we may investigate the power of the MI2S statistic. The generating model is essentially the same as the model used in the previous simulation study. The only difference is that the generating model contains an extra parameter-a cross-loading for the ninth indicator on the first factor l 91 equaling :7. The fitted model is still a three-factor CFA model, but it omits the crossloading as a free parameter. A relatively mild degree of misspecification is chosen so that the Monte Carlo simulation can show more nuances of the tests, especially at large N . The reference distribution of T FIML and F BM is still central chi-square with 24 degrees of freedom. However, because of the misspecification, the empirical distribution of the test statistics will not be central chi-square distributed.
The missing data conditions remain the same as in the previous simulation study (NOMIS, MCAR, and MAR). There are three sample size conditions: 100, 300, and 500. In each of the nine conditions, 1,000 replications are Alternative Multiple Imputation Inference

Lee and Cai
attempted. For each replication, we compare the rejection rates of T BM with T FIML at the .01, .05, and .10 nominal alpha levels. Table 2 presents a summary of the simulation results. The trend is quite clear. The test based on T BM is slightly less powerful than T FIML and D 2 and D 3 , when N is small. However, considering the mild degree of misspecification, T BM shows acceptable levels of power, and the difference in power diminishes as N increases to 300 and 500.

Bias and Variability
To take a closer look at the MI2S estimator, we examine the estimated relative bias (RB) and root mean square deviation (RMSD) of the estimates from true generating parameter values. We continue to use the same CFA data generating model as in the first simulation study. The fitted model is correctly specified, so in effect, we are examining parameter recovery. For a generic parameter θ, we define estimated relative bias as where θ is the true value, L is the number of Monte Carlo replications (1,000 in this case),θ l is the parameter estimate in replication l. We define RMSD as ðθ l À θÞ 2 v u u t : Table 3 presents the true parameter values, RB, and RMSD for key parameters (factor loadings and correlations) under MCAR when N is 100. Full results for other conditions are available upon request, but in summary, FIML, MI2S, and the standard multiple imputation method are all relatively unbiased in the conditions examined. The maximum absolute value of RB never exceeded 6%. Though MI2S is slightly more variable when N is small, presumably because of the additional random imputation variability, its RMSDs are different from those of the FIML and the standard multiple imputation estimators only in the third decimal place when N is large.

Applications to Real Data
In this section, we illustrate the performance of the MI2S estimator with applications to three real data sets. We wrote a SAS macro fully implementing the MI2S estimator and used it to obtain the results reported here. The macro and data sets are freely available from http://lcai.bol.ucla.edu/. Its usage is documented in Appendix A. The first two applications highlight the fact that the MI2S estimator can provide an asymptotically chi-square distributed fit test statistic,

Lee and Cai
missing information adjusted standard error estimates, as well as additional model fit indices such as RMSEA (with confidence interval) and TLI. The third application shows that the MI2S estimator can conveniently handle mean and covariance structure modeling for data sets containing multiple sets of plausible values. When we have to create multiple imputations using MCMC sampling, we generally use a burn-in period of 1,000 iterations and subsampling intervals of 200.

CFA
We first apply the MI2S procedure to the well-known Open-Book Closed-Book data set in Mardia, Kent, and Bibby (1979). The original data set contains test scores on five subject areas obtained from 88 examinees, that is, n ¼ 5 and N ¼ 88. The tests of the first two subjects are open-book, and the rest are closedbook exams.
The model specification is as follows. The factor loading matrix is The unique factor covariance matrix is given by ⌽ ¼ diagðf 11 ; . . . ; f 55 Þ. The degrees of freedom is equal to 4. For this complete data set, it is known that a Alternative Multiple Imputation Inference two-factor CFA model fits very well, yielding a chi-square of 2:07 under maximum likelihood estimation.
To illustrate the newly proposed MI2S estimator, we artificially created MAR data by applying the same procedure used in Cai and Lee (2009), wherein the scores on the last three variables were set to missing if the sum of the first two was less than 80, resulting in 28 cases with missing values. Next, based on 20 imputations ðM ¼ 20Þ, the combined estimate of means, variances, and covariances,ω, was obtained by Equation 16. The two-factor CFA model was then fitted using the MI2S estimator.
The empirical results mirror the simulation study. More specifically, with 4 degrees of freedom, the MI2S test statistic T BM is 6:67; p ¼ :15, whereas the naive statistic T M is 10:89; p ¼ :03. Having analyzed the complete data set, we know that the model should fit well. In this case, the difference between T BM and T M is large enough to lead to a possibly erroneous decision on the fit of the hypothesized CFA model. RMSEA and TLI for this model are, based on our MI2S estimator, equal to .086 and .808, respectively. A 90% confidence interval of RMSEA is [.000, .199].
For the purpose of comparison, we fitted the model using the FIML estimator and obtained T FIML ¼ 6:64; p ¼ :16, a result in close agreement with the MI2S estimator. We also computed D 2 and D 3 statistics and the associated p values for this model. The p value associated with D 2 is Fð4; 91:35Þ ¼ 1:75; p ¼ :15 and the p value associated with D 3 is Fð4; 302:82Þ ¼ 1:42; p ¼ :23.
We also calculated adjusted standard error estimates for the structural parameter estimates. The results are presented in Table 4. The entries in the ''Adjusted'' column show the standard error estimates produced by MI2S estimator and the entries in the ''Unadjusted'' column show the naive standard error estimates obtained by treating the combined estimate of the means and the covariance matrix as if it comes from complete data. Notice that the unadjusted standard error estimates are in general smaller than the MI2S standard error estimates. It is an artifact caused by neglecting the fraction of missing information in the combined estimates. In contrast, the MI2S standard errors are properly inflated, accounting for the missing information. For the purpose of comparisons, the standard error estimates from the standard MI estimator and FIML estimator are presented in the columns of ''Standard'' and ''FIML,'' respectively. It can be seen that these standard error estimates are not only properly inflated adjusting for the missing information but also very close to those from the MI2S estimator.

Conditional Latent Curve Analysis
In the second application, the data from a symposium at the 1997 meeting of the Society for Research on Child Development (Curran, 1997) were analyzed. The data set contains four repeated measures of N ¼ 405 participants from the National Longitudinal Survey of Labor Marketing Experience in Youth on their aggressive Lee and Cai behavior. A number of time-invariant covariates, including gender and mother's age, are also available. The data set exhibits substantial attrition of respondents. Roughly half of the cases have missing observations in the aggressive behavior variable on one or more measurement occasions beyond baseline. To handle the missingness, we created M ¼ 20 imputations and utilized the MI2S estimator.
The goal is to fit a linear latent curve model (Bollen & Curran, 2006) to the repeated measurements of aggressive behavior. We included gender and mother's age as time-invariant covariates, making n ¼ 6. We freely estimated the covariance matrix of the intercept and slope factors. The time-specific residual variances were allowed to be heteroskedastic. This model has 9 degrees of freedom.
The MI2S test statistic and the FIML test statistic come out to be very similar, that is, T BM ¼ 16:23; p ¼ :06 and T FIML ¼ 16:56; p ¼ :06, respectively. In contrast, the naive statistic T M is 21:07; p ¼ :01. Once again, the difference between T M and T BM is large enough to lead to qualitatively different conclusions about model fit. The p value associated with D 2 is Fð9; 199:21Þ ¼ 1:68; p ¼ :10 and the p value associated with D 3 is Fð9; 1926:62Þ ¼ 1:58; p ¼ :12. The RMSEA and TLI based on MI2S estimator are equal to .044 and .893, respectively, with a 90% confidence interval of RMSEA being [.000, .078]. Though not reported here due to space constraints, unadjusted standard error estimates for the structural parameter estimates are all erroneously smaller than the adjusted standard error estimates produced by the MI2S estimator. As the third application, the MI2S estimator is applied in the mean and covariance structure modeling of a data set containing multiple sets of plausible values: PISA. Conducted triennially by the Organization for Economic Co-operation and Development (OECD), PISA is a system of international assessments that measures 15-year-olds' literacy in reading, mathematics, and science. The PISA data also include numerous items on student characteristics, student family background, and student perceptions, just to name a few. More information about PISA can be found at http://www.pisa.oecd.org/.
A distinctive characteristic of the PISA data set is that student performance on literacy scales and subscales are reported as M ¼ 5 sets of plausible values for each of the scales or subscales. In other words, instead of reporting a single scale score, PISA uses multiple imputation to represent the unknown values of students' literacy. As such, the MI2S estimator can be effectively employed. Indeed, it can be even simpler than the case of missing data because we do not have to generate the imputations ourselves.
In this particular application, we use data from the assessment conducted in 2006, focusing on the United States school sample. PISA 2006 uses a two-stage sampling design in which schools are first sampled and then students are sampled in the participating schools. Therefore, it is required to use sampling weights or to model the clustered data structure explicitly (or maybe both) for sound statistical analyses. 1 However, for a simple illustration of the MI2S estimator under the presence of multiple sets of plausible values, we ignored the sampling weights.
First, purely for the sake of illustration, we developed a hypothetical structural equation model in which a student's science literacy (SCI) is regressed on his or her mathematics literacy (MATH) and the value of science (VoSCI) that he or she holds. Figure 1 presents a conceptual path diagram of our structural equation model. MATH is an observed variable consisting of five sets of plausible values. SCI is a latent variable, measured by three subscale variables: explaining phenomena scientifically (EPS), using scientific evidence (USE), and identifying scientific issues (ISI). EPS is a scaling indicator with a fixed factor loading of 1.0. Each of the science subscale variables contains five sets of plausible values. VoSCI is a latent variable, measured by two manifest variables: the general value of science (GSCI) and perceptions of the personal value of science (PSCI). GSCI is the scaling indicator for VoSCI. Neither GSCI nor PSCI contains plausible values. We freely estimated the covariance of a student's mathematics literacy (MATH) and his/her value of science (VosCI). To aid interpretation, we standardized all observed variables. The analysis sample includes cases with complete data for all variables ðN ¼ 5531Þ.
Next, we combined the M ¼ 5 data sets containing plausible values and produced a single estimate of means and covariance matrix using Equation 16. We then applied the MI2S estimator to assess the plausibleness of the postulated Lee and Cai model. The overall model fit statistic, T BM ; came out to be 114.87 on 7 degrees of freedom, p < :001. We can also compute RMSEA and TLI for this model, which are equal to .053 and .947, respectively, with a 90% confidence interval of RMSEA being [.045, .062]. Furthermore, we obtained the structural parameter estimates and the associated standard error estimates (see Table 2). We also computed D 2 and D 3 statistics and the associated p values. The p value associated with D 2 is Fð7; 104:95Þ ¼ 15:80; p < :001 and the p value associated with D 3 is Fð7; 553:98Þ ¼ 14:70; p < :001. Taken together, the results indicate that the MI2S estimator is equally applicable to the case of plausible values.

Discussions
In this article, we proposed a new two-stage estimator and inferential tools (MI2S) for mean and covariance structure modeling under the presence of multiple imputations. While standard multiple imputation theory dictates that one should fit a mean and covariance structure model M times (once to each  Alternative Multiple Imputation Inference imputation) and combine the results at the very end, the MI2S estimator permits us to combine the multiple imputations at the beginning of data analysis. It produces a single estimate of the means and covariance matrix by averaging the unstructured means and covariance matrices across multiple imputations. Adjustment for the between-imputation variation is also carried out at the beginning, yielding a proper asymptotic covariance matrix of the estimated means and covariances. Using Browne's (1984) results, this asymptotic covariance matrix can be used to obtain an asymptotically chi-square distributed model fit statistic, T BM , and the standard error estimates associated with the structural parameter estimates. Our simulation studies and empirical data analysis show that the MI2S estimator performs as well as the existing FIML estimator under MCAR and MAR conditions not only in terms of Type I error rates and statistical power of overall model fit statistics but also in terms of bias and variability of parameter estimates.
The MI2S estimator is potentially more flexible than FIML. For example, handling plausible values is natural in the MI2S estimator, whereas FIML estimator will not be of much help because there is technically no missingness in the data sets. In addition, inclusion of missing-data-relevant covariates is very straightforward because MI2S is within a multiple imputation framework. In contrast, though it is possible to include missing data relevant covariates for the FIML estimator (Collins, Schafer, & Kam, 2001;Graham, 2003), significantly more efforts are required if the missing data mechanism needs to be explicitly modeled. Moreover, as the number of missing-data-relevant covariates increases, model estimation and identification becomes increasingly problematic for FIML (Savalei & Bentler, 2009, p. 494).
Compared with the standard multiple imputation procedures, the MI2S estimator is computationally more efficient (i.e., a single model-fitting vs. multiple model-fittings) and produces a chi-square distributed test statistic T BM that can form the basis of mean and covariance structure model fit evaluations. Using T BM , it becomes straightforward to conduct tests of close fit (Browne & Cudeck, 1993) and obtain various fit indices such as RMSEA and TLI. On the other hand, using the standard multiple imputation procedure, it is not clear how to combine multiple sets of various fit indexes obtained from the multiple model-fittings because imputation generates a slightly different saturated model.
Our research is not without limitations. First, the derivations are exclusively based on multivariate normal theory. The multivariate normality assumption implies that in higher order moments, interactions or nonlinearities among variables are not modeled in the imputation process. Therefore, if such complex associations are to be a crucial part of the analysis, biased consequences may be obtained due to the inconsistencies between the model used in the imputations and the model used in subsequent analyses of the imputed data sets. This remains an inherent issue for multiple imputation based estimators, MI2S included.
When the variables are clearly not normal (e.g., categorical variables or design variables), a different imputation approach can be employed based on log-linear Lee and Cai models or general location models (Schafer, 1997) or sequential generalized regression models (Raghunathan, Lepkowski, Van Hoewyk, & Solenberger, 2001). More research should be conducted in the future to explore the possible existence of non-normal versions of two-stage estimators that work under multiple imputation.
Second, our simulation studies only examined a small set of conditions. In particular, a closer examination of Table 1 reveals that, as sample size increases, Type I error rates of T FIML tend to approach the nominal level from above, whereas the opposite trend holds for T BM . Such tendencies may explain why T BM can be a less powerful statistic than T FIML , particularly when N is small. Before reaching a general conclusion, however, further research is warranted. Future simulation studies should also examine the robustness of MI2S under a wider variety of missing data conditions and types of structural models.
Third, we only implemented the proposed methods in SAS because of its widespread use in both academic and nonacademic settings. Though our macro is freely available, it is nevertheless restricted to a single software environment. We will explore the possibility of implementing the method in free statistical software such as R.
Next, one must define the structural model to be fitted to the MI2S estimated means and covariances using one of the programming statements acceptable to PROC CALIS. The model definition must be a quoted SAS macro string and in this example we use the LINEQS-style statements: %let mymodel¼%str( lineqs Finally, the macro MI2S is invoked. %MI2S(indata¼outmi20, var¼antil1 anti2 anti3 anti4 gen momage, nvar¼6, nobs¼405, nimp¼20, impidx¼_imputation_, calismodel¼ &mymodel); The indata and var options tell the macro the multiply imputed data set name and the names of the variables. The nvar and nobs options give the dimensions of the analysis in terms of the number of manifest variables and the number of observations. The nimp and impidx options tell the macro the number of imputations performed and the name of imputation index variable. These options should be specified in accordance with the number of imputations and index variable name specified previously. In this particular example, nimp¼20 and impidx¼_Imputa-tion_. Consider another example. When a data set consists of five sets of plausible values indexed by _PV_, then nimp ¼ 5 and impidx ¼ _PV_.
Finally, the quoted macro string &mymodel is passed on to the macro to define the structural model. The macro produces the following output:

Lee and Cai
The value of T MB , the degrees of freedom, the p value, and RMSEA point estimate are printed. The macro also prints MI2S structural parameter estimates along with adjusted standard errors.