Analysis of covariance
From Wikipedia, the free encyclopedia
Analysis of covariance (ANCOVA) is a general linear model with a continuous dependent variable, at least one continuous explanatory variable (covariate), and one or more categorical factors. ANCOVA merges ANOVA with regression on continuous variables. It tests whether the factors have an effect on the dependent variable after removing the variance accounted for by the covariates. Including covariates can increase statistical power because they account for some of the variability that would otherwise be attributed to error.
Assumptions
Like any statistical procedure, ANCOVA makes certain assumptions about the data entered into the model. Only if these assumptions are met, at least approximately, will ANCOVA yield valid results. Specifically, ANCOVA, just like ANOVA, assumes that the errors (residuals) of the dependent variable are normally distributed and homoscedastic. Further, since ANCOVA is a regression-based method, the relationship of the dependent variable to the independent variable(s) must be linear in the parameters, as in regression analysis.
Power considerations
Including a covariate in an ANOVA generally increases statistical power: the covariate accounts for some of the variance in the dependent variable, which increases the share of the remaining variance explained by the independent variables. However, adding a covariate also reduces the error degrees of freedom by one (see below). Accordingly, a covariate that accounts for very little variance in the dependent variable might actually reduce power.
Equations
One-factor ANCOVA analysis
One-factor analysis is appropriate when comparing k populations, typically three or more. The single factor has k levels, one for each population. From each population, n samples are chosen at random. In the equations below, the index i runs over the n samples within a population and the index j runs over the k populations.
Calculating the sum of squared deviates for the independent variable X and the dependent variable Y
The sums of squared deviates (SS) $SST_y$, $SSTr_y$, and $SSE_y$ must be calculated using the following equations for the dependent variable, Y. The SS for the covariate must also be calculated; the two necessary values are $SST_x$ and $SSE_x$.
The total sum of squares determines the variability of all the samples. $n_T$ represents the total number of samples:
- $SST_y=\sum_{i=1}^n\sum_{j=1}^kY_{ij}^2-\frac{\left(\sum_{i=1}^n\sum_{j=1}^kY_{ij}\right)^2}{n_T}$
The sum of squares for treatments determines the variability between populations. $n_n$ represents the number of samples within a given population:
- $SSTr_y=\sum_{j=1}^k\frac{\left(\sum_{i=1}^nY_{ij}\right)^2}{n_n}-\frac{\left(\sum_{i=1}^n\sum_{j=1}^kY_{ij}\right)^2}{n_T}$
The sum of squares for error determines the variability within each population:
- $SSE_y=\sum_{i=1}^n\sum_{j=1}^kY_{ij}^2-\sum_{j=1}^k\frac{\left(\sum_{i=1}^nY_{ij}\right)^2}{n_n}$
The total sum of squares is equal to the sum of squares for treatments plus the sum of squares for error:
- $SST_y=SSTr_y+SSE_y.$
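The partition of the total sum of squares can be checked numerically. The sketch below is illustrative only: the data are made up, the function name `sums_of_squares` is hypothetical, and it assumes the usual computational form in which the treatment term is the squared total of each population divided by that population's sample size.

```python
# Hypothetical balanced data: k = 3 populations, n = 3 samples each.
y_groups = [[2, 3, 5], [4, 6, 7], [7, 8, 10]]


def sums_of_squares(groups):
    """Return (SST, SSTr, SSE) for a list of per-population samples."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    raw_ss = sum(v * v for v in all_values)        # sum of all squared values
    correction = sum(all_values) ** 2 / n_total    # (grand total)^2 / n_T
    # Sum over populations of (population total)^2 / (population size).
    between = sum(sum(g) ** 2 / len(g) for g in groups)
    sst = raw_ss - correction
    sstr = between - correction
    sse = raw_ss - between
    return sst, sstr, sse


sst_y, sstr_y, sse_y = sums_of_squares(y_groups)
print(sst_y, sstr_y, sse_y)
```

Applying the same function to the covariate values gives the corresponding sums of squares for X.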
Calculating the covariance of X and Y
The total sum of cross products determines the covariance of X and Y within all the data samples:
- $SCT=\sum_{i=1}^n\sum_{j=1}^kX_{ij}Y_{ij}-\frac{\left(\sum_{i=1}^n\sum_{j=1}^kX_{ij}\right)\left(\sum_{i=1}^n\sum_{j=1}^kY_{ij}\right)}{n_T}$
The error sum of cross products determines the covariance of X and Y within each population, summed over the populations:
- $SCE=\sum_{j=1}^k\left(\sum_{i=1}^nX_{ij}Y_{ij}-\frac{\left(\sum_{i=1}^nX_{ij}\right)\left(\sum_{i=1}^nY_{ij}\right)}{n_n}\right)$
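As a rough illustration, the two cross-product sums can be computed as follows. The data and the function name `cross_products` are hypothetical, and the within-population term is assumed to subtract the product of that population's X total and Y total divided by its sample size.

```python
# Hypothetical balanced data: k = 3 populations, n = 3 samples each.
x_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
y_groups = [[2, 3, 5], [4, 6, 7], [7, 8, 10]]


def cross_products(x_groups, y_groups):
    """Return (SCT, SCE): total and within-population sums of cross products."""
    xs = [x for g in x_groups for x in g]
    ys = [y for g in y_groups for y in g]
    n_total = len(xs)
    # Total: pool every sample and ignore population membership.
    sct = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n_total
    # Error: compute the same quantity inside each population, then add up.
    sce = sum(
        sum(x * y for x, y in zip(gx, gy)) - sum(gx) * sum(gy) / len(gx)
        for gx, gy in zip(x_groups, y_groups)
    )
    return sct, sce


sct, sce = cross_products(x_groups, y_groups)
print(sct, sce)
```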
Adjusting SSTy
The squared correlation between X and Y over all the data samples is $r_T^2$, and the squared correlation within populations is $r_n^2$:
- $r_T^2=\frac{SCT^2}{SST_xSST_y}$
- $r_n^2=\frac{SCE^2}{SSE_xSSE_y}$
The proportion of each sum of squares that is explained by the covariate is subtracted from the dependent-variable $SS_y$ values:
- $SST_{yadj}=SST_y\left(1-r_T^2\right)=SST_y-\frac{SCT^2}{SST_x}$
- $SSE_{yadj}=SSE_y\left(1-r_n^2\right)=SSE_y-\frac{SCE^2}{SSE_x}$
- $SSTr_{yadj}=SST_{yadj}-SSE_{yadj}$
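The adjustment step can be sketched in a few lines. This example takes the adjustment in its usual form, in which the proportion $r^2$ of each sum of squares is removed, so the adjusted value is the original value multiplied by (1 - r^2). The function name `adjust_sums_of_squares` and the input numbers are assumptions chosen for illustration.

```python
def adjust_sums_of_squares(sst_x, sse_x, sst_y, sse_y, sct, sce):
    """Return adjusted (SST, SSE, SSTr) for Y after removing the covariate."""
    r_total_sq = sct ** 2 / (sst_x * sst_y)    # squared correlation, all samples
    r_within_sq = sce ** 2 / (sse_x * sse_y)   # squared correlation, within populations
    sst_adj = sst_y * (1 - r_total_sq)
    sse_adj = sse_y * (1 - r_within_sq)
    sstr_adj = sst_adj - sse_adj
    return sst_adj, sse_adj, sstr_adj


# Hypothetical inputs (sums of squares and cross products from a small data set).
sst_adj, sse_adj, sstr_adj = adjust_sums_of_squares(
    sst_x=12, sse_x=6, sst_y=464 / 9, sse_y=14, sct=24, sce=9
)
print(sst_adj, sse_adj, sstr_adj)
```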
Adjusting the means of each population k
The mean of each population is adjusted using the pooled within-population slope $SCE/SSE_x$, where $M_{y_j}$ and $M_{x_j}$ are the means of Y and X in population j and $M_{x_T}$ is the grand mean of X:
- $M_{y_jadj}=M_{y_j}-\frac{SCE}{SSE_x}\left(M_{x_j}-M_{x_T}\right)$
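A minimal sketch of the mean adjustment, assuming the slope is the pooled within-population regression slope (error cross products divided by the error sum of squares of X). The data and the function names are hypothetical.

```python
# Hypothetical balanced data: k = 3 populations, n = 3 samples each.
x_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
y_groups = [[2, 3, 5], [4, 6, 7], [7, 8, 10]]


def mean(values):
    return sum(values) / len(values)


def adjusted_means(x_groups, y_groups):
    """Adjust each population mean of Y to the grand mean of X."""
    # Within-population cross products and covariate error sum of squares.
    sce = sum(
        sum(x * y for x, y in zip(gx, gy)) - sum(gx) * sum(gy) / len(gx)
        for gx, gy in zip(x_groups, y_groups)
    )
    sse_x = sum(sum(x * x for x in gx) - sum(gx) ** 2 / len(gx) for gx in x_groups)
    slope = sce / sse_x  # pooled within-population regression slope
    grand_mean_x = mean([x for gx in x_groups for x in gx])
    return [
        mean(gy) - slope * (mean(gx) - grand_mean_x)
        for gx, gy in zip(x_groups, y_groups)
    ]


adj = adjusted_means(x_groups, y_groups)
print(adj)
```

In this made-up data set the populations with a below-average covariate mean are adjusted upward and those with an above-average covariate mean are adjusted downward, which narrows the gap between the population means.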
Analysis using adjusted sum of squares values
The mean squares are computed from the adjusted sums of squares. $df_{Tr}$ is equal to $k-1$, and $df_E$ is equal to $n_T-k-1$. $df_E$ is one less than in ANOVA to account for the covariate:
- $MSTr=\frac{SSTr_{yadj}}{df_{Tr}}$
- $MSE=\frac{SSE_{yadj}}{df_E}$
The F statistic is
- $F_{df_{Tr},df_E}=\frac{MSTr}{MSE}.$
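The final step might look like the sketch below. It assumes $k-1$ treatment degrees of freedom and $n_T-k-1$ error degrees of freedom (one fewer than in ANOVA because of the single covariate). The function name `ancova_f` and the input values are hypothetical; the adjusted sums of squares are taken as already computed.

```python
def ancova_f(sstr_adj, sse_adj, k, n_total):
    """Return (F, df_treatment, df_error) for a one-factor, one-covariate ANCOVA."""
    df_tr = k - 1             # treatment degrees of freedom
    df_e = n_total - k - 1    # error degrees of freedom, one lost to the covariate
    mstr = sstr_adj / df_tr   # adjusted mean square for treatments
    mse = sse_adj / df_e      # adjusted mean square for error
    return mstr / mse, df_tr, df_e


# Hypothetical adjusted sums of squares for k = 3 populations and 9 samples in total.
f_stat, df_tr, df_e = ancova_f(sstr_adj=55 / 18, sse_adj=0.5, k=3, n_total=9)
print(f_stat, df_tr, df_e)
```

The resulting statistic would then be compared against an F distribution with (df_tr, df_e) degrees of freedom; that lookup is omitted here because it needs a statistics library or a table.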

