Chapter 2: Descriptive Statistics
Prerequisite: Chapter 1
2.1 Review of Univariate Statistics
The central tendency of a more or less symmetric distribution of a set of interval, or higher, scaled scores is often summarized by the arithmetic mean, which is defined as

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .    (2.1)
We can use the mean to create a deviation score,

    d_i = x_i - \bar{x} ,    (2.2)

so named because it quantifies the deviation of the score x_i from the mean.
Deviations are usually squared, since squaring equates negative and positive deviations. The sum of squared deviations, usually just called the sum of squares, is given by

    SS = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} d_i^2 .    (2.3)
Another method of calculating the sum of squares is the hand calculator formula,

    SS = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} ,    (2.4)

which avoids computing each deviation score explicitly.
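As a quick numerical check that the classic and hand calculator versions of the sum of squares agree, here is a minimal sketch in Python with numpy (our choice of tool, not the chapter's); the scores are made up for illustration.

```python
import numpy as np

# Made-up interval-scaled scores for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

mean = x.sum() / n                  # arithmetic mean
d = x - mean                        # deviation scores
ss_classic = (d ** 2).sum()         # sum of squared deviations, Equation (2.3)
ss_hand = (x ** 2).sum() - x.sum() ** 2 / n  # hand calculator formula, (2.4)

print(mean, ss_classic, ss_hand)    # the two sums of squares match
```

The hand calculator version needs only the running totals of x and x squared, which is exactly why it was convenient before deviation scores could be computed in one vectorized step.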
In matrix notation, the row vector of variable means can be written

    \bar{x}' = [\, \bar{x}_1 \;\; \bar{x}_2 \;\; \cdots \;\; \bar{x}_m \,]
             = \frac{1}{n} \mathbf{1}'X .

You might note that here we are beginning to see some of the advantages of matrix notation. For example, look at the second line of the above equation. The piece \mathbf{1}'X expresses the operation of adding each of the columns of the X matrix and putting the sums in a row vector. How many more symbols would it take to express this in scalar notation using the summation operator (\Sigma)?
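To make the comparison concrete, this sketch (with a data matrix invented for illustration) computes 1'X as a single matrix product and again with explicit scalar summation loops.

```python
import numpy as np

# Made-up data matrix: n = 4 observations (rows) on m = 3 variables (columns)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 5.0],
              [4.0, 8.0, 7.0]])
n, m = X.shape

col_sums = np.ones(n) @ X        # 1'X: column totals collected in a row vector
xbar = col_sums / n              # mean vector (1/n) 1'X

# the same totals written out with explicit summation over scalar elements
col_sums_scalar = [sum(X[i, j] for i in range(n)) for j in range(m)]

print(col_sums, xbar)
```

One matrix expression replaces a doubly indexed summation, which is the notational economy the text is pointing at.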
The mean vector can then be used to create the deviation score matrix, as below:

    D = X - \mathbf{1}\bar{x}' .
We would say of the D matrix that it is column-centered, as we have used the column means to center each column around zero.
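Column-centering is easy to verify numerically; this sketch (with a made-up X) confirms that every column mean of D is zero.

```python
import numpy as np

X = np.array([[1.0, 2.0],     # made-up data: 3 observations, 2 variables
              [3.0, 6.0],
              [5.0, 4.0]])
n = X.shape[0]
ones = np.ones((n, 1))

xbar = ones.T @ X / n         # row vector of column means
D = X - ones @ xbar           # deviation score matrix D = X - 1 xbar'

print(D.mean(axis=0))         # each column of D is centered on zero
```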
Now let's reconsider the matrix X'X. This matrix is known as the raw, or uncorrected, sum of squares and cross products matrix. Often the latter part of this name is abbreviated SSCP. We will use the symbol B for the raw SSCP matrix:

    B = X'X .
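In code the raw SSCP matrix is a single matrix product; a sketch with a made-up X:

```python
import numpy as np

X = np.array([[1.0, 2.0],     # made-up data matrix
              [3.0, 6.0],
              [5.0, 4.0]])

B = X.T @ X                   # raw (uncorrected) SSCP matrix
# diagonal: raw sums of squares; off-diagonal: raw sums of cross products
print(B)
```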
In addition, we have seen this matrix expressed row by row and column by column in Equations (1.26) and (1.27). The uncorrected SSCP matrix can be corrected for the mean of each variable in X, in which case it is called the corrected SSCP matrix:
    A = D'D    (2.10)

    A = X'X - \frac{1}{n} X'\mathbf{1}\mathbf{1}'X    (2.11)
Note that Equation (2.10) is analogous to the classic statement of the sum of squares in Equation (2.3), while the second version in Equation (2.11) resembles the hand calculator formula found in Equation (2.4). The correction for the mean in the formula for the corrected SSCP matrix A can be expressed in a variety of other ways:

    \frac{1}{n} X'\mathbf{1}\mathbf{1}'X = n\bar{x}\bar{x}' = X'\mathbf{1}\bar{x}' = \bar{x}\mathbf{1}'X .
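The two routes to A, the deviation-score product of Equation (2.10) and the mean-corrected raw SSCP of Equation (2.11), can be checked against each other; a sketch with invented data:

```python
import numpy as np

X = np.array([[1.0, 2.0],     # made-up data matrix
              [3.0, 6.0],
              [5.0, 4.0]])
n = X.shape[0]
ones = np.ones((n, 1))

D = X - ones @ (ones.T @ X) / n                  # column-centered scores
A_dev = D.T @ D                                  # Equation (2.10)
A_corr = X.T @ X - X.T @ ones @ ones.T @ X / n   # correcting B for the means

print(A_dev)                                     # the two versions agree
```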
Now we come to one of the most important matrices in all of statistics, namely the variance-covariance matrix, often just called the variance matrix. It is created by multiplying the scalar 1/(n-1) times A, i.e.,

    S = \frac{1}{n-1} A .
This is the unbiased formula for S. From time to time we might have occasion to see the maximum likelihood formula, which uses n instead of n - 1. The covariance matrix is square and symmetric, with as many rows (and columns) as there are variables. We can think of it as summarizing the...