The Regression equation
Testing Individual Coefficients
Testing the Overall Equation
The regression equation
The object of a regression problem is to estimate the coefficients bi in the regression equation:
Y = b0 + b1•X1 + b2•X2 + b3•X3 + . . . + e
The equation above is the true relationship between Y and the Xs. When we have estimates of the model, the estimated equation is denoted by
Y = b0 + b1•X1 + b2•X2 + b3•X3 + . . . + e
When we have estimates bi, we can plug values for the Xi into the equation and predict the value of Y that would be associated with those values of the Xi.
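As a small illustration of plugging values into the estimated equation, here is a minimal Python sketch. The function name predict and the coefficient values are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from any real regression.

```python
def predict(coefficients, x_values):
    """Return b0 + b1*X1 + b2*X2 + ... for one set of X values.

    coefficients: [b0, b1, b2, ...] (intercept first)
    x_values:     [X1, X2, ...]
    """
    intercept, *slopes = coefficients
    if len(slopes) != len(x_values):
        raise ValueError("need exactly one X value per slope coefficient")
    return intercept + sum(b * x for b, x in zip(slopes, x_values))


# Hypothetical estimates b0, b1, b2, b3 (illustration only).
b = [2.0, 0.5, -1.0, 3.0]
# Hypothetical values of X1, X2, X3 for which we want a prediction.
x_new = [4.0, 1.0, 2.0]

print(predict(b, x_new))  # 2 + 0.5*4 - 1*1 + 3*2 = 9.0
```

The error term e does not appear in the prediction, because its value is unknown for a new observation and it is assumed to average zero.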
Testing individual coefficients
Once we have this equation, we can test whether the X
variables belong in the equation--that is, whether they really contribute
to explaining the changes in Y. If an X does not affect Y, its coefficient
would be zero. For example, if b2 = 0 then the equation above would be
Y = b0 + b1•X1 + 0•X2 + b3•X3 + e
or
Y = b0 + b1•X1 + b3•X3 + e
To test whether a bi is really equal to zero, we use the t test for the estimated coefficient. The t statistic is the estimated coefficient divided by its standard error. Large absolute values of t, or small p values, indicate that there is a relationship between that X and Y.
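The t statistic can be computed by hand for the simplest case of one X variable. The sketch below is only an illustration under that assumption: the function name slope_t_statistic and the data are hypothetical, and in practice a statistics package such as JMP reports t and its p value directly.

```python
import math


def slope_t_statistic(x, y):
    """Fit Y = b0 + b1*X by least squares; return (b1, se_b1, t)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    s_squared = sse / (n - 2)
    se_b1 = math.sqrt(s_squared / sxx)
    return b1, se_b1, b1 / se_b1


# Hypothetical data (illustration only).
x = [4, 6, 8, 10]
y = [13.5, 15.5, 18.5, 22.5]

b1, se_b1, t = slope_t_statistic(x, y)
print(f"b1 = {b1:.3f}, standard error = {se_b1:.3f}, t = {t:.2f}")
```

With 4 observations and one X there are 2 degrees of freedom, for which the two-sided 5% critical value of t is about 4.30. The t of roughly 9.49 computed here exceeds it, so for these made-up data we would conclude that the coefficient is not zero.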
Testing the Overall Relationship
The strength of the overall relationship is measured
by R² (called the coefficient of determination). It measures
the fraction of the total variation in the Y variable that can be explained
by the variation in the X variables. It has a value of 1.00 if all
of the data points in the graph lie exactly on a straight line--indicating
that all of the variation in Y can be accounted for by the variation in
the X variables. It has a value of zero if there is no relationship
between any of the X variables and the Y variable. The two
graphs below show the results of two regressions and their R²s.
(SEE stands for Standard Error of Estimate and in this context it measures
the variation of the data points away from the regression line.)
Note that in the first regression, the data points lie close to the regression
line and R² is large. This reflects the fact that most of the
variation of Y (from 13 through 23) is due to the fact that X changes (from
4 through 10). The second graph shows more variation ("deviation")
away from the regression line. The changes in X predict the same
changes in Y (the predicted values of Y are 13, 16, 19, 22), but now there
is an extra source of variation in Y, so a smaller fraction of the total
variation in Y can be explained by X. Thus, the R² is smaller
in the second graph.
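The comparison described above can be reproduced with a short Python sketch. The data are hypothetical, not read from the graphs: both sets were chosen so that the fitted line predicts 13, 16, 19, 22 at X = 4, 6, 8, 10, and the second set simply has larger deviations from that line. The function names are likewise assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared_and_see(x, y):
    """Return (R squared, Standard Error of Estimate) for a one-X regression."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    # Unexplained variation: squared deviations from the regression line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared deviations from the mean of Y.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    see = math.sqrt(sse / (len(x) - 2))
    return r2, see


# Hypothetical data (illustration only); both sets give the line Y = 7 + 1.5*X.
x = [4, 6, 8, 10]
y_close = [13.5, 15.5, 18.5, 22.5]      # points near the line
y_scattered = [15.0, 14.0, 17.0, 24.0]  # same line, larger deviations

for label, y in [("close", y_close), ("scattered", y_scattered)]:
    r2, see = r_squared_and_see(x, y)
    print(f"{label}: R^2 = {r2:.3f}, SEE = {see:.3f}")
```

For these made-up numbers the variation explained by X is the same in both sets, but the extra scatter in the second set lowers R² from about 0.978 to about 0.738 and raises SEE from about 0.707 to about 2.828.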
To test the overall regression equation, we use the F
statistic. Usually, if any of the t statistics shows that a
coefficient is significantly different from zero, then the F will be large.
The null hypothesis is that none of the X variables helps to explain the
value of Y, that is, that all of the coefficients on the Xs are zero.
A large F, or a small p value, leads us to reject that hypothesis.
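The F statistic is closely tied to R². With n observations and k X variables, it can be written as F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below applies that standard formula; the function name f_statistic and the example values of R², n, and k are hypothetical.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic from R squared.

    r_squared: coefficient of determination, 0 <= R^2 < 1
    n:         number of observations
    k:         number of X variables (not counting the intercept)
    """
    if not 0 <= r_squared < 1:
        raise ValueError("R squared must satisfy 0 <= R^2 < 1")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    explained_per_variable = r_squared / k
    unexplained_per_df = (1 - r_squared) / (n - k - 1)
    return explained_per_variable / unexplained_per_df


# Hypothetical example: R^2 = 0.75 from 24 observations and 3 X variables.
print(round(f_statistic(0.75, 24, 3), 2))  # (0.75/3) / (0.25/20) = 20.0
```

In a regression with only one X variable, the F statistic equals the square of the t statistic for that X, so the two tests lead to the same conclusion.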
last updated March 21, 2001, by James R. Frederick
Copyright 2001 James R. Frederick