
REGRESSION


Regression is a statistical tool that allows you to predict the value of one continuous variable from one or more other variables. When you perform a regression analysis, you create a regression equation that predicts the values of your DV from the values of your IVs. Each IV is associated with a coefficient in the equation that summarizes the relationship between that IV and the DV. Once we estimate the coefficients in a regression equation, we can use hypothesis tests and confidence intervals to make inferences about the corresponding parameters in the population. You can also use the regression equation to predict the value of the DV given a specified set of values for your IVs.

Simple Linear Regression
Simple linear regression is used to predict the value of a single continuous DV (which we will call Y) from a single continuous IV (which we will call X). Regression assumes that the relationship between the IV and the DV can be represented by the equation

Yi = β0 + β1Xi + εi,

where Yi is the value of the DV for case i, Xi is the value of the IV for case i, β0 and β1 are constants, and εi is the error in prediction for case i. When you perform a regression, what you are basically doing is determining estimates of β0 and β1 that let you best predict values of Y from values of X. You may remember from geometry that the above equation is equivalent to a straight line. This is no accident, since the purpose of simple linear regression is to define the line that represents the relationship between our two variables. β0 is the intercept of the line, indicating the expected value of Y when X = 0. β1 is the slope of the line, indicating how much we expect Y to change when we increase X by a single unit.

The regression equation above is written in terms of population parameters. That indicates that our goal is to determine the relationship between the two variables in the population as a whole. We typically do this by taking a sample and then performing calculations to obtain the estimated regression equation

Ŷi = b0 + b1Xi,

where Ŷi is the predicted value of the DV for case i.

Once you estimate the values of b0 and b1, you can substitute those values into the regression equation to predict the expected values of the DV for specific values of the IV. Predicting the values of Y from the values of X is referred to as regressing Y on X. When analyzing data from a study, you will typically want to regress the values of the DV on the values of the IV. This makes sense, since you want to use the IV to explain variability in the DV. We typically calculate b0 and b1 using least squares estimation, which chooses the estimates that minimize the sum of squared differences between the observed values and the values predicted by the estimated regression line.
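
To make the calculation concrete, here is a minimal sketch of least squares estimation in Python rather than SPSS; the numpy functions are real, but the data and variable names are invented purely for illustration.

import numpy as np

# Invented example data (hypothetical, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # IV
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # DV

# Closed-form least squares estimates:
# b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), b0 = y_bar - b1 * x_bar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x       # values predicted by the estimated line
residuals = y - y_hat     # prediction errors

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("Sum of squared errors:", np.sum(residuals ** 2))

These estimates are the same quantities SPSS reports in the B column of the Coefficients table, described below.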

In addition to using the estimated regression equation for prediction, you can also perform hypothesis tests regarding the individual regression parameters. The slope of the regression equation (Ī²1) represents the change in Y with a one-unit change in X. If X predicts Y, then as X increases, Y should change in some systematic way. You can therefore test for a linear relationship between X and Y by determining whether the slope parameter is significantly different from zero.
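
The arithmetic behind this test is simple enough to sketch by hand; the following Python sketch reuses the invented data from the example above and draws the t distribution from scipy. The degrees of freedom are n - 2 because two parameters (the intercept and the slope) are estimated.

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

mse = np.sum(residuals ** 2) / (n - 2)              # error variance estimate
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))  # standard error of the slope

t_stat = b1 / se_b1                                 # tests H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-sided p-value
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")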

When performing linear regression, we typically make the following assumptions about the error terms εi (a rough sketch of how you might check them follows the list).

1. The errors have a normal distribution.
2. The errors have the same variance at each level of X (homoscedasticity).
3. The errors in the model are all independent.
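
The sketch below shows one common way to look at these assumptions in Python, again using the invented data from the earlier examples; the Shapiro-Wilk test used for normality is a popular convention, not a requirement of the model.

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# 1. Normality: Shapiro-Wilk test of the residuals.
w, p_norm = stats.shapiro(residuals)
print(f"Shapiro-Wilk p = {p_norm:.3f}")

# 2. Constant variance: in practice, plot the residuals against x (or the
#    fitted values) and look for a fan shape; no plot is drawn here.

# 3. Independence: usually judged from the study design (e.g., whether the
#    cases were sampled independently) rather than computed from the data.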

To perform a simple linear regression in SPSS
1. Choose Analyze → Regression → Linear.
2. Move the DV to the Dependent box.
3. Move the IV to the Independent(s) box.
4. Click the OK button.

The output from this analysis will contain the following sections.
Variables Entered/Removed. This section is only used in model building and contains no useful information in simple linear regression.
Model Summary. The value listed below R is the correlation between your variables. The value listed below R Square is the proportion of variance in your DV that can be accounted for by your IV. The value in the Adjusted R Square column is a measure of model fit, adjusting for the number of IVs in the model. The value listed below Std. Error of the Estimate is the standard deviation of the residuals.
ANOVA. Here you will see an ANOVA table, which provides an F test of the relationship between your IV and your DV. If the F test is significant, it indicates that there is a linear relationship between the two variables.
Coefficients. This section contains a table where each row corresponds to a single coefficient in your model. The row labeled Constant refers to the intercept, while the row containing the name of your IV refers to the slope. Inside the table, the column labeled B contains the estimates of the parameters and the column labeled Std. Error contains the standard errors of those parameters. The column labeled Beta contains the standardized regression coefficient, which is the parameter estimate that you would get if you standardized both the IV and the DV by subtracting off their means and dividing by their standard deviations. Standardized regression coefficients are sometimes used in multiple regression (discussed below) to compare the relative importance of different IVs when predicting the DV. In simple linear regression, the standardized regression coefficient will always be equal to the correlation between the IV and the DV. The column labeled t contains the value of the t-statistic testing whether the value of each parameter is equal to zero. The p-value of this test is found in the column labeled Sig. If the value for the IV is significant, then there is a relationship between the IV and the DV. Note that the square of the t statistic is equal to the F statistic in the ANOVA table and that the p-values of the two tests are equal. This is because both of these are testing whether there is a significant linear relationship between your variables.
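
If you want to cross-check these output sections outside of SPSS, here is a sketch using statsmodels in Python on the invented data from the earlier examples; the attribute names are statsmodels', and the data remain hypothetical.

import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = sm.OLS(y, sm.add_constant(x)).fit()

print(model.params)                        # B column: b0 (const) and b1
print(model.bse)                           # Std. Error column
print(model.rsquared, model.rsquared_adj)  # R Square and Adjusted R Square
print(model.fvalue)                        # F statistic from the ANOVA table
print(model.tvalues[1] ** 2)               # slope t squared, which equals F
print(np.corrcoef(x, y)[0, 1])             # correlation = standardized coefficient here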
