MGS 9950--Regression Analysis

NOTE: Please check back regularly to stay informed of changes.

Course: MGS 9950 . . . Regression Analysis . . . Spring 2008 . . . CRN # 13526.

 

Class Meets: Tuesdays-Thursdays 1:00--2:15 PM, Sparks 330. The final exam will be given on Tuesday, April 29 at noon.

Instructor: Professor Edward Rigdon, RCB 1338, phone (404) 413-7674, fax (404) 413-7699, email erigdon@gsu.edu.
Class website: find the link through http://www.edrigdon.com/.

Office Hours: by appointment only, but I am usually around campus Monday through Friday.

 
Texts: Cohen, Cohen, West and Aiken (2003), Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Mahwah, NJ: LEA, Inc.
Other assigned readings
Prerequisites: MGS 9920 ("Probability and Statistical Theory I"), BA 6000, and CSP I-VI.

Warning: All statements in this syllabus are tentative and subject to change. The student is responsible for staying informed of all changes.

Objectives: This course is designed for doctoral-level students who intend to use regression and related methods to address research questions. Regression is an extremely useful technique, though it may not get the headlines of some other techniques. But a firm understanding of regression will definitely help anyone to better grasp other methods. Sadly, regression has also been misused and misunderstood, which has led some people to dismiss this technique.

The central aim of the course is to enable students to use regression in their research. In attempting to achieve that central goal, this course will:

(1) introduce students to basic regression concepts and tools,

(2) allow students to practice applying regression in situations similar to those they may encounter in their careers,

(3) confront students with common problems in regression analysis, and allow them to test potential solutions,

(4) address myths and misconceptions about regression, and show students where regression stands within the context of alternative statistical techniques, and

(5) help students to develop a practical "regression sense" that will help them to anticipate problems and to choose superior approaches.

Grades: The student's course grade will be based on the following components:

Exam I 20%
Exam II 20%
Homework and Class Contribution 35%
Project 25%

This course will employ the plus/minus grading system. Final decimal grades will be converted into letter grades as follows: 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, 77-79 = C+, 73-76 = C, 70-72 = C-, 60-69 = D, 0-59 = F.
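
As a rough illustration of the arithmetic only (not an official grade calculator), the weights and conversion scale above can be sketched in Python. The function names and dictionary keys are hypothetical. Rounding a fractional score half-up to a whole number before applying the scale is an assumption, since the syllabus does not say how scores that fall between bands (such as 92.5) are handled.

```python
import math

# Component weights in percent, taken from the grade components listed above.
WEIGHTS = {
    "exam_1": 20,
    "exam_2": 20,
    "homework_and_contribution": 35,
    "project": 25,
}

# Lower bound of each letter band, taken from the conversion scale above.
LETTER_BANDS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (60, "D"), (0, "F"),
]


def course_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a 0-100 course score to a letter grade."""
    # ASSUMPTION: fractional scores are rounded half-up to a whole number.
    rounded = math.floor(score + 0.5)
    for lower_bound, letter in LETTER_BANDS:
        if rounded >= lower_bound:
            return letter
    return "F"


example = {
    "exam_1": 90,
    "exam_2": 85,
    "homework_and_contribution": 95,
    "project": 88,
}
total = course_score(example)
print(total, letter_grade(total))
```

With these made-up component scores, the weighted total is 18 + 17 + 33.25 + 22 = 90.25, which falls in the 90-92 band.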

EXAMS--The midterm and final exams will each consist of a number of short-answer questions relating to key concepts from the course. Students may be asked to briefly define a term, apply a concept, interpret partial output from a regression analysis, structure an analysis, or explain a result. Given our class format, exams will be designed to be completed in no more than 1 hour and 15 minutes. The key concepts listed in the course outline serve well as a study guide for the exams. In addition, we will make time for review sessions before each exam.

HOMEWORK--Homework problems will involve conducting regression analyses using SPSS. The point of these assignments is to give students practice in applying regression tools and in using a statistical package. The instructor may require a 1-2 page report regarding the results of each exercise. We will aim to have one homework exercise per week, except for the introductory week and exam weeks. Students may ask their fellow students for help with a homework assignment, but should not copy other students' work. Conducting the analyses individually is the only way to learn how to use these tools.

PROJECT--Students may work on the term project individually or as members of teams of 2 or (at most) 3 people. Members of teams will complete peer review forms at midterm and at end of term to rate their team members' contribution to the project. The instructor will specify a default project (likely to involve building models from U.S. Census Bureau or World Bank data), but students are invited to develop projects that reflect their own interests.

In particular, students are invited to explore the data archives of ICPSR, the Inter-University Consortium for Political and Social Research (http://www.icpsr.umich.edu/). ICPSR has a variety of interesting datasets (http://www.icpsr.umich.edu/access/subject.html) available to you by virtue of your GSU IP address. For example, Marketing students may be interested in Claes Fornell's American Customer Satisfaction Index (ACSI) datasets. Real Estate students may be interested in data on decisions regarding the location of industrial plants. There are datasets with U.S. respondents as well as datasets with respondents from other countries.

Once you have settled on a proposed project, please give me a paragraph specifying the dataset, the dependent variable, some likely and available predictors, and a brief rationale explaining your interest in the project, for approval. In the course of the project, students will be expected to examine background information and prior studies from a variety of sources. Students may also compile additional statistics from reputable sources to supplement this basic file.

The key deliverable from this project is a paper that describes the student's analyses and makes the case for the student's final model. The paper will be judged on the basis of soundness of regression technique, thoroughness, theoretical rationale of the model (does the model make sense?), novelty of the findings (are the results in any way unexpected?), strength of relationships (does the model do a good job of predicting?) and professionalism. The paper itself should be about 10 (but no more than 15) pages in length, plus a reasonable number of appendices and figures that clarify or support the paper.

Academic Honesty: GSU policies on academic honesty will be enforced in this class. Students must not take the words of others as their own, whether those words come from printed sources, from the Internet, from personal communication, or any other source. Students must always acknowledge the origins of words and ideas through appropriate notation in the text and appropriate references. Students must not cheat.

Attendance: Naturally, students will be expected to attend and participate in all class sessions.

LESSON PLAN: MGS 9950, REGRESSION ANALYSIS

Readings are from the Cohen et al. (2003) text, unless otherwise noted.

Week 1 / Jan. 8-10:

Class 1 will (1) introduce everyone to each other, (2) introduce everyone to the class' requirements (including the term project), objectives and style, (3) provide a brief general overview of regression, and (4) survey students to learn about their familiarity with statistical analysis and with SPSS.

Class 2 (R) will begin our study of bivariate regression, starting with a review of standardized scores, correlation coefficients, and regression coefficients.

Required Reading (for class 2): Sections 2.1-2.5.

Key concepts: linear transformation; x = X - Mx; covariance; Pearson product-moment correlation coefficient; point-biserial correlation; phi coefficient; z distribution; regression coefficient.

Week 2 / Jan. 15-17:

I will miss our Jan. 17 session (I'll be at an academic administrator seminar--wheee). We'll pick a date to make up the session.

Class 3 (T) will deal with confidence intervals, hypothesis testing and power in bivariate regression. The first homework will be assigned.

Required Reading: 2.6-2.10.

Key concepts: r(y, yhat) = r(yx); variance decomposition of y; variance of a linear composite; coefficient of alienation; standard error of estimate; standard error of B(yx); extrapolation Vs interpolation; confidence interval Vs NHST; margin of error; t distribution Vs z distribution; Fisher's z' transformation; significance of r Vs significance of B; regression toward the mean; assumptions of the fixed regression model.

Week 3 / Jan. 22-24:

Classes 4-5 extend the basic regression concepts from 1 predictor to 2 predictors, and then to k predictors. Class 4 (T) addresses estimation, and Class 5 (R) addresses standard errors, confidence intervals and hypothesis testing. Multiple predictors introduce the issue of covariance among predictors and the possibility of indirect effects and spurious relationships.

Required Reading: (T) 3.1-3.4; (R) 3.5-3.7.

Key concepts: (T-R) Ch. 2 Fisher's z' transformation; significance of r Vs significance of B; correlation/regression artifacts (item-total correlations, regression toward the mean, range restrictions, unreliability, similarity of distributions); assumptions of the fixed regression model; Ch. 3 direct and indirect effects; spurious correlation; requirements for causal reasoning; partial Vs zero-order coefficients; excluded predictor problem; suppression.

Week 4 / Jan. 29-31:

Class 6 (T) will conclude our introduction to regression with k predictors. Class 7 (R) will introduce graphical approaches to representing data--an invaluable tool in data analysis--and introduce the assumptions underlying the ordinary least squares (OLS) method of estimation.

Required Reading: (T) 3.7-3.9; (R) 4.1-4.3.

Key concepts: (T) Multiple R; multiple R square; adjusted R square; significance of R square, unstandardized beta, and standardized beta; F. (R) Power in multivariate regression; choosing sample size for a study; cautions about prediction; Rozeboom's cross-validation R square; graphing concepts.

Week 5 / Feb. 5-7:

Class 8 (T) will examine how to detect violations of the assumptions underlying OLS methods, and some purported remedies for such violations. In Class 9 (R), we will begin a broad look at the kinds of research questions that we can address using regression. We will also introduce the idea of hierarchical regression, where some predictors are of more research interest than others.

Required Reading: (T) 4.4-4.6; (R) 5.1-5.3.

Key concepts: (T) Violation of assumptions (consequences / detection / remedies); correctness of the model; distribution of residuals. (R) Prediction Vs explanation; elasticity; hierarchical regression; order of entry.

Week 6 / Feb. 12-14:

Classes 10-11 will extend the idea of a hierarchy of predictors to a hierarchy of *sets* of predictors. There is a fair amount of complexity here, and we need to take our time.

Required Reading: (TR) 5.4-5.8.

Key concepts: (TR) Research factors; R square for sets of predictors in hierarchical regression; controlling Type I error and Type II error with complex investigations; Fisher's "protected" t test.

Week 7 / Feb. 19-21:

Class 12 (T) introduces the use of polynomial terms as predictors. Class 13 (R) focuses on transformations to meet assumptions and/or aid interpretability.

Required Reading: (T) 6.1-6.3; (R) 6.4-6.7.

Key concepts: (T) Linear in the variables Vs linear in the coefficients; linearizable by transformation Vs inherently nonlinear; powers of X as aspects of X; order of the polynomial; centering predictors; essential collinearity Vs nonessential collinearity; simple slope; orthogonal polynomials. (R) Logarithms; started logs and started powers; ladder of re-expression; bulging rule; one-bend transformations Vs two-bend transformations.

Week 8 / Feb. 26-28:

Class 14 (T) is a review session--bring your questions. Class 15 (R) is our midterm exam, which will focus on concepts and interpretation of results.

Required Reading: none.

Key concepts: See Weeks 1-7.

Mar. 4-6: SPRING BREAK / NO CLASSES

Week 9 / Mar. 11-13:

In Class 16 (T) we will briefly review the midterm, and start talking about interactions with continuous predictors. We'll continue that discussion in Class 17 (R).

Required Reading: (T) 7.1-7.3; (R) 7.4-7.12.

Key concepts: (T) Interaction; regression plane; simple regression line; centering and collinearity; effects of reliability and research design on power to detect interactions. (R) Significance of simple slopes; why this significance is equation-dependent; cautions on using standardized estimates; ordinal Vs disordinal interactions; finding the crossing point; higher order interactions; interactions with sets of predictors.

Week 10 / Mar. 18-20:

In Classes 18-19 (TR) we'll make a careful study of non-continuous (categorical or nominal) predictors. We need to make careful coding choices in order to get the most out of a regression using these predictors.

Required Reading: (T) 8.1-8.3; (R) 8.4-8.8.

Key concepts: (T) Dummy codes; reference group; interpreting dummy code B, R and R square; standardization and nominal predictors; power and nominal predictors; "dummy-like" codes; (R) Unweighted effects codes; weighted effects codes; base group; contrast codes; choosing a coding system.

Week 11 / Mar. 25-27:

In Classes 20-21 (TR) we'll look at interactions involving categorical predictors. As we've seen before, categorical variables simplify regression in many ways, and that is certainly true of interactions. Still, we may find ourselves a bit behind at this point, so let's not take this too fast.

Required Reading: (T) 9.1-9.2; (R) 9.3-9.4.

Key concepts: (TR) Interpretation of nominal interactions; balanced Vs unbalanced designs; full representation; sums of squares Type I, II, III; impact of different coding schemes.

Week 12 / Apr. 1-3:

This week we'll struggle with two common problems in regression. In Class 22 (T), we'll talk about outliers (or influential cases). In Class 23 (R), we'll talk about collinearity, one of the primary drawbacks to nonexperimental research.

Required Reading: (T) 10.1-10.4; (R) 10.5-10.7.

Key concepts: (T) Outliers; leverage; hat values; discrepancy; internally studentized residuals (SRESID); externally studentized residuals (SDRESID); influence; DFFITSi; DFBETASi; clumps; index plots; best practice for outliers. (R) Collinearity; tolerance; variance inflation factor (VIF); condition number; respecification; ridge regression; principal components regression.

Week 13 / Apr. 8-10:

For the next three classes, we will study regression techniques where the dependent variable has special properties. In Class 24 (T), we will look at logistic regression, where the dependent variable is a dichotomy. In Class 25 (R), we'll look at Poisson regression, a tool for modeling count data.

Required Reading: (T) 13.1-13.2; (R) 13.3-13.6.

Key concepts: (T) Linear probability model; probability Vs odds Vs log(odds) or probit; odds ratio; deviance; maximum likelihood; log likelihood. (R) Pseudo R square; Cox-Snell; Nagelkerke; score test; Wald test; base rate.

Week 14 / Apr. 15-17:

In Class 26 (T), we will continue our discussion of polytomous logistic regression and Poisson regression. After this class, we have some options, but my default is to devote Class 27 (R) to introducing path analysis, a tool for modeling systems of multiple equations.

Required Reading: (T) no new reading; (R) Chapter 11; (W) 12.1-12.3.

Key concepts: (T) R square adjcount / Goodman-Kruskal lambda; polytomous logistic regression; nested dichotomies regression; ordinal logistic regression / proportional odds model; Poisson regression. (R) TBA.

Week 15 / Apr. 22-24 (April 24 will actually be our last class meeting):

The remaining classes will be devoted to final exam review (class 28) and project presentations (classes 29-30).

Required Reading: none.

Last class (schedule will be adjusted once we make up the Jan. 17 session):

Class 30--presentations, continued, plus last-minute exam questions

Required Reading: none.

Final exam is Tuesday, April 29, noon.