Fall 2015 QMB 6358 Indicator Variable Assignment
An “answer collection” mechanism will appear on Canvas shortly
The data set GrocSalesMall.xlsx contains information on the annual sales of 25 stores in a grocery chain. In the file are:
Sales: last year’s sales per store (in $1000s)
Customers: number of customers last year (in 1000s)
Mall: 1 = store located at a mall, 0 = otherwise
1. Create a scatterplot of Sales against Customers. Use different symbols for the mall and non-mall stores.
2. Run a regression of Sales on number of customers
3. Run a regression of Sales on both variables.
a. Interpret the coefficient on the Mall variable
b. Is this coefficient significant?
c. Is the regression in part 3 an improvement on the one in part 2? Look at the change in R² and adjusted R².
d. Show what is really “going on” in the ...
Is this coefficient significant?
g. Is the regression in part 4 an improvement on the one in part 3? Look at the change in R² and adjusted R².
h. Show what is really “going on” in the regression by deriving the equation for sales in mall stores and for sales in non-mall stores.
A: For stores in a mall, each additional 1,000 customers raises sales by 5.493 (that is, $5,493, since sales are in $1000 units) more than it does for stores not in a mall. This is a slope adjustment.
B: This is significant at the 5% level (t = 2.223, p-value = 0.037).
C: R-square increased by 0.053 and adjusted R-square by 0.046.
D: Non-mall: Sales = 336.675 + 6.698 Customers
In malls: Sales = (336.675 – 590.449) + (6.698 + 5.493) Customers
Sales = -253.774 + 12.191 Customers
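The derivation in D can be checked with a few lines of Python. This is only a sketch: the names b0 through b3 and the function group_line are labels introduced here, and the model form Sales = b0 + b1*Customers + b2*Mall + b3*(Mall*Customers) is assumed from the way the answer combines the coefficients.

```python
# Coefficients as they appear in the answer to part 4. The names b0..b3 are
# labels introduced for this sketch, not output from any software package.
# Assumed model: Sales = b0 + b1*Customers + b2*Mall + b3*(Mall*Customers)
b0 = 336.675    # intercept for non-mall stores
b1 = 6.698      # slope on Customers for non-mall stores (value used in the mall sum)
b2 = -590.449   # intercept adjustment for mall stores
b3 = 5.493      # slope adjustment for mall stores


def group_line(mall):
    """Return (intercept, slope) of the Sales line for mall = 0 or mall = 1."""
    return b0 + b2 * mall, b1 + b3 * mall


non_mall = group_line(0)
in_mall = group_line(1)
print("Non-mall: Sales = %.3f + %.3f Customers" % non_mall)
print("In malls: Sales = %.3f + %.3f Customers" % in_mall)
```

Setting Mall = 0 drops both adjustment terms, and setting Mall = 1 adds them to the intercept and slope, which is all the indicator and its interaction with Customers do.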
5. Create a scatterplot of Sales against Customers. Distinguish the two Mall categories and add the fitted lines you obtained in part 4.
6. Use version 1 of the midterm exam data sets for this. Run a regression to predict total debt by primary income, secondary income, monthly payment for mortgage or rent, utility payments and family size.
7. Now define indicator variables for the quadrants of the city. Use NW as the base category.
8. Run another regression where the three indicator variables for location are added to the model in #6.
9. Compare the model from part 6 to the one from part 8.
i. Interpret the coefficients on the location variables and test them for significance.
All else equal, a household in NE has 0.772 more in total debt payments than one in the base category (NW). One in SE is 0.943 higher. One in SW is 0.080 lower than NW.
None of these are significant amounts because all t-ratios are modest and the p-values are 0.363 or higher.
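The location coefficients are all comparisons with NW because of the indicator coding in part 7. A minimal sketch of that coding follows. The quadrant labels in the list are made-up illustrative values, not rows from the midterm data set, and the function name make_indicators is hypothetical.

```python
# Illustrative labels only; the real quadrant values come from the data set.
quadrants = ["NW", "NE", "SE", "SW", "NE", "NW"]

BASE = "NW"                   # base category: gets no indicator column
LEVELS = ["NE", "SE", "SW"]   # one 0/1 indicator for each remaining quadrant


def make_indicators(labels, levels=LEVELS, base=BASE):
    """Map each label to a dict of 0/1 indicators; the base maps to all zeros."""
    rows = []
    for label in labels:
        if label != base and label not in levels:
            raise ValueError(f"unexpected quadrant: {label!r}")
        rows.append({level: int(label == level) for level in levels})
    return rows


indicators = make_indicators(quadrants)
for label, row in zip(quadrants, indicators):
    print(label, row)
```

A household in NW has zeros in all three columns, so its predicted value comes from the intercept and the other variables alone. Each location coefficient is then the difference between that quadrant and NW.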
j. How much does adjusted R-square change from the model in #6?
The regular R-square rises from 0.813 to 0.817, but adjusted R-square drops from 0.800 to 0.795. This indicates that location does not really matter, at least the way we have modeled it.
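For reference, the standard definition of adjusted R-square shows why it can fall while the regular R-square rises. The symbols n (number of households) and k (number of predictors) are introduced here, and n is not reported in this answer key.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Adding the three location indicators raises k from 5 to 8, which makes the factor (n - 1)/(n - k - 1) larger. The rise in R-square from 0.813 to 0.817 is too small to offset that penalty, so the adjusted value drops.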
k. When you add these location variables, do any of the other variable coefficients change a great deal?
Not by a great deal. Because location does not really affect debt, this is not surprising.