Module 3: Assignment
Predictive Analytics with Regression Models
Overview:
In this assignment, you will apply what you have learned about predictive analytics with regression
models to two case studies. First you will estimate three different models to predict monthly vehicles
served for auto repair shops using multiple linear regression techniques and choose the best model by
comparing the goodness of fit measures of three models. You will also use linear probability and
logistic models to test which customers are attracted to a specific brand and compare the accuracy of
models using the hold-out
method.
Prompt:
For this assignment, you will analyze the two case studies below and address the questions associated
with each:
Case 1:
For this case, first download the data: Quick Fix data (available in Blackboard).
Next review the following case study: The general manager of Quick Fix, a chain of quick-service,
no-appointment auto repair shops, wants to develop a model to forecast monthly vehicles served at
any particular shop based on four factors: garage bays, population within 5-miles radius (population
in 1,000s), interstate highway access (Access equals 1 if convenient, 0 otherwise), and time of the
year (Winter equals 1 if winter, 0 otherwise). He believes that all else equal, shops near an interstate
will service more vehicles and that more vehicles will be serviced in the winter due to battery and
tire issues. A sample of 19 locations has been obtained.
Then complete the actions below and record your answers in a Microsoft Word document.
Note: For step-by-step instructions on how to use Excel and Data Analysis Tool Pak to estimate and
predict with a multiple linear regression model in order to select the best model for a specific
situation, refer to the
1. Estimate the following three models and report your results in a user-friendly table.
a. Vehicles = β 0 + β 1Garage + β 2Population + ε
b. Vehicles = β 0 + β 1Garage + β 2Population + β 3Access + ε
c. Vehicles = β 0 + β 1Garage + β 2Population + β 3Access + β 4 Winter + ε
should include parameter estimates and p-values of each estimate (each model), standard error of
estimate (Se), R-squared, adjusted R-Squared, and p-value of the F-test. Table should include a
footnote for significance level(s) and additional information that a reader needs to understand the
table.
2. Use goodness of fit measures that you learned in Chapter 6 to select the best fitting model.
3. Interpret each slope coefficient of the model you selected in #3 above.
4. At the 5% significance level, are the predictor variables jointly significant? Are they
individually
significant? What about the 10% significance level?
5. Predict vehicles served in a non-winter month for a particular location with five garage bays,
a
population of 40,000, and convenient interstate access.
Case 2:
For this case, first download the data: Purchase_UnderArmour (available in Blackboard).
Next, review the following case study:
Annabel, a retail analyst, has been following Under Armor Inc., the pioneer in the
compression-gear market. Compression garments are meant to keep moisture away
from a wearer’s body during athletic activities in warm and cool weather. Annabel
believes that the Under Armor brand attracts a younger customer, whereas more
established companies, Nike and Adidas, draw an older clientele. In order to test her
belief, she collects data on the age of customers and whether or not they purchased
Under Armor (Purchase 1; for purchase, 0 otherwise).
Then complete the actions below and record your answers in a Microsoft Word document.
Note: For step-by-step instructions on how to estimate a linear probability model and logistic model
and how
to use the holdout method for both models, refer to the following videos from Lesson 2: Advanced
1. Estimate the linear probability model using Purchase as the response variable and Age as the
predictor variable. Compute the predicted probability of an Under Armour purchase for a 20-
year-old
customer and a 30-year-old customer.
2. Estimate the logistic regression model where the Purchase depends on age. Compute the
predicted
probability of an Under Armour purchase for a 20-year-old customer and a 30-year-old
customer.
3. Use a holdout method to compare the accuracy of the linear probability model (Model 1) and
the logistic regression model (Model 2) using the first 20 observations for training and the
remaining 10 observations for validation.
Submission Guidelines:
Your completed assignment must be submitted as a Microsoft Word document, 1-2 pages in length,
double spacing, 12-point Times New Roman font, and 1-inch margins. The submission must be
accompanied by a
Microsoft Excel spreadsheet showing your work. Only the Word document will be assessed for
grading purposes, however the spreadsheet is required and must be submitted to show your work.
Relevant graphs and/or tables of the data should be inserted within the Word document.