Discussion Post


Objectives:

In this group project, students will apply concepts from lecture and lab to complete the following instructions using the Car Auction dataset and present the final work as a concise business report. They will calculate descriptive statistics for numeric variables within the dataset, manipulate data in order to bin the numeric ranges for the purposes of visualizing histograms, create scatterplots and calculate the correlation between numeric variables, and summarize and interpret all output because calculation is only one component of analysis.

 

Goals:

Completing this project will allow students to demonstrate the ability to do the following:

  • Calculate descriptive statistics.
  • Create Histograms.
  • Plot the data using scatter plots and calculate the correlation coefficient.
  • Interpret all output.

 

Project Instructions:

 

  • Input File for Excel:
    • Car Auction – Assignment 1 Output.xlsx

 

  • Variables of interest. For each of the following steps, use each of these variables:
    1. VehAge
    2. VehBCost
    3. VehOdo
    4. Price Difference
    5. Miles per year
  • Calculate Descriptive statistics. Use the Data Analysis toolpak (Data -> Data Analysis -> Descriptive Statistics). Create a new sheet for each variable and rename each sheet.
  • Calculate the bins first, then use the Data Analysis toolpak (Data -> Data Analysis -> Histogram). Create a new sheet for each variable and rename each sheet.
  • Scatter plots and correlation – one for each combination of the numeric variables (there are 10).

 

Optional: Create a categorical variable for each numeric variable. Either use its Q1, median, and Q3 values to make each bucket equivalent, or use a different technique if you identify a different pattern.

Optional: Create a contingency table for each newly created categorical variable (from part (e) above) to compare lemon rates (use the IsBadBuy column in Column box, just like we did in Assignment 2).

Optional: Test the differences in the 5 numeric variables between Chevrolet and Ford. Copy the values for each group onto a new sheet. Get the means and standard deviations of variables for both groups, then use the T.TEST formula to calculate a p-value. Interpret the results.

 

Any optional work attempted will count as extra points towards your total group project grade for up to 3 total points (1 for each optional portion attempted). If all optional work is attempted and all the attempts are correct, an additional point will be added (for up to a total of 4 points towards your total group project grade).

 

  • Write a Word doc with a list of observations for all of the above work (aka a concise business report). Copy all output (i.e. descriptive statistics, histograms, scatter plots, and correlation coefficients) and be sure to interpret output for each variable (i.e. interpret a few items from the descriptive statistics individually for each variable, interpret histograms individually for each variable, and interpret each scatterplot and correlation coefficient.)

 

 

Documents for Submission (only one student needs to submit on behalf of the group, groups are randomly assigned by the instructor):

  • Excel file showing all the work performed
  • Word document (aka your formal business report) with the descriptive statistics and histogram for each variable along with all the scatterplots and correlation coefficients for the combination of the variables (total of 10 scatterplots and coefficients) in addition to interpretation for each output.

 

Grading Rubric:

  • Create 5 new worksheets with descriptive statistics for all 5 numeric variables specified:VehAge, VehBCost, VehOdo, Price Difference, and Miles per year. Provided an interpretation of the output. [33 points]
  • Create 5 new worksheets with histograms for all 5 numeric variables specified: VehAge, VehBCost, VehOdo,Price Difference, and Miles per year. Provided an interpretation of the output. [34 points]
  • Create 10 scatter plots, 1 for each combination of the 5 numeric variables. Calculate the correlation coefficient for each pair of variables. Provided an interpretation of the output. [33 points]