Project One


In this project, you will demonstrate your mastery of the following competencies:

  • Implement statistical analysis using quantitative and qualitative variables
  • Apply statistical techniques to address research problems

Scenario

You are a data analyst working for a real estate company based in Seattle. You have access to a large set of historical data that you can use to analyze relationships between a house’s attributes (such as square footage and number of bathrooms) and its selling price. You have been asked to create several regression models that predict a house’s selling price from these attributes. These models will help your company set better prices when listing a home for a client. You will use the R programming language to perform the statistical analyses and then prepare a report of your findings. Because your report will be read by a range of stakeholders within your company, you will need to interpret your findings and describe their practical implications.

Note: This data set has been “cleaned” for the purposes of this assignment.

Reference

Harlfoxem. (2016). House Sales in King County, USA [Data file]. Retrieved from https://www.kaggle.com/harlfoxem/housesalesprediction

Directions

  1. R Script: To complete the tasks listed below, open the Project One Jupyter Notebook link in the Assignment Information module. Your project contains the data set and a Jupyter Notebook. The Jupyter Notebook contains instructions and blank code blocks where you will write your R scripts. You will be asked to complete the following regression analyses:

    • First Order Regression Model with Quantitative and Qualitative Variables
    • Complete Second Order Multiple Regression Model with Quantitative Variables
    • Nested Models F-Test
  2. Summary Report: Once you have completed all the steps in your R script, you will create a summary report to present your findings. Use the provided template to create your report. You must complete each of the following sections by answering all of the questions in each section.

    • Introduction: Set the context for your scenario and the analyses you will be performing.
    • First Order Regression Model with Quantitative and Qualitative Variables:

      1. Perform correlation analysis between the variables using data visualizations, correlation coefficients, and the correlation matrix.
      2. Report the results of the model by listing and interpreting model statistics, including R² and adjusted R² (Rₐ²).
      3. Evaluate the significance of the model by reporting parameter estimates and performing hypothesis tests for each estimate and for the overall model.
      4. Use model equations to make predictions.
    • Complete Second Order Model with Quantitative Variables:

      1. Perform correlation analysis between the variables using data visualizations, correlation coefficients, and the correlation matrix.
      2. Report the results of the model by listing and interpreting estimates of model statistics, including R² and adjusted R² (Rₐ²).
      3. Evaluate the significance of the model by reporting parameter estimates and performing hypothesis tests for each estimate and for the overall model.
      4. Use model equations to make predictions.
    • Nested Models F-Test:

      1. Report the results of the model by listing and interpreting estimates of model statistics.
      2. Evaluate the significance of the model by reporting parameter estimates and performing hypothesis testing for each estimate and the overall model.
      3. Model Comparison: Evaluate whether the complete model is necessary by performing the nested models F-test.
    • Conclusion: Summarize your findings and explain their practical implications.
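
The three analyses listed above can be sketched in R roughly as follows. This is only an illustrative outline, not the notebook's required solution: the data are simulated, the column names (price, sqft_living, bathrooms, view) are hypothetical stand-ins rather than the course data set's actual variables, and the choice of reduced model (the complete model without its squared terms) is an assumption made for the example.

```r
# Simulated stand-in data. Column names and values are hypothetical,
# NOT taken from the course data set.
set.seed(42)
n <- 200
homes <- data.frame(
  sqft_living = runif(n, 800, 4000),
  bathrooms   = sample(1:4, n, replace = TRUE),
  view        = factor(sample(c("no", "yes"), n, replace = TRUE))
)
homes$price <- 50000 + 150 * homes$sqft_living + 20000 * homes$bathrooms +
  40000 * (homes$view == "yes") + rnorm(n, sd = 30000)

# Correlation matrix for the quantitative variables
print(cor(homes[, c("price", "sqft_living", "bathrooms")]))

# First-order model with quantitative and qualitative predictors.
# summary() reports a t-test for each estimate, the overall F-test,
# R-squared, and adjusted R-squared.
first_order <- lm(price ~ sqft_living + bathrooms + view, data = homes)
print(summary(first_order))

# Complete second-order model with two quantitative predictors:
# linear terms, their interaction, and both squared terms.
complete <- lm(price ~ sqft_living + bathrooms + sqft_living:bathrooms +
                 I(sqft_living^2) + I(bathrooms^2), data = homes)
print(summary(complete))

# Reduced model nested inside the complete model (assumed here to
# drop only the squared terms).
reduced <- lm(price ~ sqft_living + bathrooms + sqft_living:bathrooms,
              data = homes)

# Nested models F-test. H0: the coefficients of the dropped terms are all zero.
print(anova(reduced, complete))

# Point prediction with a 90% prediction interval for a hypothetical home
new_home <- data.frame(sqft_living = 2000, bathrooms = 2)
print(predict(complete, newdata = new_home,
              interval = "prediction", level = 0.90))
```

In the anova() output, a small p-value for the F statistic suggests that the extra terms in the complete model add explanatory power, while a large p-value suggests that the reduced model is adequate.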

What to Submit

To complete this project, you must submit the following:

R Script
Your Jupyter Notebook R script contains all the statistical analyses you completed for this project. Download your work as an HTML file. Review the file to make sure that every step and all your outputs are included. Submit the HTML file as part of your submission. Review the Jupyter Notebook in Codio Tutorial in the Supporting Materials section if you need help.

Summary Report
Use the provided template to create your summary report. The template contains guiding questions to help you complete each section. Be sure to remove these questions before submitting your report. Your summary report should be submitted as a 3- to 5-page Microsoft Word document. It should include an APA-style cover page and APA citations for any sources used. Use single spacing, 11-point Calibri font, and one-inch margins.

Supporting Materials

The following resources may help support your work on the project:

Document: Jupyter Notebook in Codio Tutorial
This tutorial will help you become familiar with the Jupyter Notebooks interface. You will learn how to open, complete, save, and download your Jupyter Notebook for this project.

Shapiro Library: APA Style Guide
This guide will help you format your cover page and references according to APA style. You are not required to use external resources for this project. However, if you do use any resources, you must cite them in APA format.