Upload the Neth.CSV data file to Google Colab. Estimate a linear regression model of the number of weekly trips per household as a function of the remaining variables (to the extent possible). In the final model, identify which variables contribute to an increase in the number of weekly trips and which contribute to a decrease.

In estimating the linear regression model, these are the steps I want you to follow:

  • Run a multiple linear regression that includes all variables.
  • Check for multicollinearity and remove the appropriate variables.
  • Check whether all variables are significant, and remove those that are not, one at a time.
  • Report the R² and adjusted R² of the final model.
  • Check if the mean of the residuals is close to zero. Comment on the mean.
  • Plot the histogram of the standardized residuals. Comment on whether the standardized residuals look normal.
  • Check for outliers.
  • Plot the residuals vs fitted values and comment.

The variable definitions are:

  • HHSIZE household size
  • NCAR number of cars in household
  • HEMPSTS number of workers in household
  • HSTUDEN number of students in household
  • HTTRPS number of weekly trips per household
  • NUCHLT12 number of children < 12 years in household
  • CITY household residence in city (dummy variable)
  • SUBURB household residence in suburb (dummy variable)
  • RURAL household residence in rural area (dummy variable)
  • INCOME continuous household income value
  • NUCHGT12 number of children >= 12 years in household
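
One likely reason for the phrase "to the extent possible" is the three residence dummies. If every household falls in exactly one of CITY, SUBURB, or RURAL (an assumption, not something the definitions state), the three columns sum to 1 in every row and so duplicate the intercept. One of them then has to be left out as the baseline. The sketch below illustrates this on made-up data with NumPy only; the sample size and category coding are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical residence codes: 0 = city, 1 = suburb, 2 = rural.
area = rng.integers(0, 3, size=n)
dummies = np.eye(3)[area]          # columns play the role of CITY, SUBURB, RURAL
intercept = np.ones((n, 1))

# All three dummies plus an intercept: 4 columns, but they are linearly dependent.
X_all = np.hstack([intercept, dummies])

# Dropping one dummy (here the last, making rural the baseline) removes the dependency.
X_drop = np.hstack([intercept, dummies[:, :2]])

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)
print(X_all.shape[1], rank_all)    # more columns than rank: not full column rank
print(X_drop.shape[1], rank_drop)  # columns equal rank: full column rank
```

With a baseline dropped, the coefficients on the two remaining dummies are read as differences relative to the omitted category.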