Upload the Neth.CSV data file to Google Colab. Estimate a linear regression model of the number of weekly trips per household (HTTRPS) as a function of the remaining variables (to the extent possible). In the final model, identify which variables contribute to an increase in the number of weekly trips and which contribute to a decrease.
When estimating the linear regression model, these are the steps I want you to follow:
- Run a multiple linear regression that includes all variables.
- Check for multicollinearity and remove the appropriate variables.
- Check whether all variables are significant and remove the non-significant variables one at a time, refitting the model after each removal.
- Report the R² and adjusted R² of the final model.
- Check if the mean of the residuals is close to zero. Comment on the mean.
- Plot the histogram of the standardized residuals. Comment on whether the standardized residuals look normal.
- Check for outliers.
- Plot the residuals vs fitted values and comment.
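The steps above can be sketched roughly as follows. This is a minimal illustration rather than a definitive solution: it runs on synthetic stand-in data instead of Neth.CSV, uses only NumPy, and the helper `fit_ols`, the `NOISE` column, the |t| < 1.96 removal rule (a large-sample approximation to a 5% two-sided test) and the |standardized residual| > 3 outlier rule are all assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns common diagnostics."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # intercept + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    resid = y - fitted
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    sigma2 = sse / (n - k - 1)                      # residual variance estimate
    xtx_inv = np.linalg.inv(A.T @ A)
    t_stats = beta / np.sqrt(sigma2 * np.diag(xtx_inv))
    leverage = np.einsum("ij,jk,ik->i", A, xtx_inv, A)
    std_resid = resid / np.sqrt(sigma2 * (1.0 - leverage))
    return {"beta": beta, "t_stats": t_stats, "r2": r2, "adj_r2": adj_r2,
            "fitted": fitted, "resid": resid, "std_resid": std_resid}

# Synthetic stand-in data (assumption): NOT the values in Neth.CSV.
rng = np.random.default_rng(0)
n = 200
hhsize = rng.integers(1, 7, n).astype(float)
ncar = rng.integers(0, 4, n).astype(float)
noise = rng.normal(size=n)                          # unrelated to the response
y = 5.0 + 4.0 * hhsize + 2.0 * ncar + rng.normal(0.0, 3.0, n)

names = ["HHSIZE", "NCAR", "NOISE"]
X = np.column_stack([hhsize, ncar, noise])

# Backward elimination: drop the weakest predictor, refit, repeat.
cols = list(range(X.shape[1]))
while True:
    res = fit_ols(X[:, cols], y)
    t_abs = np.abs(res["t_stats"][1:])              # skip the intercept
    worst = int(np.argmin(t_abs))
    if t_abs[worst] >= 1.96 or len(cols) == 1:
        break
    cols.pop(worst)

kept = [names[c] for c in cols]
outliers = np.flatnonzero(np.abs(res["std_resid"]) > 3)

print("kept variables:", kept)
print("signs:", dict(zip(kept, np.sign(res["beta"][1:]))))
print("R2 and adjusted R2:", res["r2"], res["adj_r2"])
print("mean of residuals:", res["resid"].mean())
print("possible outliers (row indices):", outliers)
```

A positive coefficient sign points to a variable that increases weekly trips, a negative sign to one that decreases them. In Colab, passing `res["std_resid"]` to matplotlib's `plt.hist` and plotting `res["resid"]` against `res["fitted"]` with `plt.scatter` would give the two requested plots.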
The variable definitions are:
• HHSIZE household size
• NCAR number of cars in household
• HEMPSTS number of workers in household
• HSTUDEN number of students in household
• HTTRPS number of weekly trips per household
• NUCHLT12 number of children < 12 years in household
• CITY household residence in city (dummy variable)
• SUBURB household residence in suburb (dummy variable)
• RURAL household residence in rural area (dummy variable)
• INCOME continuous household income value
• NUCHGT12 number of children >= 12 years in household
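One reason the model can only use the variables "to the extent possible" is that CITY, SUBURB and RURAL are dummies for the same attribute, so if every household falls in exactly one category they sum to the intercept column. The sketch below illustrates this and a simple variance inflation factor (VIF) check. It uses synthetic data, not Neth.CSV; the helpers `design` and `vif`, the way HHSIZE is generated, and the choice of RURAL as the dropped reference category are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in data (assumption): NOT the values in Neth.CSV.
rng = np.random.default_rng(1)
n = 300
area = rng.integers(0, 3, n)                 # 0 = city, 1 = suburb, 2 = rural
CITY = (area == 0).astype(float)
SUBURB = (area == 1).astype(float)
RURAL = (area == 2).astype(float)
NUCHLT12 = rng.integers(0, 3, n).astype(float)
NUCHGT12 = rng.integers(0, 3, n).astype(float)
adults = rng.integers(1, 3, n).astype(float)
HHSIZE = adults + NUCHLT12 + NUCHGT12        # assumed relation, for illustration

def design(*cols):
    """Design matrix with an intercept column followed by the given columns."""
    return np.column_stack([np.ones(n), *cols])

# All three location dummies plus an intercept: one column is redundant.
full = design(CITY, SUBURB, RURAL, HHSIZE, NUCHLT12)
# Dropping one dummy (here RURAL, as the reference category) fixes this.
reduced = design(CITY, SUBURB, HHSIZE, NUCHLT12)

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print("full:", full.shape[1], "columns, rank", rank_full)
print("reduced:", reduced.shape[1], "columns, rank", rank_reduced)

def vif(X):
    """VIF of each column of X, where X has no intercept column."""
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

vifs = vif(reduced[:, 1:])                   # skip the intercept column
for name, value in zip(["CITY", "SUBURB", "HHSIZE", "NUCHLT12"], vifs):
    print(name, round(float(value), 2))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, though the cutoff is a judgment call. With the dropped dummy as the reference category, the coefficients on the remaining location dummies are read relative to that category.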