We consider a regression problem: predicting the demand for bike-sharing services in Washington, D.C.¹ The task is to predict the demand for bikes (column cnt) from the other features, ignoring the columns instant and dteday. Use the day.csv file from the data folder.

(a) Write a Python file to load day.csv.² Compute the correlation coefficient of each feature with the response (i.e., cnt) and include a table of these coefficients. Which features are positively correlated with the response (i.e., have a positive correlation coefficient)? Which feature has the highest positive correlation with the response?

(b) Were you able to find any features with a negative correlation coefficient with the response? If not, can you think of a feature that is not provided in the dataset but may have a negative correlation coefficient with the response?

(c) Now divide the data into training and test sets, with the training set containing about 70 percent of the data. Import train_test_split from sklearn.model_selection to perform this operation. Use an existing package to train a multiple linear regression model on the training set using all the features (except the ones excluded above). Report the coefficients of the linear regression model and the following metrics on the training data: (1) RMSE; (2) R². [Hint: You may find the class sklearn.linear_model.LinearRegression useful.]

(d) Next, use the test set generated in step (c). Evaluate the model trained in step (c) on the test set. Report the RMSE and R² metrics on the test set.

(e) Interpret the results in your own words. Which features contribute most to the linear regression model? Does the model fit the data well? How large is the model error?
¹ https://www.kaggle.com/datasets/marklvl/bike-sharing-dataset?search=bike+demand+Washington&select=Readme.txt. You can also find a Readme.txt file that explains all the features in the dataset.
² Refer to https://docs.python.org/3/library/csv.html on how to load a CSV file in Python.
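
The steps in parts (a), (c), and (d) can be sketched as below. This is a minimal illustration under stated assumptions, not a reference solution: it runs on a small synthetic table rather than day.csv, the helper names correlations_with_response and fit_and_score are hypothetical, and the synthetic coefficients are made up purely so the example is runnable. RMSE is computed as the square root of the mean squared error.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def correlations_with_response(df, target="cnt"):
    """Pearson correlation of each numeric feature with the response column."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(ascending=False)


def fit_and_score(df, target="cnt", test_size=0.3, random_state=0):
    """Split roughly 70/30, fit a linear regression, report RMSE and R^2."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)

    def rmse(y_true, y_pred):
        return float(np.sqrt(mean_squared_error(y_true, y_pred)))

    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    metrics = {
        "train_rmse": rmse(y_train, pred_train),
        "train_r2": float(r2_score(y_train, pred_train)),
        "test_rmse": rmse(y_test, pred_test),
        "test_r2": float(r2_score(y_test, pred_test)),
    }
    coefs = pd.Series(model.coef_, index=X.columns)
    return coefs, metrics


# Synthetic stand-in for day.csv (ASSUMPTION: invented data, not the real file).
rng = np.random.default_rng(0)
n = 400
temp = rng.uniform(0.0, 1.0, n)
windspeed = rng.uniform(0.0, 0.5, n)
df = pd.DataFrame(
    {
        "instant": np.arange(1, n + 1),
        "dteday": pd.date_range("2011-01-01", periods=n).strftime("%Y-%m-%d"),
        "temp": temp,
        "windspeed": windspeed,
        "cnt": 1000 + 5000 * temp - 4000 * windspeed + rng.normal(0, 300, n),
    }
)

# Drop the two columns the problem statement says to ignore.
df = df.drop(columns=["instant", "dteday"])

corr = correlations_with_response(df)
coefs, metrics = fit_and_score(df)

print(corr)
print(coefs)
print(metrics)
```

To apply the same helpers to the real data, one would replace the synthetic table with the contents of day.csv (for example via pandas.read_csv, or the csv module mentioned in the footnote) and drop instant and dteday in the same way.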