Assignment

In this assignment, we study the iris dataset and whether we can predict the features of iris flowers with a simple empirical model.

1. Open the iris_imported spreadsheet file you created from , Part 1.

More information on this data set and the variables in the data sets are available from the

2. Copy only the rows that belong to the iris-virginica class, then paste this subset into a new sheet in the iris_imported file. Rename the new sheet as viginica (1 point).

3. In the viginica sheet, randomly choose 4 rows of observations as test data. Cut and paste these 4 rows into another new sheet, which you rename as Test (1 point).

4. In a Word file saved as EM_iris, include your answers to the following questions (4 points):

(a) What type of data is in the iris-virginica subset? (b) What are the possible ways in which you can use the data to build a simple empirical model? (Hint: which relationships are you hoping to find?)

5. In the viginica sheet, create two different regression models using two different pairs of features (columns). For each regression model, create the scatter plot and the Trendline (regression line), and display the equation of the Trendline and the R-squared value on the chart. (5 points per model)
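
For reference, the equation and R-squared value shown for a linear Trendline can be reproduced by hand with ordinary least squares. The following is a minimal Python sketch, not part of the required spreadsheet work; the function name fit_line and the sample numbers are hypothetical and are not real iris measurements.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line
    y = slope * x + intercept fitted to the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up illustrative values (NOT taken from the iris dataset)
feature_x = [1.0, 2.0, 3.0, 4.0]
feature_y = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = fit_line(feature_x, feature_y)
print(f"y = {slope:.4f}x + {intercept:.4f}, R squared = {r_squared:.4f}")
```

An R-squared value close to 1 means the line explains most of the variation in the y column; a value close to 0 means the chosen pair of features has little linear relationship.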

6. For each model, you should try at least two different Trendline options. Document the options you tried in the EM_iris Word file, and explain why you think your choice is the best option. (2 points per explanation on a model’s Trendline option)
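
One possible way to compare two Trendline options is to compare their R-squared values. The sketch below is written under stated assumptions: it compares a linear fit with a logarithmic fit (y = a ln(x) + b), computes R squared as the squared Pearson correlation (which equals R squared for a straight-line fit with an intercept), and uses made-up numbers rather than iris measurements. The function name r_squared is hypothetical.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation of xs and ys, which equals the
    R squared of a straight-line fit (with intercept) of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Made-up illustrative values (NOT iris data): ys grows like log2(xs)
xs = [1.0, 2.0, 4.0, 8.0]
ys = [0.0, 1.0, 2.0, 3.0]

# Linear option: fit ys against xs directly
r2_linear = r_squared(xs, ys)
# Logarithmic option: fit ys against ln(xs), i.e. y = a*ln(x) + b
r2_log = r_squared([math.log(x) for x in xs], ys)

print(f"linear R squared:      {r2_linear:.4f}")
print(f"logarithmic R squared: {r2_log:.4f}")
```

A higher R-squared value is only one piece of evidence. With few data points, a more flexible option can score higher without describing the relationship better, so the shape of the scatter plot should also inform your explanation.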

7. In the Test sheet, use the regression line equations from the two models you created in the viginica sheet to predict the values for the 4 rows of test data. Put the predicted values in their own columns, and name the columns so that it is clear which one holds the actual measurement and which one holds the predicted value. (3 points for each model’s prediction).
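
Predicting with a linear Trendline amounts to substituting each test value of x into the equation y = slope * x + intercept. A minimal sketch follows, assuming a linear Trendline; the coefficients and test inputs are hypothetical placeholders, not results from the iris data.

```python
def predict(slope, intercept, x):
    """Evaluate the straight-line Trendline equation y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical coefficients read off a Trendline equation (for illustration only)
slope, intercept = 0.5, 1.0

# Hypothetical x values for the 4 test rows (NOT real iris measurements)
test_x = [4.0, 5.0, 6.0, 7.0]
predicted_y = [predict(slope, intercept, x) for x in test_x]

for x, y in zip(test_x, predicted_y):
    print(f"x = {x}: predicted y = {y}")
```

In the spreadsheet, the same calculation is a cell formula that multiplies the x cell by the slope and adds the intercept, filled down the 4 test rows.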

8. Evaluate the results from the two models, then in the EM_iris Word file, state (a) which of the two models is better, and (b) what relationship you found among the features of the iris dataset. (4 points)
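
The assignment does not prescribe how to evaluate the models. One common approach, offered here only as an assumption, is to summarize the gap between actual and predicted test values with the mean absolute error (MAE) and the root mean squared error (RMSE). The function names and numbers below are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all test rows."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference over all test rows."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up actual and predicted values for 4 test rows (NOT iris data)
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print("MAE: ", mean_absolute_error(actual, predicted))
print("RMSE:", root_mean_squared_error(actual, predicted))
```

Smaller values mean predictions closer to the measurements. If the two models predict different features, their errors are on different scales, so it may be fairer to also consider each error relative to the typical size of the feature being predicted.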

9. Submit both the iris_imported Excel file and the EM_iris Word file.