Problem Context
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
Dataset Content
The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Tasks:
1. Using the KNIME platform, examine the Summary Statistics of the dataset
2. Build a Decision Tree Workflow in KNIME
3. Perform the Classification Task on the dataset using the Decision Tree you built in the previous step
4. Evaluate the Performance of your Decision Tree Model by generating a Confusion Matrix and determining the Accuracy Rate
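As a cross-check for Task 4, the short Python sketch below shows how a confusion matrix and accuracy rate are derived from actual and predicted labels. It is only an illustration and not part of the required KNIME workflow: the function names are made up for this example, the label lists are hypothetical values rather than rows from the dataset, and the coding of Outcome as 1 (diabetic) and 0 (non-diabetic) is an assumption.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for binary labels.

    Assumes 1 = diabetic (positive class) and 0 = non-diabetic.
    """
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),  # correctly flagged
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),  # false alarm
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),  # correctly cleared
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),  # missed case
    }


def accuracy(counts):
    """Accuracy = correct predictions / all predictions."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total


# Hypothetical labels for illustration only (not taken from the dataset).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
print(f"Accuracy: {accuracy(counts):.2f}")
```

When interpreting the matrix, the diagonal cells (TP and TN) are the correct predictions. For a diagnostic task, the FN cell (diabetic patients the model missed) usually deserves separate comment alongside the overall accuracy rate.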
What to submit:
1. Summary Statistics of the dataset (in a Word document)
2. An explanation of how you trained the decision tree model for classification (in a Word document)
3. Confusion Matrix results for your trained decision tree and their interpretation (in a Word document)
4. KNIME Workflows of your Decision Tree model
Note: The data is attached as an Excel file.