Statistics

Chapter 10 Data Mining

Instructions: Please submit your work in a single Excel file, with one tab/worksheet for each problem.

 

Cluster Analysis

  1. (25 points) Apply single linkage cluster analysis to Berkeley, Cal Tech, UCLA, and UNC in the Excel file Colleges and Universities Cluster Analysis Worksheet and draw a dendrogram illustrating the clustering process.
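
As a reminder of how single linkage works, here is a minimal Python sketch. It is only an illustration: the four points labeled A to D and their coordinates are made up and are not taken from the worksheet, and the use of Euclidean distance is an assumption. At each step the two clusters whose closest members are nearest to each other are merged, and the merge distance is the height at which the two branches join in the dendrogram.

```python
from itertools import combinations
from math import dist

def cluster_distance(points, c1, c2):
    """Single linkage: distance between the two closest members of c1 and c2."""
    return min(dist(points[a], points[b]) for a in c1 for b in c2)

def single_linkage(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(name,) for name in points]  # start with one cluster per item
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-linkage distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(points, *pair),
        )
        merges.append((c1, c2, cluster_distance(points, c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges

# Made-up 2-D points for illustration only (not the worksheet data).
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (4.0, 0.0), "D": (4.0, 2.0)}

merges = single_linkage(points)
for c1, c2, height in merges:
    print(f"merge {c1} + {c2} at distance {height:.2f}")
```

With these made-up points, A and B merge first, then C and D, and the two pairs join last. The same bookkeeping (which clusters merged, and at what distance) is what the dendrogram records.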

 

Classification

  1. In the Excel file Credit Risk Data, classify the following record:
    1. (25 points) Using the k-NN algorithm for k = 1 to 5.
    2. (25 points) Using discriminant analysis.
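
For the k-NN part, the following Python sketch shows the mechanics of classifying one record for k = 1 to 5. Everything in it is hypothetical: the two numeric features, the training records, the labels "Low" and "High", and the query record are made up and do not come from Credit Risk Data. It also assumes Euclidean distance and breaks ties in favor of the tied label whose member is nearest, which is one convention among several. In practice the features would normally be put on a common scale first.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k):
    """Classify query by majority vote among its k nearest training records."""
    # Sort training records by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Tie-break (an assumption): the tied label with the nearest member wins.
    return next(label for _, label in neighbors if votes[label] == top)

# Made-up training records: (feature vector, class label).
train = [
    ((1.0, 1.0), "Low"),
    ((1.5, 2.0), "Low"),
    ((2.0, 1.0), "Low"),
    ((4.0, 4.0), "High"),
    ((5.0, 4.5), "High"),
    ((4.5, 5.5), "High"),
]

query = (3.2, 3.0)  # hypothetical record to classify

predictions = {k: knn_classify(train, query, k) for k in range(1, 6)}
for k, label in predictions.items():
    print(f"k={k}: {label}")
```

Note that the predicted class can change with k, which is why the problem asks for each value from 1 to 5 rather than a single one.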

 

Association

  1. (25 points) The Excel file Automobile Options provides data on options ordered together for a particular model of automobile. Consider the following rules:
    1. Rule 1: If Fastest Engine, then Traction Control.
    2. Rule 2: If Faster Engine and 16-inch Wheels, then 3 Year Warranty.

Compute the support, confidence, and lift for each of these rules.
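
The three measures can be sketched in Python as follows. This is an illustration under stated assumptions: the transactions and the item names X, Y, and Z are made up and are not the Automobile Options data, and the definitions used are the common ones (support as the share of all transactions containing every item in the rule, confidence as the share of antecedent transactions that also contain the consequent, and lift as confidence divided by the consequent's overall share). Check these against the definitions used in the chapter.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)         # transactions containing the antecedent
    n_c = sum(c <= t for t in transactions)         # transactions containing the consequent
    n_ac = sum((a | c) <= t for t in transactions)  # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # confidence relative to the consequent's base rate
    return support, confidence, lift

# Made-up transactions for illustration only.
transactions = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"X"},
    {"Y"},
    {"X", "Y"},
    {"Z"},
    {"Y", "Z"},
    {"X", "Z"},
]

support, confidence, lift = rule_metrics(transactions, {"X"}, {"Y"})
print(f"support={support:.3f}, confidence={confidence:.3f}, lift={lift:.3f}")
```

A lift above 1 suggests the antecedent makes the consequent more likely than its base rate; a lift below 1 suggests the opposite. A rule with two antecedent items, like Rule 2, is handled the same way by passing both items in the antecedent set.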