Intro to data mining

Discussion: 

Consider the mean of a cluster of objects from a binary transaction data set.

1. What are the minimum and maximum values of the components of the mean?

2. What is the interpretation of components of the cluster mean?

3. Which components most accurately characterize the objects in the cluster?
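
As a starting point for the three questions above, the following minimal sketch shows how a cluster mean is computed for binary transaction data. The four transactions and the helper name `cluster_mean` are hypothetical and serve only as illustration; they are not taken from any cited article.

```python
# Hypothetical toy cluster: each row is one transaction, each column one item
# (1 = the item appears in the transaction, 0 = it does not).
cluster = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]


def cluster_mean(rows):
    """Return the component-wise mean of equal-length binary vectors."""
    n = len(rows)
    # zip(*rows) walks the data column by column, i.e. item by item.
    return [sum(column) / n for column in zip(*rows)]


mean = cluster_mean(cluster)
print(mean)
```

Each component of `mean` is the count of transactions containing that item divided by the number of transactions in the cluster, which is worth keeping in mind when answering the questions.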

Please clearly list your responses to all three (3) questions, and cite the specific article that contains the binary transaction data set. I will be examining this myself, and other students should verify it as well. Provide the author, year (Author, YYYY), and specific page number for any content brought into the discussion.

Assignment: 

Answer the following questions point by point, not as an essay. Use Author, YYYY APA citations for any content brought into the assignment.

  1. For sparse data, discuss why considering only the presence of non-zero values might give a more accurate view of the objects than considering the actual magnitudes of values. When would such an approach not be desirable?
  2. Describe the change in the time complexity of K-means as the number of clusters to be found increases.
  3. Discuss the advantages and disadvantages of treating clustering as an optimization problem. Among other factors, consider efficiency, non-determinism, and whether an optimization-based approach captures all types of clusterings that are of interest.
  4. What are the time and space complexities of fuzzy c-means? Of SOM? How do these complexities compare to those of K-means?
  5. Explain the difference between likelihood and probability.
  6. Give an example of a set of clusters in which merging based on the closeness of clusters leads to a more natural set of clusters than merging based on the strength of connection (interconnectedness) of clusters.
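
To make the terms in question 5 concrete, here is a minimal sketch that evaluates the same binomial formula in two ways. The coin-flip setting (10 flips, 7 heads), the parameter grid, and the function name `binomial_pmf` are assumptions chosen for illustration only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Binomial formula: chance of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability view: the parameter is fixed (p = 0.5) and the outcome k varies.
# Over all possible outcomes the values sum to 1.
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]
total_probability = sum(probs)

# Likelihood view: the data are fixed (7 successes in 10 trials) and the
# parameter p varies. These values need not sum to 1 over p.
grid = [i / 100 for i in range(1, 100)]  # hypothetical grid of candidate p values
likelihoods = [binomial_pmf(7, 10, p) for p in grid]
best_p = grid[likelihoods.index(max(likelihoods))]

print(total_probability, best_p)
```

The same function is used in both halves; only which argument is held fixed and which is varied changes.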