1)  (35 points) Consider the corpus of five documents in Table 1. You may use Python for this question; if you do, submit your Python code as well.

Table 1

Doc1    Decide which attribute the decision tree algorithm would choose.

Doc2    A decision tree is a classification algorithm that is widely used in machine learning.

Doc3    Making a decision to put a tree is very difficult due to lack of power for the decision

Doc4    Language decision varies from person to person and time to time.

Doc5    Decision trees are different from binary trees or binary search trees.

a)  Build a term-document matrix of raw term counts for the corpus in Table 1, after removing stopwords and lemmatizing each sentence. Include only nouns and verbs in the matrix.

b)  Build a term-document matrix of tf-idf weights for the corpus in Table 1, after removing stopwords and lemmatizing each sentence. Include only nouns and verbs in the matrix.

Show the steps you used to calculate the tf-idf values.
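
The question does not say which tf-idf variant to use. As a reference point, one common textbook definition (an assumption here, not something the question prescribes) is given below, where $N$ is the number of documents, $\mathrm{tf}(t, d)$ is the raw count of term $t$ in document $d$, and $\mathrm{df}(t)$ is the number of documents that contain $t$:

```latex
\[
\mathrm{idf}(t) = \log_{10}\frac{N}{\mathrm{df}(t)}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
\]
```

With $N = 5$, a term that appears in all five documents gets $\mathrm{idf}(t) = \log_{10}(5/5) = 0$, so its tf-idf weight is zero in every document. Other variants (log-scaled tf, smoothed idf, a different log base) give different numbers, so state whichever one you use.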

(Use the stopwords provided by NLTK, listed here:

{‘of’, ‘against’, ‘ll’, ‘they’, ‘aren’, ‘our’, ‘that’, ‘shouldn’, ‘only’, ‘shan’, ‘o’, “isn’t”, ‘been’, “weren’t”, “you’ve”, ‘myself’, ‘as’, ‘once’, ‘my’, ‘both’, ‘too’, ‘be’, ‘should’, ‘hadn’, ‘in’, ‘does’, “you’ll”, ‘during’, ‘herself’, ‘will’, ‘any’, ‘was’, ‘how’, ‘which’, “didn’t”, ‘but’, ‘had’, ‘more’, ‘needn’, ‘further’, ‘whom’, ‘mustn’, ‘no’, ‘did’, “aren’t”, ‘or’, ‘on’, ‘down’, ‘them’, ‘to’, ‘same’, “shouldn’t”, “should’ve”, “mightn’t”, “it’s”, ‘between’, ‘before’, ‘he’, ‘here’, “hadn’t”, ‘have’, ‘if’, “you’re”, ‘haven’, ‘under’, ‘nor’, ‘t’, ‘can’, ‘re’, ‘it’, ‘y’, ‘where’, ‘then’, ‘she’, ‘own’, ‘hers’, ‘is’, ‘isn’, ‘each’, ‘don’, ‘now’, ‘by’, ‘than’, “hasn’t”, ‘his’, ‘who’, ‘above’, ‘this’, “mustn’t”, ‘their’, “couldn’t”, ‘there’, ‘couldn’, ‘over’, “you’d”, ‘m’, ‘doing’, ‘when’, ‘into’, ‘i’, ‘other’, ‘a’, ‘ours’, ‘because’, ‘we’, ‘an’, ‘weren’, ‘most’, ‘for’, ‘wasn’, “won’t”, ‘up’, “shan’t”, ‘while’, ‘your’, ‘am’, ‘through’, ‘after’, “don’t”, ‘theirs’, ‘ain’, ‘him’, ‘having’, ‘until’, ‘those’, ‘yourself’, ‘off’, ‘just’, ‘below’, ‘didn’, “wouldn’t”, “that’ll”, ‘out’, ‘mightn’, ‘ma’, ‘wouldn’, ‘such’, ‘won’, ‘all’, ‘the’, ‘has’, ‘ourselves’, ‘doesn’, ‘some’, ‘few’, ‘these’, ‘and’, “needn’t”, “doesn’t”, ‘what’, ‘with’, ‘very’, ‘himself’, ‘do’, ‘again’, ‘d’, ‘yours’, ‘are’, “wasn’t”, ‘not’, ‘being’, ‘were’, ‘from’, ‘me’, ‘ve’, ‘why’, ‘itself’, ‘s’, ‘so’, ‘hasn’, ‘her’, “she’s”, ‘you’, “haven’t”, ‘themselves’, ‘its’, ‘at’, ‘yourselves’, ‘about’}