The midterm

The idea is for you to explore logistic regression, support vector machines, decision trees, and random forests.

I want you to explore at least 4 ways of running each one. As examples:

With logistic regression, there are quite a few parameters to vary:

  • l2 penalty (or none). When using an l2 penalty, what is the best value of C (the inverse of the regularization strength)?
  • type of algorithm/solver to use
  • how multi-class problems (number of classes > 2) are handled
  • preprocessing – do you need to center/scale the data beforehand?
    • My guess is no – all the features are on the same scale, but it should be verified
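
As a starting point, the options above could be compared with cross-validation. This is only a sketch under assumptions: the iris data is a stand-in for the midterm dataset, the variant names and settings are illustrative, and `OneVsRestClassifier` is one possible way to change the multi-class handling. The last variant adds scaling, which is one way to check the guess above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replace with the midterm dataset.
X, y = load_iris(return_X_y=True)

# Illustrative variants only; the default penalty is l2.
lr_variants = {
    "C=1.0, lbfgs": LogisticRegression(C=1.0, solver="lbfgs", max_iter=5000),
    "C=0.01, lbfgs": LogisticRegression(C=0.01, solver="lbfgs", max_iter=5000),
    "C=1.0, newton-cg": LogisticRegression(C=1.0, solver="newton-cg", max_iter=5000),
    "one-vs-rest wrapper": OneVsRestClassifier(LogisticRegression(max_iter=5000)),
    "centered and scaled": make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
}

# Mean 5-fold cross-validated accuracy for each variant.
lr_scores = {}
for name, model in lr_variants.items():
    lr_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:22s} {lr_scores[name]:.3f}")
```

If the scaled and unscaled scores are close, that supports the guess that scaling is not needed; the notes should say what was actually observed.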

And with Support Vector Machines (read: https://scikit-learn.org/stable/modules/svm.html), explore:

  • Different multi-class parameters
  • LinearSVC vs SVC
  • For SVC, different kernels
  • Different margins (the C parameter controls how soft the margin is)
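
A sketch of how these SVM options might be set up side by side. Assumptions: iris stands in for the midterm data, and the particular kernels and C values are illustrative, not recommended settings. In sklearn, `LinearSVC` handles multi-class as one-vs-rest by default, while `SVC` trains one-vs-one classifiers internally.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

# Stand-in data (assumption): replace with the midterm dataset.
X, y = load_iris(return_X_y=True)

# Illustrative variants only; smaller C means a softer margin.
svm_variants = {
    "LinearSVC (one-vs-rest)": LinearSVC(C=1.0, max_iter=20000),
    "SVC, linear kernel": SVC(kernel="linear", C=1.0),
    "SVC, rbf kernel": SVC(kernel="rbf", C=1.0),
    "SVC, poly kernel (degree 3)": SVC(kernel="poly", degree=3, C=1.0),
    "SVC, rbf, C=0.1": SVC(kernel="rbf", C=0.1),
    "SVC, rbf, C=100": SVC(kernel="rbf", C=100.0),
}

# Mean 5-fold cross-validated accuracy for each variant.
svm_scores = {}
for name, model in svm_variants.items():
    svm_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:30s} {svm_scores[name]:.3f}")
```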

For classification trees (https://scikit-learn.org/stable/modules/tree.html#tree), sklearn has two types:

  • Decision Trees (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
  • Extra Tree Classifier (https://scikit-learn.org/stable/modules/generated/sklearn.tree.ExtraTreeClassifier.html#sklearn.tree.ExtraTreeClassifier). I have never used this one.
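
The two tree types can be compared in the same way. This is a minimal sketch, assuming iris as a stand-in dataset; the choice of `criterion`, `max_depth`, and `min_samples_leaf` values is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# Stand-in data (assumption): replace with the midterm dataset.
X, y = load_iris(return_X_y=True)

# Illustrative variants only; random_state is fixed so runs are repeatable.
tree_variants = {
    "DecisionTree, gini": DecisionTreeClassifier(criterion="gini", random_state=0),
    "DecisionTree, entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "DecisionTree, max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "DecisionTree, min_samples_leaf=5": DecisionTreeClassifier(min_samples_leaf=5, random_state=0),
    "ExtraTree, defaults": ExtraTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each variant.
tree_scores = {}
for name, model in tree_variants.items():
    tree_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:34s} {tree_scores[name]:.3f}")
```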

And there are random forests (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)
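
For random forests, the handout does not list which parameters to vary, so the ones below (`n_estimators`, `max_depth`, `max_features`) are my own illustrative picks from the documentation, again on iris as a stand-in dataset.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): replace with the midterm dataset.
X, y = load_iris(return_X_y=True)

# Illustrative variants only; random_state is fixed so runs are repeatable.
rf_variants = {
    "100 trees (default)": RandomForestClassifier(n_estimators=100, random_state=0),
    "300 trees": RandomForestClassifier(n_estimators=300, random_state=0),
    "max_depth=3": RandomForestClassifier(max_depth=3, random_state=0),
    "max_features=None (all)": RandomForestClassifier(max_features=None, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each variant.
rf_scores = {}
for name, model in rf_variants.items():
    rf_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:26s} {rf_scores[name]:.3f}")
```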

Read the documentation, select the 4+ ways you want to explore each of these 4 classifiers, AND WRITE UP NOTES IN MARKDOWN CELLS. Your interpretation and conclusions are really important.