The midterm
The idea is for you to explore using logistic regression, support vector machines, decision trees and random forests.
I want you to explore at least 4 ways of running each one. As examples:
Using logistic regression, there are quite a few parameters:
- l2 penalties (or none). When using l2 penalties, what is the correct C coefficient?
- type of algorithm/solver to use
- how multi-class problems (number of classes > 2) are handled
- preprocessing – do you need to center/scale the data beforehand?
- My guess is no – all the features are on the same scale, but it should be verified
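To make the list above concrete, here is a minimal sketch of one way to compare several logistic regression settings at once. It assumes the Iris dataset as a stand-in (swap in the midterm data), and the grid values for C and the solver are only illustrative starting points, not recommended answers. A very large C behaves almost like no penalty, and recent scikit-learn versions fit a multinomial model by default with these solvers, so the one-vs-rest variant is built explicitly with OneVsRestClassifier.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: Iris is only a placeholder dataset for this sketch.
X, y = load_iris(return_X_y=True)

# The pipeline lets the grid switch scaling on and off as just another parameter.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),  # L2 penalty is the default
])

param_grid = {
    "scale": [StandardScaler(), "passthrough"],   # with vs. without centering/scaling
    "clf__C": [0.01, 0.1, 1.0, 10.0, 100.0],      # smaller C = stronger L2 penalty
    "clf__solver": ["lbfgs", "newton-cg"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best settings:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))

# Multi-class handling: wrap the classifier to force a one-vs-rest scheme.
ovr = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=5000)),
)
ovr_score = cross_val_score(ovr, X, y, cv=5).mean()
print("one-vs-rest mean CV accuracy:", round(ovr_score, 3))
```

Comparing the scaled and unscaled rows in search.cv_results_ is one way to check the guess about preprocessing rather than assume it.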
And with Support Vector Machines (read: https://scikit-learn.org/stable/modules/svm.html), explore:
- Different multi-class parameters
- LinearSVC vs SVC
- For SVC, different kernels
- Different margins (controlled by the regularization parameter C)
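The SVM options above could be explored with something like the following sketch. Again, Iris is an assumed placeholder dataset, and the particular kernels and C values are just examples. Note that SVC always trains one-vs-one classifiers internally, while LinearSVC defaults to one-vs-rest and also offers multi_class="crammer_singer". Scaling is included here because SVMs are generally sensitive to feature scale, but that choice is worth testing too.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# ASSUMPTION: Iris is only a placeholder dataset for this sketch.
X, y = load_iris(return_X_y=True)

# Each entry is one "way of running" an SVM; names are just labels for the printout.
candidates = {
    "LinearSVC (one-vs-rest)": LinearSVC(C=1.0, max_iter=10000),
    "LinearSVC (crammer_singer)": LinearSVC(
        C=1.0, multi_class="crammer_singer", max_iter=10000
    ),
    "SVC linear": SVC(kernel="linear", C=1.0),
    "SVC rbf": SVC(kernel="rbf", C=1.0),
    "SVC poly": SVC(kernel="poly", degree=3, C=1.0),
    "SVC rbf, small C": SVC(kernel="rbf", C=0.01),  # small C = softer, wider margin
}

svm_scores = {}
for name, model in candidates.items():
    pipe = make_pipeline(StandardScaler(), model)
    svm_scores[name] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name:30s} mean CV accuracy = {svm_scores[name]:.3f}")
```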
With Classification trees (https://scikit-learn.org/stable/modules/tree.html#tree), sklearn has two types:
- Decision Trees (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
- Extra Tree Classifier (https://scikit-learn.org/stable/modules/generated/sklearn.tree.ExtraTreeClassifier.html#sklearn.tree.ExtraTreeClassifier). I have never used this one.
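A rough sketch of how the two tree types might be compared side by side, assuming Iris as a placeholder dataset. The criterion and max_depth values are illustrative choices; the documentation lists other parameters (for example min_samples_leaf) that are equally reasonable to explore.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# ASSUMPTION: Iris is only a placeholder dataset for this sketch.
X, y = load_iris(return_X_y=True)

tree_scores = {}
for tree_cls in (DecisionTreeClassifier, ExtraTreeClassifier):
    for criterion in ("gini", "entropy"):
        for max_depth in (2, None):  # None lets the tree grow until leaves are pure
            model = tree_cls(
                criterion=criterion, max_depth=max_depth, random_state=0
            )
            key = (tree_cls.__name__, criterion, max_depth)
            tree_scores[key] = cross_val_score(model, X, y, cv=5).mean()
            print(key, round(tree_scores[key], 3))
```

ExtraTreeClassifier picks its split thresholds at random, so its scores may move more than DecisionTreeClassifier's when random_state changes; trying a few seeds is a cheap way to see that.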
And there is the random forest (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier).
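For the random forest, a small grid search is one possible starting point. This sketch assumes Iris as a placeholder dataset, and the three parameters varied here (n_estimators, max_depth, max_features) are only examples of what the documentation offers.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# ASSUMPTION: Iris is only a placeholder dataset for this sketch.
X, y = load_iris(return_X_y=True)

rf_grid = {
    "n_estimators": [50, 200],       # number of trees in the forest
    "max_depth": [None, 3],          # unlimited vs. shallow trees
    "max_features": ["sqrt", None],  # features considered at each split
}

rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0), rf_grid, cv=5, scoring="accuracy"
)
rf_search.fit(X, y)
print("best settings:", rf_search.best_params_)
print("best mean CV accuracy:", round(rf_search.best_score_, 3))
```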
Read the documentation, select the 4+ ways you want to explore each of these 4 classifiers, AND WRITE UP NOTES IN MARKDOWN CELLS. Your interpretation and conclusions are really important.