Summary
- ML problems can be posed as optimization problems where the objective function represents an error or performance measure on a dataset of training examples.
- The objective of classification is to minimize the misclassification rate; the objective of regression is to minimize the mean squared error (MSE).
- The fundamental problem of ML is generalization to unseen examples.
- ML relies on models to represent assumptions about the regularities of a dataset.
- Generalization is achieved by controlling model complexity to avoid under-fitting or over-fitting.
- K-means is a clustering algorithm which alternates between assigning each example to the cluster of its nearest centroid and updating each centroid to the mean of its assigned examples.
- Model selection can be seen as a secondary learning problem whose objective is to find hyper-parameters that maximize generalization.
- Supervised learning can benefit from a new representation obtained by mapping the inputs to a new feature space.
- A new feature space can be obtained without human intervention by learning representations with an unsupervised algorithm.
- Learning representations with an unsupervised algorithm has several benefits w.r.t. generalization.
- Learning sparse representations may improve generalization if we can assume that inputs can be represented by a limited number of features.
- Learning distributed representations may allow for non-local generalization if each example can be interpreted as combining several features.
- Combining sparse and distributed representations leads to features which are similar to those found in primate brains.
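The two objectives recalled above can be written directly as functions of the predictions. A minimal sketch (function names are our own, not from the notes):

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    # Fraction of examples whose predicted class differs from the true class.
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def mse(y_true, y_pred):
    # Mean squared error between targets and predictions.
    d = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(d ** 2)
```

For example, predicting one of four labels wrong gives a misclassification rate of 0.25.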
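The K-means alternation between assignment and centroid updates can be sketched in a few lines of NumPy (variable names and the convergence check are our own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means sketch: alternate assignments and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct training examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each example joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned examples
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute assignments for the final centroids before returning.
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return centroids, labels
```

On two well-separated blobs of points, this recovers one cluster per blob.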
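Viewing model selection as a secondary learning problem amounts to a search over hyper-parameter values scored on held-out data. A minimal sketch, where `fit` and `score` are hypothetical callables standing in for any training and evaluation procedure:

```python
import numpy as np

def select_hyperparameter(candidates, fit, score, train, valid):
    # Fit one model per candidate hyper-parameter value and keep the
    # value whose model scores best on the held-out validation set.
    best_value, best_score = None, -np.inf
    for value in candidates:
        model = fit(train, value)
        s = score(model, valid)
        if s > best_score:
            best_value, best_score = value, s
    return best_value
```

Scoring on validation data rather than training data is what ties this search to generalization rather than to fitting the training set.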
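A small counting argument (illustrative numbers, not from the notes) shows why distributed and sparse codes differ from local ones in capacity: a local (one-hot) code with n units distinguishes only n items, a distributed binary code distinguishes up to 2**n patterns, and a sparse code with exactly k of n units active distinguishes C(n, k) patterns.

```python
from math import comb

n = 8                              # number of feature units
local_capacity = n                 # one-hot: one unit per item
distributed_capacity = 2 ** n      # binary distributed code: 256 patterns
k = 2
sparse_capacity = comb(n, k)       # exactly 2 of 8 units active: 28 patterns
```

Even the sparse code represents far more patterns than a local code with the same number of units, which is one way to see how combining features supports non-local generalization.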
Next: Machine Learning with probabilities