Summary

  • From a Bayesian probabilistic perspective, it is natural to update our beliefs with data.
  • It is sometimes necessary to use methods such as rejection sampling, importance sampling, the Metropolis-Hastings algorithm, or Gibbs sampling to sample from complex distributions (see the Metropolis-Hastings sketch after this list).
  • Probabilistic models can be trained by minimizing the KL-divergence between the empirical data distribution and the model distribution.
  • Equivalently, a probabilistic model can be trained by maximizing the log-likelihood of a dataset under the model (the short derivation after this list makes the equivalence explicit).
  • Bayes’ formula gives a method for choosing the best parameters given data: maximum-a-posteriori (MAP) estimation.
  • The prior distribution gives probabilities to model parameters before having seen a dataset.
  • When the prior is uniform, maximum-a-posteriori is equivalent to maximum-likelihood (see the MAP formula after this list).
  • Probabilistic models can have latent variables which can be understood as unobserved explanatory factors.
  • Models with latent variables can be trained with the EM algorithm, which alternates between computing the expected values of the latent variables given the current parameter estimate (E-step) and maximizing the resulting expected log-likelihood with respect to the parameters (M-step).
  • Training Gaussian mixtures with EM can be seen as a probabilistic generalization of the K-means clustering algorithm (see the EM sketch after this list).
  • The log-likelihood gradient in the Euclidean metric depends on the choice of parametrization.
  • The natural gradient, based on the Fisher metric, is invariant under re-parametrization and can introduce further invariances during optimization (see the natural-gradient formula after this list).
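
As a concrete illustration of the sampling methods mentioned above, here is a minimal sketch of a random-walk Metropolis-Hastings sampler for a one-dimensional unnormalized target density. The function name `metropolis_hastings`, the Gaussian proposal, and the step size are illustrative assumptions, not code from the course.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings for a 1-D unnormalized log-density."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_samples)
    x = x0
    log_p = log_target(x)
    for i in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        x_new = x + step * rng.standard_normal()
        log_p_new = log_target(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
        samples[i] = x
    return samples

# Example: sample from an (unnormalized) standard Gaussian.
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=10_000)
print(samples.mean(), samples.std())
```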
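
The equivalence between KL minimization and maximum likelihood follows from a one-line computation. Writing $\hat p$ for the empirical distribution of a dataset $\{x_1, \dots, x_N\}$ and $p_\theta$ for the model,

$$
\mathrm{KL}(\hat p \,\|\, p_\theta) = \sum_x \hat p(x) \log \frac{\hat p(x)}{p_\theta(x)} = -\frac{1}{N} \sum_{i=1}^N \log p_\theta(x_i) - H(\hat p),
$$

and since the entropy $H(\hat p)$ does not depend on $\theta$, minimizing the KL divergence over $\theta$ is the same as maximizing the log-likelihood.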
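
For the maximum-a-posteriori bullets, Bayes' formula $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$ gives

$$
\theta_{\mathrm{MAP}} = \arg\max_\theta \big[ \log p(D \mid \theta) + \log p(\theta) \big],
$$

and with a uniform prior the term $\log p(\theta)$ is constant, so $\theta_{\mathrm{MAP}}$ coincides with the maximum-likelihood estimate $\arg\max_\theta \log p(D \mid \theta)$.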
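
Below is a minimal sketch of EM for a one-dimensional Gaussian mixture, assuming NumPy and SciPy are available; the initialization scheme and the fixed number of iterations are illustrative choices rather than the course's implementation. The E-step computes the responsibilities (expected latent assignments) and the M-step re-estimates weights, means, and standard deviations from them.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iter=100, rng=None):
    """EM for a 1-D Gaussian mixture with k components."""
    rng = np.random.default_rng() if rng is None else rng
    # Illustrative initialization: random means, unit variances, uniform weights.
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * norm.pdf(x[:, None], mu, sigma)      # shape (n, k)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates of the parameters.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma

# Example: two well-separated clusters.
x = np.concatenate([np.random.normal(-2, 0.5, 500), np.random.normal(3, 1.0, 500)])
print(em_gmm_1d(x, k=2))
```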
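
The last two bullets contrast the Euclidean gradient with the natural gradient. With $F(\theta)$ the Fisher information matrix of the model $p_\theta$,

$$
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top} \right],
\qquad
\tilde\nabla_\theta L = F(\theta)^{-1} \nabla_\theta L,
$$

and the update $\theta \leftarrow \theta + \eta\, \tilde\nabla_\theta L$ follows the same trajectory, to first order in the step size, regardless of how $\theta$ is parametrized.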
