Summary

  • From a Bayesian probabilistic perspective, it is natural to update our beliefs with data.
  • It is sometimes necessary to use methods such as rejection sampling, importance sampling, the Metropolis-Hastings algorithm, or Gibbs sampling to sample from complex distributions (see the Metropolis-Hastings sketch after this list).
  • Probabilistic models can be trained by minimizing the KL-divergence between the empirical data distribution and the model distribution.
  • Equivalently, a probabilistic model can be trained by maximizing the log-likelihood of a dataset under the model (the short derivation after this list makes the equivalence explicit).
  • Bayes’ formula gives a method for choosing the best parameters given data: maximum-a-posteriori (MAP) estimation.
  • The prior distribution gives probabilities to model parameters before having seen a dataset.
  • When the prior is uniform, maximum-a-posteriori is equivalent to maximum-likelihood (see the MAP formula after this list).
  • Probabilistic models can have latent variables which can be understood as unobserved explanatory factors.
  • Models with latent variables can be trained with the EM algorithm, which alternates between computing the expected values of the latent variables given the current parameter estimate (E-step) and maximizing the resulting expected log-likelihood with respect to the parameters (M-step).
  • Training Gaussian mixtures with EM can be seen as a probabilistic generalization of the K-means clustering algorithm (see the EM sketch after this list).
  • The log-likelihood gradient in the Euclidean metric depends on the choice of parametrization.
  • The natural gradient, based on the Fisher metric, is invariant under re-parametrization and can introduce further invariances during optimization (see the natural-gradient formula after this list).
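
As a concrete illustration of the sampling methods mentioned above, here is a minimal sketch of a random-walk Metropolis-Hastings sampler for a one-dimensional unnormalized target density. The function name `metropolis_hastings`, the Gaussian proposal, and the step size are illustrative assumptions, not code from the course.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings for a 1-D unnormalized log-density."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_samples)
    x = x0
    log_p = log_target(x)
    for i in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        x_new = x + step * rng.standard_normal()
        log_p_new = log_target(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
        samples[i] = x
    return samples

# Example: sample from an (unnormalized) standard Gaussian.
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=10_000)
print(samples.mean(), samples.std())
```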
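
The equivalence between KL minimization and maximum likelihood follows from a one-line computation. Writing $\hat p$ for the empirical distribution of a dataset $\{x_1, \dots, x_N\}$ and $p_\theta$ for the model,

$$
\mathrm{KL}(\hat p \,\|\, p_\theta) = \sum_x \hat p(x) \log \frac{\hat p(x)}{p_\theta(x)} = -\frac{1}{N} \sum_{i=1}^N \log p_\theta(x_i) - H(\hat p),
$$

and since the entropy $H(\hat p)$ does not depend on $\theta$, minimizing the KL divergence over $\theta$ is the same as maximizing the log-likelihood.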
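
For the maximum-a-posteriori bullets, Bayes' formula $p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$ gives

$$
\theta_{\mathrm{MAP}} = \arg\max_\theta \big[ \log p(D \mid \theta) + \log p(\theta) \big],
$$

and with a uniform prior the term $\log p(\theta)$ is constant, so $\theta_{\mathrm{MAP}}$ coincides with the maximum-likelihood estimate $\arg\max_\theta \log p(D \mid \theta)$.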
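
Below is a minimal sketch of EM for a one-dimensional Gaussian mixture, assuming NumPy and SciPy are available; the initialization scheme and the fixed number of iterations are illustrative choices rather than the course's implementation. The E-step computes the responsibilities (expected latent assignments) and the M-step re-estimates weights, means, and standard deviations from them.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iter=100, rng=None):
    """EM for a 1-D Gaussian mixture with k components."""
    rng = np.random.default_rng() if rng is None else rng
    # Illustrative initialization: random means, unit variances, uniform weights.
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * norm.pdf(x[:, None], mu, sigma)      # shape (n, k)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates of the parameters.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma

# Example: two well-separated clusters.
x = np.concatenate([np.random.normal(-2, 0.5, 500), np.random.normal(3, 1.0, 500)])
print(em_gmm_1d(x, k=2))
```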
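
The last two bullets contrast the Euclidean gradient with the natural gradient. With $F(\theta)$ the Fisher information matrix of the model $p_\theta$,

$$
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top} \right],
\qquad
\tilde\nabla_\theta L = F(\theta)^{-1} \nabla_\theta L,
$$

and the update $\theta \leftarrow \theta + \eta\, \tilde\nabla_\theta L$ follows the same trajectory, to first order in the step size, regardless of how $\theta$ is parametrized.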
