Let us now, as an example, derive the maximum likelihood estimates for a $D$-dimensional Gaussian probability distribution. Starting with the mean, the goal is to maximize the log-likelihood, i.e. to find
$$\boldsymbol{\mu}^{*}=\argmax_{\boldsymbol{\mu}}\log p(\mathcal{D}|\boldsymbol{\mu},\boldsymbol{\Sigma})$$
Replacing the model by its definition gives
$$\boldsymbol{\mu}^{*}=\argmax_{\boldsymbol{\mu}}\log\prod_{\mathbf{x}\in\mathcal{D}}\frac{1}{(2\pi)^{D/2}\left|\boldsymbol{\Sigma}\right|^{1/2}}\exp\left\{ -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\} $$
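Taking the logarithm turns the product into a sum and cancels the exponential; writing $N=\left|\mathcal{D}\right|$ for the number of training samples, the log-likelihood expands to
$$\log p(\mathcal{D}|\boldsymbol{\mu},\boldsymbol{\Sigma})=-\frac{ND}{2}\log(2\pi)-\frac{N}{2}\log\left|\boldsymbol{\Sigma}\right|-\frac{1}{2}\sum_{\mathbf{x}\in\mathcal{D}}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$$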
Only the last term depends on $\boldsymbol{\mu}$, so the problem simplifies into
$$\boldsymbol{\mu}^{*}=\argmin_{\boldsymbol{\mu}}\sum_{\mathbf{x}\in\mathcal{D}}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$$
As seen in the optimization course, if the problem has a solution, it must satisfy the first-order necessary condition, i.e. the gradient of the log-likelihood with respect to $\boldsymbol{\mu}$ must be $\mathbf{0}$ at the optimum $\boldsymbol{\mu}^{*}$. The gradient is
$$\frac{\partial\log p(\mathcal{D}|\boldsymbol{\mu},\boldsymbol{\Sigma})}{\partial\boldsymbol{\mu}}=\sum_{\mathbf{x}\in\mathcal{D}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$$
and therefore
$$\sum_{\mathbf{x}\in\mathcal{D}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}^{*})=\mathbf{0}$$
which, since $\boldsymbol{\Sigma}^{-1}$ is invertible, implies $\sum_{\mathbf{x}\in\mathcal{D}}(\mathbf{x}-\boldsymbol{\mu}^{*})=\mathbf{0}$ and is therefore equivalent to
$$\boldsymbol{\mu}^{*}=\frac{1}{\left|\mathcal{D}\right|}\sum_{\mathbf{x}\in\mathcal{D}}\mathbf{x}$$
Therefore the maximum likelihood estimate for the mean parameter of a $D$-dimensional Gaussian is the arithmetic mean of the training samples.
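As a quick numerical sanity check, the following sketch (assuming NumPy and SciPy are available; the synthetic dataset `X` and all names are illustrative) verifies that the analytic gradient derived above matches finite differences of the log-likelihood and vanishes at the sample mean:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative setup: synthetic data drawn from a Gaussian with known parameters.
rng = np.random.default_rng(0)
D, N = 3, 1000
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5], cov=Sigma, size=N)

def log_lik(mu):
    # Log-likelihood of the dataset X for a fixed, known Sigma.
    return multivariate_normal.logpdf(X, mean=mu, cov=Sigma).sum()

def grad_log_lik(mu):
    # Analytic gradient derived above: sum_x Sigma^{-1} (x - mu).
    return np.linalg.solve(Sigma, (X - mu).sum(axis=0))

mu_star = X.mean(axis=0)        # closed-form ML estimate: the sample mean
print(grad_log_lik(mu_star))    # ~[0, 0, 0]: first-order condition holds

# The log-likelihood is quadratic in mu, so central differences agree with
# the analytic gradient up to rounding error.
mu0, eps = np.zeros(D), 1e-5
fd = np.array([(log_lik(mu0 + eps * e) - log_lik(mu0 - eps * e)) / (2 * eps)
               for e in np.eye(D)])
print(np.allclose(fd, grad_log_lik(mu0), rtol=1e-4))  # True
```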
A similar computation yields the maximum likelihood estimate for the covariance matrix:
$$\boldsymbol{\Sigma}^{*}=\frac{1}{\left|\mathcal{D}\right|}\sum_{\mathbf{x}\in\mathcal{D}}(\mathbf{x}-\boldsymbol{\mu}^{*})(\mathbf{x}-\boldsymbol{\mu}^{*})^{T}$$
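The same kind of check works for the covariance (a sketch continuing the NumPy snippet above; note that the ML estimator divides by $N$, while `np.cov` normalizes by $N-1$ unless `bias=True` is passed):

```python
# ML covariance via the closed form, reusing X and mu_star from above.
diff = X - mu_star                     # shape (N, D)
Sigma_star = diff.T @ diff / len(X)    # (1/N) * sum (x - mu*)(x - mu*)^T
# bias=True selects the 1/N normalization, matching the ML estimate.
print(np.allclose(Sigma_star, np.cov(X.T, bias=True)))  # True
```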