Basic examples of Maximum Likelihood Estimation

Estimation of best parameter for iid Exponential Distributions

Let $X_1, X_2, \dotsc, X_m$ be a random sample from the exponential distribution with probability density functions of the form $f_\theta(x) = \tfrac{1}{\theta}e^{-x/\theta}$ for $x >0$ and any parameter $\theta >0.$ The likelihood function is then given as the product

$\mathcal{L}(\theta; x_1,\dotsc, x_m) = f_\theta(x_1)\dotsb f_\theta(x_m) = \dfrac{1}{\theta^m} \exp \bigg( -\dfrac{1}{\theta} \displaystyle{\sum_{k=1}^m} x_k \bigg)$

We look for the parameter value $\theta>0$ at which $\mathcal{L}$ attains its absolute maximum. Notice that, since the logarithm is a one-to-one increasing function, $\mathcal{L}$ and $\log\mathcal{L}$ attain their maximum at the same value of $\theta.$ The latter expression is easier to handle than the former, so we use it to look for the extrema in the usual way:

Set $g(\theta) = \log \mathcal{L}(\theta; x_1, \dotsc, x_m) = -m \log(\theta) - \tfrac{1}{\theta} \sum_{k=1}^m x_k;$ it is then $g'(\theta) = -\tfrac{m}{\theta} + \tfrac{1}{\theta^2}\sum_{k=1}^m x_k.$ Note that $g'(\theta) = 0$ if and only if $\theta = \tfrac{1}{m} \sum_{k=1}^m x_k,$ which is positive. Since $g'$ is positive to the left of this value and negative to the right of it, this is actually a maximum of $\mathcal{L}(\theta;x_1,\dotsc,x_m).$

Note that the found parameter $\theta$ is nothing but the arithmetic mean $\bar{x}$ of $\{x_1, \dotsc, x_m\}.$
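As a quick numerical sanity check of the derivation above, the following Python sketch evaluates the log-likelihood $g(\theta)$ on a small made-up sample and verifies that a few other candidate values of $\theta$ do worse than the sample mean. The sample values and the names `exp_log_likelihood` and `exp_mle` are assumptions introduced only for illustration.

```python
import math

def exp_log_likelihood(theta, xs):
    """g(theta) = -m*log(theta) - (1/theta)*sum(xs), for theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return -len(xs) * math.log(theta) - sum(xs) / theta

def exp_mle(xs):
    """Closed-form maximizer of g: the arithmetic mean of the sample."""
    return sum(xs) / len(xs)

# Hypothetical sample, chosen only for illustration.
sample = [0.8, 2.1, 0.3, 1.7, 1.1]
theta_hat = exp_mle(sample)

# g at the estimate should exceed g at every other candidate value.
candidates = [theta_hat * c for c in (0.5, 0.9, 1.1, 2.0)]
is_max = all(
    exp_log_likelihood(t, sample) < exp_log_likelihood(theta_hat, sample)
    for t in candidates
)
print(theta_hat, is_max)
```

Checking a handful of candidates does not prove anything, of course; it only guards against sign slips in the derivative.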

Estimation of best parameter for iid Geometric Distributions

In this case, the random sample $X_1, \dotsc, X_m$ comes from the Geometric distribution, which has probability mass functions of the form $f_p(n) = p (1-p)^{n-1}$ for any $n \in \mathbb{N}$ and parameter $p \in [0,1].$ We operate as in the previous example, by looking for extrema of the log-likelihood function:

Set $\mathcal{L}(p;n_1,\dotsc,n_m) = p^m (1-p)^{-m+n_1+\dotsb+n_m}$ for $0 \leq p \leq 1.$
Consider $g(p) = \log \mathcal{L}(p;n_1, \dotsc, n_m) = m\log(p) + \bigg( -m + \displaystyle{\sum_{k=1}^m} n_k \bigg) \log(1-p),$ but only for $0<p<1.$
It is then $g'(p) = \dfrac{m}{p} - \dfrac{1}{1-p}\bigg( -m + \displaystyle{\sum_{k=1}^m} n_k\bigg).$
$g'(p) = 0$ if and only if $p =\dfrac{m}{\sum_{k=1}^m n_k}.$

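This time, the solution $p$ coincides with the inverse of the arithmetic mean $\bar{n}$ of the samples $\{n_1, \dotsc, n_m\}$ (which is trivially positive and at most one, with equality only when every $n_k = 1$). Whenever some $n_k > 1,$ the critical point lies in $(0,1),$ and since $g''(p) = -\tfrac{m}{p^2} - \tfrac{1}{(1-p)^2}\big( -m + \sum_{k=1}^m n_k \big) < 0$ there, it is a maximum, and therefore the parameter that we are looking for.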
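The closed-form value $p = 1/\bar{n}$ can be compared against a brute-force search. The sketch below, in Python, maximizes $g(p)$ over a grid of step $10^{-3}$ on $(0,1)$ and compares the result with the formula. The sample and the names `geom_log_likelihood` and `geom_mle` are hypothetical and serve only as an illustration.

```python
import math

def geom_log_likelihood(p, ns):
    """g(p) = m*log(p) + (sum(ns) - m)*log(1 - p), for 0 < p < 1."""
    m = len(ns)
    return m * math.log(p) + (sum(ns) - m) * math.log(1.0 - p)

def geom_mle(ns):
    """Closed-form critical point: the inverse of the sample mean."""
    return len(ns) / sum(ns)

# Hypothetical sample (number of trials until the first success).
counts = [3, 1, 4, 2, 5]
p_hat = geom_mle(counts)

# Brute-force maximization of g over the grid 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: geom_log_likelihood(p, counts))
print(p_hat, p_grid)
```

The two printed values should agree up to the grid resolution.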

Estimation of best parameter for iid Poisson Distributions

The random variables in this case have probability mass functions given by $f_\lambda(n) = \dfrac{\lambda^n e^{-\lambda}}{n!}$ for any $n \in \{0, 1, 2, \dotsc\},$ and parameter $\lambda>0.$

Set $\mathcal{L}(\lambda;n_1,\dotsc,n_m) = e^{-m\lambda} \dfrac{\lambda^{n_1+\dotsb+n_m}}{n_1! \dotsb n_m!}.$
Set $g(\lambda) = \log \mathcal{L}(\lambda; n_1, \dotsc, n_m) = -m\lambda -\log (n_1! \dotsb n_m!)+(\log \lambda) \displaystyle{\sum_{k=1}^m} n_k .$
Its derivative is given by $g'(\lambda) = -m + \dfrac{1}{\lambda} \displaystyle{\sum_{k=1}^m} n_k.$
Note that $g'(\lambda) = 0$ only for $\lambda = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} n_k,$ which is a maximum for $\mathcal{L}$ because $g''(\lambda) = -\dfrac{1}{\lambda^2} \displaystyle{\sum_{k=1}^m} n_k < 0$ whenever some $n_k > 0.$

As in the case of exponential distributions, the computed parameter $\lambda$ is the arithmetic mean $\bar{n}$ of $\{n_1, \dotsc, n_m\}.$
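A short Python sketch of the Poisson case follows. It codes $g(\lambda)$ and $g'(\lambda)$ as written above, using `math.lgamma(n + 1)` for $\log(n!)$, and checks that the derivative vanishes at the sample mean. The sample and the function names are assumptions made for the sake of the example.

```python
import math

def poisson_log_likelihood(lam, ns):
    """g(lam) = -m*lam - log(n_1! ... n_m!) + log(lam)*sum(ns), lam > 0."""
    log_factorials = sum(math.lgamma(n + 1) for n in ns)  # log(n!) = lgamma(n + 1)
    return -len(ns) * lam - log_factorials + math.log(lam) * sum(ns)

def poisson_score(lam, ns):
    """g'(lam) = -m + sum(ns)/lam."""
    return -len(ns) + sum(ns) / lam

def poisson_mle(ns):
    """Closed-form maximizer of g: the arithmetic mean of the sample."""
    return sum(ns) / len(ns)

# Hypothetical sample of counts, chosen only for illustration.
counts = [2, 0, 3, 1, 4]
lam_hat = poisson_mle(counts)
score_at_hat = poisson_score(lam_hat, counts)  # should vanish at the estimate

# Neighbouring values of lambda should give a smaller log-likelihood.
beats_neighbours = all(
    poisson_log_likelihood(lam, counts) < poisson_log_likelihood(lam_hat, counts)
    for lam in (lam_hat - 0.5, lam_hat + 0.5)
)
print(lam_hat, score_at_hat, beats_neighbours)
```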

Estimation of best parameter for iid Normal Distributions

This case is a bit different, since we are dealing with two parameters instead of one: Assume $X_1, X_2, \dotsc, X_m$ is a random sample from the normal distribution with probability density functions of the form $f_{\mu,\sigma}(t) = (2\pi\sigma^2)^{-1/2} \exp \big( - \tfrac{(t-\mu)^2}{2\sigma^2} \big)$ for any $t \in \mathbb{R},$ and parameters $\mu \in \mathbb{R},$ $\sigma > 0.$ For ease of computations below, and since the parameter $\sigma$ always appears squared in the expression of $f,$ we prefer to work instead with $f_{\mu,\theta}(t) = (2\pi\theta)^{-1/2} \exp \big( - \tfrac{(t-\mu)^2}{2\theta} \big),$ and require the parameter $\theta = \sigma^2$ to be positive. Note the abuse of notation, and how this does not really affect the final result. We proceed to compute the likelihood function and its logarithm as before:

$\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m) = \bigg( \dfrac{1}{\sqrt{2\pi\theta}} \bigg)^m \exp \bigg( -\dfrac{1}{2\theta} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2 \bigg)$
$g(\mu,\theta) = \log\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m) = -\dfrac{m}{2}\log(2\pi\theta) - \dfrac{1}{2\theta} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2.$
The partial derivatives of $g$ are given by
$\dfrac{\partial g}{\partial \mu}(\mu,\theta) = \dfrac{1}{\theta} \displaystyle{\sum_{k=1}^m} (t_k - \mu),\qquad \dfrac{\partial g}{\partial \theta}(\mu,\theta) = -\dfrac{m}{2\theta} + \dfrac{1}{2\theta^2} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2$
Note that $\dfrac{\partial g}{\partial \mu}(\mu,\theta) = 0$ if and only if $\mu = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} t_k.$ Let us denote it by $\bar{t},$ since it represents the mean of the values $\{ t_1, \dotsc, t_m \}.$
Also, by virtue of the previous statement, a solution for $\dfrac{\partial g}{\partial \theta}(\mu,\theta) = 0$ is given uniquely by $\theta = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} \big(t_k - \bar{t}\big)^2.$ Note that this value (which is positive unless all the $t_k$ are equal, and hence satisfies the constraints) coincides with the variance $s^2$ of the set $\{ t_1, \dotsc, t_m\},$ computed with divisor $m$ rather than $m-1.$ It is a priori a valid parameter for $\theta.$
It is not hard to see that the computed critical point $(\mu,\theta) = (\bar{t}, s^2)$ offers indeed an absolute maximum for $\log\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m).$ The second-derivative test confirms that it is at least a local maximum; the Hessian of $g$ is given by:

$H(g)(\mu,\theta) =$ $\begin{pmatrix} \tfrac{\partial^2 g}{\partial \mu^2} & \tfrac{\partial^2 g}{\partial \mu \partial \theta} \\ \tfrac{\partial^2 g}{\partial \theta \partial \mu} & \tfrac{\partial^2 g}{\partial \theta^2}\end{pmatrix} \bigg\rvert_{(\mu,\theta)=(\bar{t},s^2)}$

$=$ $\begin{pmatrix} -m/\theta & -\sum_{k=1}^m (t_k-\mu)/\theta^2 \\ -\sum_{k=1}^m (t_k-\mu)/\theta^2 & m/(2\theta^2) - \sum_{k=1}^m (t_k-\mu)^2/\theta^3 \end{pmatrix} \bigg\rvert_{(\mu,\theta)=(\bar{t},s^2)}$

$=$ $\begin{pmatrix} -m/s^2 & 0 \\ 0 & -m/(2s^4) \end{pmatrix}.$

Its determinant at $(\mu,\theta) = (\bar{t},s^2)$ is always positive: $\det H(g)(\bar{t},s^2) = \dfrac{m^2}{2s^6},$ and since $\dfrac{\partial^2 g}{\partial \mu^2}(\bar{t},s^2) = -\dfrac{m}{s^2}$ is always negative, a maximum is attained.
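The two-parameter computation can be replayed numerically. The Python sketch below computes $(\bar{t}, s^2)$ for a small made-up sample, evaluates the entries of the Hessian written above at that point, and applies the same second-derivative test. The data and the names `normal_mle` and `hessian` are hypothetical.

```python
def normal_mle(ts):
    """Return (mu_hat, theta_hat) = (mean, variance with divisor m)."""
    m = len(ts)
    mu_hat = sum(ts) / m
    theta_hat = sum((t - mu_hat) ** 2 for t in ts) / m  # divisor m, not m - 1
    return mu_hat, theta_hat

def hessian(mu, theta, ts):
    """Second partial derivatives of g at (mu, theta): (g_mm, g_mt, g_tt)."""
    m = len(ts)
    s1 = sum(t - mu for t in ts)
    s2 = sum((t - mu) ** 2 for t in ts)
    h_mm = -m / theta
    h_mt = -s1 / theta ** 2
    h_tt = m / (2 * theta ** 2) - s2 / theta ** 3
    return h_mm, h_mt, h_tt

# Hypothetical sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
mu_hat, theta_hat = normal_mle(data)

h_mm, h_mt, h_tt = hessian(mu_hat, theta_hat, data)
det = h_mm * h_tt - h_mt ** 2

# Negative top-left entry and positive determinant: a local maximum.
is_local_max = h_mm < 0 and det > 0
print(mu_hat, theta_hat, det, is_local_max)
```

Note that `theta_hat` here matches what `statistics.pvariance` in the Python standard library computes (the population variance), not `statistics.variance`, which divides by $m-1.$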

Francisco Blanco-Silva