Basic examples of Maximum Likelihood Estimation

Estimation of best parameter for iid Exponential Distributions

Let X_1, X_2, \dotsc, X_m be a random sample from the exponential distribution with probability density function f_\theta(x) = \tfrac{1}{\theta}e^{-x/\theta} for x > 0, where \theta > 0 is the parameter. The likelihood function is then given as the product

\mathcal{L}(\theta; x_1,\dotsc, x_m) = f_\theta(x_1)\dotsb f_\theta(x_m) = \dfrac{1}{\theta^m} \exp \bigg( -\dfrac{1}{\theta} \displaystyle{\sum_{k=1}^m} x_k \bigg)

We look for the parameter value \theta > 0 at which \mathcal{L} attains its absolute maximum. Notice that, since the logarithm is a strictly increasing function, the maximum of \mathcal{L} is attained at the same point as the maximum of \log\mathcal{L}. The latter expression is easier to handle than the former, so we use it to look for extrema in the usual way:

Set g(\theta) = \log \mathcal{L}(\theta; x_1, \dotsc, x_m) = -m \log(\theta) - \tfrac{1}{\theta} \sum_{k=1}^m x_k; then g'(\theta) = -\tfrac{m}{\theta} + \tfrac{1}{\theta^2}\sum_{k=1}^m x_k. Note that g'(\theta) = 0 if and only if \theta = \tfrac{1}{m} \sum_{k=1}^m x_k, which is positive. This critical point is actually a maximum of \mathcal{L}(\theta;x_1,\dotsc,x_m): at that value of \theta we have \sum_{k=1}^m x_k = m\theta, and hence g''(\theta) = \tfrac{m}{\theta^2} - \tfrac{2}{\theta^3}\sum_{k=1}^m x_k = -\tfrac{m}{\theta^2} < 0.

Note that the parameter \theta we just found is nothing but the arithmetic mean \bar{x} of \{x_1, \dotsc, x_m\}.
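As a quick numerical sanity check, here is a minimal Python sketch (assuming numpy and scipy are available; the true parameter 2.5, the seed, and the sample size are arbitrary choices) comparing the closed-form estimate \bar{x} with a direct numerical maximization of the log-likelihood:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(seed=42)
    x = rng.exponential(scale=2.5, size=1000)   # simulated sample; true theta = 2.5

    theta_hat = x.mean()                        # closed-form MLE: the arithmetic mean

    # Negative log-likelihood: -g(theta) = m log(theta) + (1/theta) sum x_k
    def neg_log_lik(theta):
        return x.size * np.log(theta) + x.sum() / theta

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
    print(theta_hat, res.x)                     # the two values agree up to tolerance

The two printed values should coincide up to the optimizer's tolerance.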

Estimation of best parameter for iid Geometric Distributions

In this case, the random sample X_1, \dotsc, X_m comes from the Geometric distribution, whose probability mass function has the form f_p(n) = p (1-p)^{n-1} for any n \in \mathbb{N} and parameter p \in (0,1]. We proceed as in the previous example, by looking for extrema of the log-likelihood function:

  • Set \mathcal{L}(p;n_1,\dotsc,n_m) = p^m (1-p)^{-m+n_1+\dotsb+n_m} for 0 \leq p \leq 1.
  • Consider g(p) = \log \mathcal{L}(p;n_1, \dotsc, n_m) = m\log(p) + \bigg( -m + \displaystyle{\sum_{k=1}^m} n_k \bigg) \log(1-p), but only for 0<p<1.
  • Its derivative is then g'(p) = \dfrac{m}{p} - \dfrac{1}{1-p}\bigg( -m + \displaystyle{\sum_{k=1}^m} n_k\bigg).
  • g'(p) = 0 if and only if p =\dfrac{m}{\sum_{k=1}^m n_k}.

This time, the solution p coincides with the reciprocal of the arithmetic mean \bar{n} of the samples \{n_1, \dotsc, n_m\} (which is trivially positive and at most one, since each n_k \geq 1). This critical point is indeed a maximum: g''(p) = -\tfrac{m}{p^2} - \tfrac{1}{(1-p)^2}\big( -m + \sum_{k=1}^m n_k \big) < 0 for all 0 < p < 1, so g is concave, and p = 1/\bar{n} is the parameter we are looking for.
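As before, a minimal Python sketch can confirm this numerically (again assuming numpy; the true parameter 0.3 and the grid resolution are arbitrary choices); it compares the closed-form estimate 1/\bar{n} with a brute-force maximization of g over a grid in (0,1):

    import numpy as np

    rng = np.random.default_rng(seed=42)
    n = rng.geometric(p=0.3, size=1000)   # samples n_k >= 1; true p = 0.3

    p_hat = 1.0 / n.mean()                # closed-form MLE: reciprocal of the mean

    # Brute-force check: evaluate g(p) on a fine grid over (0, 1)
    p_grid = np.linspace(0.001, 0.999, 999)
    g = n.size * np.log(p_grid) + (n.sum() - n.size) * np.log(1.0 - p_grid)
    print(p_hat, p_grid[np.argmax(g)])    # grid maximizer lands next to p_hat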

Estimation of best parameter for iid Poisson Distributions

The random variables in this case have probability mass functions given by f_\lambda(n) = \dfrac{\lambda^n e^{-\lambda}}{n!} for any integer n \geq 0, and parameter \lambda>0.

  • Set \mathcal{L}(\lambda;n_1,\dotsc,n_m) = e^{-m\lambda} \dfrac{\lambda^{n_1+\dotsb+n_m}}{n_1! \dotsb n_m!}.
  • Set g(\lambda) = \log \mathcal{L}(\lambda; n_1, \dotsc, n_m) = -m\lambda -\log (n_1! \dotsb n_m!)+(\log \lambda) \displaystyle{\sum_{k=1}^m} n_k .
  • Its derivative is given by g'(\lambda) = -m + \dfrac{1}{\lambda} \displaystyle{\sum_{k=1}^m} n_k.
  • Note that g'(\lambda) = 0 only for \lambda = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} n_k, which is indeed a maximum for \mathcal{L}: g''(\lambda) = -\tfrac{1}{\lambda^2}\sum_{k=1}^m n_k < 0 whenever the sample is not identically zero.

As in the case of exponential distributions, the computed parameter \lambda is the arithmetic mean \bar{n} of \{n_1, \dotsc, n_m\}.
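The same kind of numerical check works here (a sketch assuming numpy; the true parameter 4.0 and the grid are arbitrary choices), dropping the constant term -\log(n_1! \dotsb n_m!) from g since it does not affect the maximizer:

    import numpy as np

    rng = np.random.default_rng(seed=42)
    n = rng.poisson(lam=4.0, size=1000)   # simulated sample; true lambda = 4.0

    lambda_hat = n.mean()                 # closed-form MLE: the arithmetic mean

    # Brute-force check: g(lambda) up to the additive constant -log(n_1! ... n_m!)
    lam_grid = np.linspace(0.1, 10.0, 1000)
    g = -n.size * lam_grid + n.sum() * np.log(lam_grid)
    print(lambda_hat, lam_grid[np.argmax(g)])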

Estimation of best parameter for iid Normal Distributions

This case is a bit different, since we are dealing with two parameters instead of one: Assume X_1, X_2, \dotsc, X_m is a random sample from the normal distribution with probability density function of the form f_{\mu,\sigma}(t) = (2\pi\sigma^2)^{-1/2} \exp \big( - \tfrac{(t-\mu)^2}{2\sigma^2} \big) for any t \in \mathbb{R}, with parameters \mu \in \mathbb{R} and \sigma > 0. For ease of the computations below, and since the parameter \sigma always appears squared in the expression of f, we prefer to work instead with f_{\mu,\theta}(t) = (2\pi\theta)^{-1/2} \exp \big( - \tfrac{(t-\mu)^2}{2\theta} \big), and require the parameter \theta to be positive. Note the slight abuse of notation, which does not affect the final result. We proceed to compute the likelihood function and its logarithm as before:

  • \mathcal{L}(\mu,\theta;t_1,\dotsc,t_m) = \bigg( \dfrac{1}{\sqrt{2\pi\theta}} \bigg)^m \exp \bigg( -\dfrac{1}{2\theta} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2 \bigg)
  • g(\mu,\theta) = \log\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m) = -\dfrac{m}{2}\log(2\pi\theta) - \dfrac{1}{2\theta} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2.
  • The partial derivatives of g are given by

    \dfrac{\partial g}{\partial \mu}(\mu,\theta) = \dfrac{1}{\theta} \displaystyle{\sum_{k=1}^m} (t_k - \mu),\qquad \dfrac{\partial g}{\partial \theta}(\mu,\theta) = -\dfrac{m}{2\theta} + \dfrac{1}{2\theta^2} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2

  • Note that \dfrac{\partial g}{\partial \mu}(\mu,\theta) = 0 if and only if \mu = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} t_k. Let us denote it by \bar{t}, since it represents the mean of the values \{ t_1, \dotsc, t_m \}.
  • Also, substituting \mu = \bar{t} from the previous step, the equation \dfrac{\partial g}{\partial \theta}(\mu,\theta) = 0 has the unique solution \theta = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} \big(t_k - \bar{t}\big)^2. Note that this value (which is positive, and hence satisfies the constraints) coincides with the variance s^2 of the set \{ t_1, \dotsc, t_m\}, so it is a priori a valid parameter for \theta.
  • It is not hard to see that the computed critical point (\mu,\theta) = (\bar{t}, s^2) indeed offers an absolute maximum for \log\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m): the Hessian of g is given by
    H(g)(\mu,\theta) = \begin{pmatrix} \tfrac{\partial^2 g}{\partial \mu^2} & \tfrac{\partial^2 g}{\partial \mu \partial \theta} \\ \tfrac{\partial^2 g}{\partial \theta \partial \mu} & \tfrac{\partial^2 g}{\partial \theta^2}\end{pmatrix} \bigg\rvert_{(\mu,\theta)=(\bar{t},s^2)}
    = \begin{pmatrix} -m/\theta & -\sum_{k=1}^m (t_k-\mu)/\theta^2 \\ -\sum_{k=1}^m (t_k-\mu)/\theta^2 & m/(2\theta^2) - \sum_{k=1}^m (t_k-\mu)^2/\theta^3 \end{pmatrix} \bigg\rvert_{(\mu,\theta)=(\bar{t},s^2)}
    = \begin{pmatrix} -m/s^2 & 0 \\ 0 & -m/(2s^4) \end{pmatrix}.

    Its determinant at (\mu,\theta) = (\bar{t},s^2) is always positive: \det H(g)(\bar{t},s^2) = \dfrac{m^2}{2s^6}, and since \dfrac{\partial^2 g}{\partial \mu^2}(\bar{t},s^2) = -\dfrac{m}{s^2} is always negative, a maximum is attained.
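To close this example, here is a short Python sketch (assuming numpy and scipy; the true parameters \mu = 1 and \sigma = 2 are arbitrary choices). Since scipy.stats.norm.fit computes maximum likelihood estimates of (\mu, \sigma), squaring its second output should reproduce the closed-form s^2:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(seed=42)
    t = rng.normal(loc=1.0, scale=2.0, size=1000)   # true mu = 1.0, sigma = 2.0

    mu_hat = t.mean()           # MLE of mu: the sample mean t-bar
    theta_hat = t.var(ddof=0)   # MLE of theta = sigma^2: 1/m-normalized variance s^2

    # scipy's norm.fit returns maximum likelihood estimates (mu, sigma),
    # so squaring the second value should reproduce theta_hat
    loc, scale = norm.fit(t)
    print(mu_hat, theta_hat)
    print(loc, scale**2)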

