Bayesian Inference
The MoM or MLE approaches cannot incorporate other information about \(\theta\) into the inference, for example information from previous studies or "common sense" about the parameters.
Prior and Posterior
Quantify a priori information about \(\theta\) via a distribution. We can think of \(f(x_1,\ldots,x_n;\theta)\) as representing the conditional pdf of \(X_1,\ldots,X_n\) given \(\theta\), where \(\theta\) is itself treated as a random variable taking values in \(\Theta\), so that

\[f(x_1,\ldots,x_n;\theta) = f(x_1,\ldots,x_n \mid \theta).\]
Such information about \(\theta\) is given via a prior density \(\pi(\theta)\). Then the posterior density function combines the information from the prior with the information from the data. By Bayes' Theorem,

\[\pi(\theta \mid x_1,\ldots,x_n) = \frac{f(x_1,\ldots,x_n \mid \theta)\,\pi(\theta)}{\int_\Theta f(x_1,\ldots,x_n \mid t)\,\pi(t)\,dt} = \frac{f(x_1,\ldots,x_n \mid \theta)\,\pi(\theta)}{c(x_1,\ldots,x_n)}.\]
The denominator \(c(x_1,\ldots,x_n)\) is often called the normalizer (or normalizing constant); it depends only on the data and is, in practice, often intractable to compute.
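As an illustration (not an example from these notes), with a conjugate Beta prior for a Bernoulli parameter the normalizer is available in closed form, and the posterior update reduces to updating the Beta parameters; a minimal sketch:

```python
import numpy as np
from scipy import stats

# Hypothetical example: Bernoulli(p) data with a conjugate Beta(a0, b0) prior,
# so the posterior is again a Beta distribution and the normalizer is known.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=50)       # observed 0/1 data

a0, b0 = 2.0, 2.0                       # Beta(a0, b0) prior on p
a_post = a0 + x.sum()                   # posterior shape: a0 + number of 1s
b_post = b0 + len(x) - x.sum()          # posterior shape: b0 + number of 0s
posterior = stats.beta(a_post, b_post)

print(posterior.mean())                 # posterior mean of p
```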
Bayesian Interval Estimation
Given a posterior \(\pi(\theta\mid x)\), an interval \(\mathcal I(x)\) is a \(100p\%\) credible interval for \(\theta\) if

\[P(\theta \in \mathcal I(x) \mid x) = \int_{\mathcal I(x)} \pi(\theta \mid x)\, d\theta = p.\]
A \(100p\%\) credible interval \(\mathcal I(x)\) is called a \(100p\%\) highest posterior density (HPD) interval for \(\theta\) if

\[\pi(\theta \mid x) \ge \pi(\theta' \mid x) \quad \text{for all } \theta \in \mathcal I(x) \text{ and } \theta' \notin \mathcal I(x);\]

for a unimodal posterior this is the shortest interval with posterior probability \(p\).
In practice, credible intervals are often numerically very close to confidence intervals, although the interpretation differs: a credible interval is a direct probability statement about \(\theta\) given the observed data.
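A minimal sketch of both interval types, assuming a hypothetical Beta(14, 38) posterior rather than any posterior from these notes:

```python
import numpy as np
from scipy import stats

# Hypothetical Beta(14, 38) posterior for a parameter in (0, 1), purely illustrative.
posterior = stats.beta(14, 38)
p = 0.95

# Equal-tailed 95% credible interval: cut (1-p)/2 posterior probability from each tail.
equal_tailed = posterior.ppf([(1 - p) / 2, (1 + p) / 2])

# 95% HPD interval: the shortest interval with posterior probability p
# (search over intervals of the form [ppf(u), ppf(u + p)]).
u = np.linspace(0, 1 - p, 1000)
lower, upper = posterior.ppf(u), posterior.ppf(u + p)
shortest = np.argmin(upper - lower)
hpd = (lower[shortest], upper[shortest])

print(equal_tailed, hpd)
```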
Maximum a posteriori (MAP) estimate
\(\hat\theta\) is the posterior mode, i.e.

\[\hat\theta = \arg\max_\theta \pi(\theta \mid x_1,\ldots,x_n) = \arg\max_\theta f(x_1,\ldots,x_n \mid \theta)\,\pi(\theta),\]

since the normalizer does not depend on \(\theta\).
MAP estimates are often used in situations where the MLE is unstable or undefined. The prior density is used to "regularize" the problem, i.e. to force \(\hat\theta\) to stay within a bounded subset of \(\Theta\), so that we reduce the variance of \(\hat\theta\) in exchange for possibly increasing its bias.
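As a standard illustration of this tradeoff (an illustrative example, not from the notes): if \(X_1,\ldots,X_n \sim N(\theta,1)\) independently and the prior is \(\theta \sim N(0,\tau^2)\), the MAP estimate maximizes the log-likelihood plus the log-prior,

\[\hat\theta_{\mathrm{MAP}} = \arg\max_\theta\left[-\frac{1}{2}\sum_{i=1}^n (x_i-\theta)^2 - \frac{\theta^2}{2\tau^2}\right] = \frac{n\tau^2}{n\tau^2+1}\,\bar x,\]

which shrinks \(\bar x\) toward the prior mean \(0\): it has smaller variance than the MLE \(\bar x\), at the cost of some bias.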
Example: Non-regular location estimation
Let \(X_1,\ldots,X_n\) be independent random variables with
The likelihood function is
The MLE is undefined, since \(\lim_{\theta\rightarrow x_i}\mathcal L(\theta) = \infty\) for each observation \(x_i\).
Consider a Cauchy prior \(\pi(\theta) = \frac{10}{\pi(100+\theta^2)}\), so that the posterior is

\[\pi(\theta \mid x_1,\ldots,x_n) = \frac{\mathcal L(\theta)\,\pi(\theta)}{\int_{-\infty}^{\infty}\mathcal L(t)\,\pi(t)\,dt},\]

and since the normalizer is a one-dimensional integral, we can compute the posterior pdf numerically.
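A minimal numerical sketch of this computation. The density used below is a hypothetical stand-in with the same pathology (its likelihood blows up as \(\theta\) approaches any observation), since the actual likelihood is not reproduced here, and the data are made up:

```python
import numpy as np

# Hypothetical stand-in density: f(x; theta) proportional to
# |x - theta|^(-1/2) * exp(-(x - theta)^2 / 2), which has an (integrable)
# singularity at theta = x, so L(theta) -> infinity as theta -> x_i.
def log_likelihood(theta, x):
    d = x[:, None] - theta[None, :]                    # shape (n, grid)
    return np.sum(-0.5 * np.log(np.maximum(np.abs(d), 1e-12))  # guard exact grid hits
                  - 0.5 * d**2, axis=0)

def log_prior(theta):                                  # Cauchy(0, 10) prior
    return np.log(10.0 / (np.pi * (100.0 + theta**2)))

x = np.array([-1.314, 0.417, 0.703, 2.159, 9.501])     # made-up data
theta = np.linspace(-20, 20, 4001)
dtheta = theta[1] - theta[0]

log_post = log_likelihood(theta, x) + log_prior(theta)
post = np.exp(log_post - log_post.max())               # unnormalized posterior
post /= post.sum() * dtheta                            # normalize on the grid

print(np.sum(theta * post) * dtheta)                   # posterior mean of theta
```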
Bayesian Inference in Multiparameter Models
For the parameters of interest, we can determine their marginal posterior density by integrating the joint posterior over the remaining (nuisance) parameters, i.e. if \(\theta = (\theta_1,\theta_2)\) and \(\theta_1\) is of interest,

\[\pi(\theta_1 \mid x) = \int \pi(\theta_1,\theta_2 \mid x)\, d\theta_2.\]
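As a numerical illustration (the joint posterior below is a made-up unnormalized bivariate Gaussian, not one from these notes), marginalization on a grid is just integration along the nuisance axis:

```python
import numpy as np

# Hypothetical joint posterior pi(theta1, theta2 | x) evaluated on a grid.
t1 = np.linspace(-3, 3, 301)
t2 = np.linspace(-3, 3, 301)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
joint = np.exp(-0.5 * (T1**2 - 1.2 * T1 * T2 + T2**2))   # unnormalized

dt1, dt2 = t1[1] - t1[0], t2[1] - t2[0]
joint /= joint.sum() * dt1 * dt2           # normalize on the grid
marginal_t1 = joint.sum(axis=1) * dt2      # integrate out theta2 -> pi(theta1 | x)
```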
Example: Two state Markov Chain
Consider a (first order) Markov chain \(\{X_i\}\) where each \(X_i \in \{0, 1\}\). By the Markov assumption,

\[P(X_{i+1}=x_{i+1} \mid X_1=x_1,\ldots,X_i=x_i) = P(X_{i+1}=x_{i+1} \mid X_i=x_i).\]
Then, we parameterize the transition (conditional) probabilities as \(P(X_{i+1}=1 \mid X_i=0) = \alpha\) and \(P(X_{i+1}=0 \mid X_i=1) = \beta\):

| | \(X_{i+1} = 0\) | \(X_{i+1} = 1\) |
|---|---|---|
| \(X_i = 0\) | \(1-\alpha\) | \(\alpha\) |
| \(X_i = 1\) | \(\beta\) | \(1-\beta\) |
Let the table above define our transition matrix \(P\) (rows indexed by the current state). Note that \(P^k\) gives the \(k\)-step transition probabilities \(P(X_{i+k} = y \mid X_i = x)\), and that the eigenvalues of \(P\) are \(1\) and \(\rho = 1-\alpha-\beta\), so the eigenvalues of \(P^k\) are \(1\) and \(\rho^k\).
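A quick numerical check of these facts, with illustrative values of \(\alpha\) and \(\beta\) (not values from the notes):

```python
import numpy as np

alpha, beta = 0.2, 0.5                      # illustrative transition probabilities
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])            # rows = current state, columns = next state

print(np.linalg.eigvals(P))                 # eigenvalues: 1 and rho = 1 - alpha - beta

k = 5
Pk = np.linalg.matrix_power(P, k)           # k-step transition probabilities
print(Pk)
print(np.linalg.eigvals(Pk))                # eigenvalues: 1 and rho**k
```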
We can obtain the stationary distribution of the Markov chain by solving

\[(p_0,\, p_1)\, P = (p_0,\, p_1), \qquad p_0 + p_1 = 1.\]

Therefore, the stationary distribution of \(X_i\) is

\[P(X_i = 0) = \frac{\beta}{\alpha+\beta}, \qquad P(X_i = 1) = \frac{\alpha}{\alpha+\beta}.\]

So, writing \(\theta = P(X_i = 1) = \frac{\alpha}{\alpha+\beta}\) and \(\rho = 1-\alpha-\beta\), the transition probabilities can be recovered as \(\alpha = \theta(1-\rho)\) and \(\beta = (1-\theta)(1-\rho)\); under stationarity, \(\rho\) is also the correlation between \(X_i\) and \(X_{i+1}\).
Then, we define a run as a maximal subsequence of consecutive identical values (all \(0\)s or all \(1\)s); for example, a sequence such as

\[0\,0\,1\,1\,0\,1\,0\,0\,1\,0\]

gives 7 runs.
If \(X_1,\ldots,X_n\) come from the two-state Markov chain, then the number of runs is

\[R = 1 + \sum_{i=1}^{n-1} \mathbf{1}\{X_{i+1} \neq X_i\}.\]

\(R\) provides some information about \(\rho\): when \(\rho\) is close to \(1\) the chain rarely changes state, so \(R\) tends to be small, while when \(\rho\) is close to \(-1\) the chain alternates frequently, so \(R\) tends to be large.
MoM estimator
Note that, under stationarity,

\[E\!\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \theta \qquad\text{and}\qquad E(R) = 1 + 2(n-1)\,\theta(1-\theta)(1-\rho).\]

We can therefore use the proportion of \(1\)s and the number of runs to obtain the MoM estimators of \(\theta\) and \(\rho\):

\[\hat\theta = \frac{1}{n}\sum_{i=1}^n X_i, \qquad \hat\rho = 1 - \frac{R-1}{2(n-1)\,\hat\theta(1-\hat\theta)}.\]
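A minimal sketch of these estimators, assuming the moment equations above; the values of \(\alpha\) and \(\beta\) used in the simulation are illustrative only:

```python
import numpy as np

def mom_two_state(x):
    """Method-of-moments estimates of (theta, rho) from a 0/1 sequence,
    using E[mean(x)] = theta and E[R] = 1 + 2(n-1)theta(1-theta)(1-rho)."""
    x = np.asarray(x)
    n = len(x)
    theta_hat = x.mean()                 # proportion of 1s
    R = 1 + np.sum(x[1:] != x[:-1])      # number of runs
    rho_hat = 1 - (R - 1) / (2 * (n - 1) * theta_hat * (1 - theta_hat))
    return theta_hat, rho_hat

# Simulate a chain with illustrative values alpha = 0.2, beta = 0.5.
rng = np.random.default_rng(1)
alpha, beta = 0.2, 0.5
x = [0]
for _ in range(2000):
    p1 = alpha if x[-1] == 0 else 1 - beta   # P(next = 1 | current state)
    x.append(int(rng.random() < p1))

print(mom_two_state(np.array(x)))  # roughly (alpha/(alpha+beta), 1-alpha-beta) = (0.29, 0.3)
```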
MLE (MCLE)
Note that the full likelihood of \(X_1,\ldots,X_n\) factorizes as

\[\mathcal L(\alpha,\beta) = P(X_1 = x_1)\prod_{i=1}^{n-1} P(X_{i+1}=x_{i+1} \mid X_i = x_i).\]

We can also define a conditional likelihood function, conditioning on \(X_1 = x_1\):

\[\mathcal L_{cond}(\alpha,\beta) = \prod_{i=1}^{n-1} P(X_{i+1}=x_{i+1} \mid X_i = x_i) = (1-\alpha)^{n_{00}}\,\alpha^{n_{01}}\,\beta^{n_{10}}\,(1-\beta)^{n_{11}},\]

where \(n_{jk}\) is the number of observed transitions from state \(j\) to state \(k\). Maximizing \(\mathcal L\) with respect to \(\alpha\) and \(\beta\) individually is not possible, since the initial term \(P(X_1 = x_1)\) depends on both parameters; \(\mathcal L_{cond}\), however, factors into a term involving only \(\alpha\) and a term involving only \(\beta\), and maximizing it gives the maximum conditional likelihood estimates (MCLE)

\[\hat\alpha = \frac{n_{01}}{n_{00}+n_{01}}, \qquad \hat\beta = \frac{n_{10}}{n_{10}+n_{11}}.\]
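A small sketch of the MCLE computed from observed transition counts (the closed-form expressions above are a reconstruction, so treat this as an illustration):

```python
import numpy as np

def mcle_two_state(x):
    """Maximum conditional likelihood estimates of (alpha, beta) for the
    two-state chain, based on observed transition counts."""
    x = np.asarray(x)
    n00 = np.sum((x[:-1] == 0) & (x[1:] == 0))   # transitions 0 -> 0
    n01 = np.sum((x[:-1] == 0) & (x[1:] == 1))   # transitions 0 -> 1
    n10 = np.sum((x[:-1] == 1) & (x[1:] == 0))   # transitions 1 -> 0
    n11 = np.sum((x[:-1] == 1) & (x[1:] == 1))   # transitions 1 -> 1
    return n01 / (n00 + n01), n10 / (n10 + n11)  # (alpha_hat, beta_hat)
```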
Bayesian Analysis
Consider a prior \(\pi(\alpha,\beta)\). Since \(\alpha = \theta(1-\rho)\) and \(\beta = (1-\theta)(1-\rho)\), the change-of-variables formula gives the prior density for \((\theta, \rho)\) as

\[\pi\big(\theta(1-\rho),\,(1-\theta)(1-\rho)\big)\,(1-\rho)\]

on the set \(\left\{(\theta, \rho) : \rho \in (-1, 1),\ \theta \in \left(\dfrac{-\min(\rho, 0)}{1-\min(\rho, 0)},\ \dfrac{1}{1-\min(\rho, 0)}\right)\right\}\).
However, it is hard to compute the normalizer of the resulting posterior.
Markov Chain Monte Carlo (MCMC)
MCMC allows us to draw samples \((\alpha_1,\beta_1), \ldots, (\alpha_N, \beta_N)\) from the posterior density without computing the normalizer; each draw gives \(\rho_i = 1-\alpha_i-\beta_i\), so \(\rho_1,\ldots,\rho_N\) are draws from the posterior for \(\rho\), and we can estimate the posterior density of \(\rho\) using kernel density estimation.
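A minimal random-walk Metropolis sketch. The flat prior on \((\alpha,\beta)\), the use of the conditional likelihood as the target, and the simulated data are all assumptions for illustration, not choices made in the notes:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Simulate illustrative data from the chain with assumed alpha = 0.2, beta = 0.5.
alpha_true, beta_true = 0.2, 0.5
x = [0]
for _ in range(500):
    p1 = alpha_true if x[-1] == 0 else 1 - beta_true   # P(next = 1 | current)
    x.append(int(rng.random() < p1))
x = np.array(x)

# Transition counts for the conditional likelihood.
n00 = np.sum((x[:-1] == 0) & (x[1:] == 0)); n01 = np.sum((x[:-1] == 0) & (x[1:] == 1))
n10 = np.sum((x[:-1] == 1) & (x[1:] == 0)); n11 = np.sum((x[:-1] == 1) & (x[1:] == 1))

def log_post(a, b):
    # Flat prior on (0,1)^2 (an assumption) plus the conditional log-likelihood.
    if not (0 < a < 1 and 0 < b < 1):
        return -np.inf
    return (n00 * np.log(1 - a) + n01 * np.log(a)
            + n10 * np.log(b) + n11 * np.log(1 - b))

# Random-walk Metropolis for (alpha, beta).
N, step = 20000, 0.05
a, b = 0.5, 0.5
samples = np.empty((N, 2))
for t in range(N):
    a_new, b_new = a + step * rng.normal(), b + step * rng.normal()
    if np.log(rng.random()) < log_post(a_new, b_new) - log_post(a, b):
        a, b = a_new, b_new
    samples[t] = a, b

# Posterior draws of rho and a kernel density estimate of its posterior.
rho = 1 - samples[5000:, 0] - samples[5000:, 1]   # drop burn-in
kde = gaussian_kde(rho)
grid = np.linspace(rho.min(), rho.max(), 200)
density = kde(grid)
```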