We want to find a statistic $T(X_1,\dots,X_n)$ such that $E_\theta[T(X_1,\dots,X_n)] = h(\theta)$, where $h$ has a well-defined inverse. Then we can set $T(X_1,\dots,X_n) = h(\hat\theta)$, so that $\hat\theta = h^{-1}(T)$.
If $X_1,\dots,X_n$ are independent and $E_\theta(X_i) = h(\theta)$, then by the substitution principle we can estimate $E_\theta(X_i)$ by $\bar{X}$; setting $\bar{X} = h(\hat\theta)$ gives $\hat\theta = h^{-1}(\bar{X})$.
Example: Exponential Distribution
$X_1,\dots,X_n$ independent, $f(x;\lambda) = \lambda\exp(-\lambda x)$, $x \ge 0$, where $\lambda > 0$ is unknown.
Note that for $r > 0$, $E_\lambda(X_i^r) = \lambda^{-r}\,\Gamma(r+1)$, so that we have the MoM estimator

$$n^{-1}\sum_{i=1}^n X_i^r = \hat\lambda^{-r}\,\Gamma(r+1) \;\Rightarrow\; \hat\lambda(r) = \left(\frac{1}{n\,\Gamma(r+1)}\sum_{i=1}^n X_i^r\right)^{-1/r}$$
Among these estimators, $r = 1$ gives the best estimation (smallest standard deviation).
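As a quick illustration, here is a minimal simulation sketch of $\hat\lambda(r)$ for a few values of $r$ (assuming NumPy is available; the true rate and sample size below are arbitrary choices, not from the notes):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(0)
lam_true = 2.0                                  # arbitrary true rate for the simulation
x = rng.exponential(scale=1 / lam_true, size=1000)

def lambda_hat(x, r):
    """MoM estimator based on the r-th moment: (mean(X^r) / Gamma(r+1))^(-1/r)."""
    return (np.mean(x**r) / gamma(r + 1)) ** (-1 / r)

for r in [0.5, 1.0, 2.0]:
    print(f"r = {r}: lambda_hat = {lambda_hat(x, r):.3f}")
```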
Example: Gamma Distribution
$X_1,\dots,X_n$ independent, $f(x;\lambda,\alpha) = \lambda^\alpha x^{\alpha-1}\exp(-\lambda x)\,\Gamma(\alpha)^{-1}$, $x \ge 0$, where $\lambda, \alpha > 0$ are unknown. Note that $E(X_i) = \alpha/\lambda$ and $\mathrm{var}(X_i) = \alpha/\lambda^2$, so that MoM gives

$$\bar{X} = \hat\alpha/\hat\lambda, \quad S^2 = \hat\alpha/\hat\lambda^2 \;\Rightarrow\; \hat\alpha = \bar{X}^2/S^2, \quad \hat\lambda = \bar{X}/S^2$$
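A minimal sketch of these two estimators on simulated data (assuming NumPy; the true parameters below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, lam_true = 3.0, 2.0                 # arbitrary true parameters
# NumPy parameterizes the gamma by shape and scale = 1/rate
x = rng.gamma(shape=alpha_true, scale=1 / lam_true, size=2000)

xbar = x.mean()
s2 = x.var(ddof=1)                              # sample variance S^2

alpha_hat = xbar**2 / s2
lam_hat = xbar / s2
print(f"alpha_hat = {alpha_hat:.3f}, lambda_hat = {lam_hat:.3f}")
```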
Confidence Interval
An interval $I = [l(X_1,\dots,X_n),\, u(X_1,\dots,X_n)]$ is a CI with coverage $100p\%$ if

$$P[l(X_1,\dots,X_n) \le \theta \le u(X_1,\dots,X_n)] = p, \quad \forall\, \theta \in \Theta$$
The pivotal method
It is not often that we can compute such a probability directly. One way to work around this is to find a random variable $g(X_1,\dots,X_n,\theta)$ (a pivot) whose distribution is independent of $\theta$ and of any other unknown parameters.
Example
For $X_1,\dots,X_{20}$ independent $\sim N(\mu,\sigma^2)$, the pivot $\sqrt{20}(\bar{X}-\mu)/S$ has a $t$ distribution with 19 degrees of freedom, so the 95% CI is $\left[\bar{X} \pm 2.093\,\frac{S}{\sqrt{20}}\right]$, where $2.093$ is the $0.975$ quantile of $t_{19}$.
The following simulation draws 100 samples of size 20 from $N(0,1)$; as expected, roughly 95% of the resulting intervals cover the true mean $\mu = 0$.
```python
import numpy as np
import matplotlib.pyplot as plt

samples = np.random.randn(100, 20)              # 100 samples of size 20 from N(0, 1)
mean = samples.mean(axis=1)
sd = samples.std(axis=1, ddof=1)                # sample standard deviation S
half_width = 2.093 * sd / samples.shape[1]**0.5

# indices of samples whose CI misses the true mean 0
not_in_CI = np.where((mean - half_width > 0) | (mean + half_width < 0))[0]

plt.figure(figsize=(12, 4))
plt.errorbar(x=np.arange(samples.shape[0]), y=mean, yerr=half_width,
             fmt=" ", label="CI covers true mean")
plt.errorbar(x=not_in_CI, y=mean[not_in_CI], yerr=half_width[not_in_CI],
             fmt=" ", color="red", label="CI does not cover")
plt.axhline(0, linestyle=":", color="grey")
plt.xlabel("sample")
plt.ylabel("CI")
plt.title(r"95% CIs for $\mu$")
plt.legend()
plt.show()
```
Maximum Likelihood Estimation
Given r.v.s $(X_1,\dots,X_n)$ with joint pdf

$$f(x_1,\dots,x_n;\theta_1,\dots,\theta_k)$$

where the $\theta$'s are unknown parameters, the likelihood is defined as

$$L(\theta_1,\dots,\theta_k) = f(x_1,\dots,x_n;\theta_1,\dots,\theta_k)$$

Note that $x_1,\dots,x_n$ are fixed observations.
Suppose that for each $x$, $(T_1(x),\dots,T_k(x))$ maximizes $L(\theta_1,\dots,\theta_k)$. Then the maximum likelihood estimators (MLEs) of $\theta_1,\dots,\theta_k$ are

$$\hat\theta_j = T_j(X_1,\dots,X_n), \quad j = 1,\dots,k$$
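To make this concrete, here is a minimal numerical sketch (assuming NumPy and SciPy) that maximizes the log-likelihood of the exponential model from the earlier example; the closed-form MLE $\hat\lambda = 1/\bar{X}$ serves as a check:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=0.5, size=500)        # simulated data, true rate lambda = 2

def neg_log_lik(lam):
    """Negative log-likelihood of lambda for i.i.d. Exponential(lambda) data."""
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100), method="bounded")
print(f"numerical MLE: {res.x:.4f}, closed form 1/xbar: {1 / x.mean():.4f}")
```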
Existence and uniqueness
MLE is essentially an ad hoc procedure, albeit one that works very well in many problems.
MLEs need not be unique, although in most cases they are.
MLEs may not exist, typically when the sample size is too small.
Sufficient Statistic
A statistic $T = (T_1(X),\dots,T_m(X))$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T = t$ depends only on $t$ (and not on $\theta$).
Neyman Factorization Theorem
$T$ is sufficient for $\theta$ if and only if

$$f(x;\theta) = g(T(x);\theta)\,h(x)$$
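For instance (a standard check, reusing the exponential model from above): for $X_1,\dots,X_n$ i.i.d. Exponential($\lambda$),

$$f(x;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \underbrace{\lambda^n e^{-\lambda \sum_i x_i}}_{g(T(x);\,\lambda)} \cdot \underbrace{1}_{h(x)}$$

so $T(X) = \sum_{i=1}^n X_i$ is sufficient for $\lambda$.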
Observed Fisher Information
Given the MLE $\hat\theta$, the observed Fisher information is

$$I(\hat\theta) = -\frac{d^2}{d\theta^2}\ln L(\theta)\Big|_{\theta=\hat\theta}$$

The observed Fisher information yields an estimate of the standard error, i.e.

$$\widehat{\mathrm{se}}(\hat\theta) = \{I(\hat\theta)\}^{-1/2}$$

Mathematically, $I(\hat\theta)$ is the absolute curvature of the log-likelihood function at its maximum. If this curvature is large, the log-likelihood is sharply peaked, so the estimator is well determined (hence has a smaller estimated s.e.).
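Continuing the exponential sketch (an illustrative assumption, not from the notes): $\ln L(\lambda) = n\ln\lambda - \lambda\sum_i x_i$ gives $\hat\lambda = 1/\bar{X}$ and $\frac{d^2}{d\lambda^2}\ln L(\lambda) = -n/\lambda^2$, so $I(\hat\lambda) = n/\hat\lambda^2$ and $\widehat{\mathrm{se}}(\hat\lambda) = \hat\lambda/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=0.5, size=500)        # simulated data, true rate lambda = 2
n = len(x)

lam_hat = 1 / x.mean()                          # MLE of the exponential rate
obs_info = n / lam_hat**2                       # I(lam_hat) = -d^2/d lam^2 ln L at the MLE
se_hat = obs_info ** -0.5                       # {I(lam_hat)}^(-1/2) = lam_hat / sqrt(n)
print(f"lambda_hat = {lam_hat:.4f}, se_hat = {se_hat:.4f}")
```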
Approximate normality of MLEs
Theorem. For $X_1,\dots,X_n$ independent with pdf $f(x;\theta)$ for some real-valued $\theta \in \Theta$, if
- $\Theta$ is an open set;
- $A = \{x : f(x;\theta) > 0\}$ does not depend on $\theta$ (true for the exponential families);
- $\ell(x;\theta)$ is three-times differentiable w.r.t. $\theta$ for each $x \in A$.