Given the order statistics $X_{(1)} \le \dots \le X_{(n)}$, define the $n-1$ spacings (first-order spacings) by
$$D_k = X_{(k+1)} - X_{(k)}, \quad k = 1, \dots, n-1.$$
Intuitively, the spacings should carry some information about the pdf f.
Note that if $\tau \approx k/n \approx (k+1)/n$, then $X_{(k)}$ and $X_{(k+1)}$ both estimate $F^{-1}(\tau)$. If $f(F^{-1}(\tau))$ is large, then $D_k$ tends to be small; conversely, if $f(F^{-1}(\tau))$ is small, then $D_k$ tends to be large.
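A small simulation may make this concrete. The sketch below assumes a standard normal sample (chosen purely for illustration; the names `in_centre` and `in_tail` are not from the notes) and compares the average spacing in the high-density centre with that in the low-density tails.

```r
set.seed(1)                      # fixed seed so the sketch is reproducible
n <- 2000
x <- sort(rnorm(n))              # order statistics X_(1) <= ... <= X_(n)
D <- diff(x)                     # spacings D_k = X_(k+1) - X_(k), k = 1, ..., n-1
mid <- (x[-1] + x[-n]) / 2       # midpoint of each spacing

in_centre <- abs(mid) < 0.5      # region where the N(0,1) density is high
in_tail   <- abs(mid) > 2        # region where the N(0,1) density is low

mean(D[in_centre])               # average spacing where f is large
mean(D[in_tail])                 # average spacing where f is small
```

Under these assumptions the tail spacings should come out noticeably larger on average than the central ones.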
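Suppose $D_1, \dots, D_{n-1}$ are (approximately) independent exponential random variables with $E(nD_k) = \exp(g(V_k))$, where $V_k = (X_{(k+1)} + X_{(k)})/2$. Then $V_k \approx F^{-1}(\tau)$ for $\tau \approx k/n \approx (k+1)/n$, and the density is $f(x) = \exp(-g(x))$.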
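The link between $E(nD_k)$ and $1/f$ can be motivated by a standard heuristic, sketched here under the assumption that $F$ is smooth and strictly increasing near $F^{-1}(\tau)$. Write $U_{(k)} = F(X_{(k)})$ for the corresponding uniform order statistics (notation introduced for this sketch). A first-order expansion of $F^{-1}$ gives

$$D_k = F^{-1}(U_{(k+1)}) - F^{-1}(U_{(k)}) \approx \frac{U_{(k+1)} - U_{(k)}}{f(F^{-1}(\tau))}, \qquad E\left(U_{(k+1)} - U_{(k)}\right) = \frac{1}{n+1},$$

so that

$$E(nD_k) \approx \frac{n}{n+1} \cdot \frac{1}{f(F^{-1}(\tau))} \approx \frac{1}{f(V_k)} = \exp(g(V_k)) \quad \text{when } f(x) = \exp(-g(x)).$$

Since $n$ times a uniform spacing is approximately exponential with mean 1 for large $n$, this also suggests why an exponential model for $nD_k$ is reasonable.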
Using B-spline functions, we can estimate the function $g(x)$ as
$$g(x) = \beta_0 + \sum_{j=1}^{p} \beta_j \psi_j(x)$$
where the $\beta_j$'s are unknown parameters and the $\psi_j$'s are B-spline functions.
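To get a feel for the basis functions $\psi_j$, one can evaluate a B-spline basis with `bs()` from R's `splines` package. The grid, `df = 5`, and the coefficients below are illustrative choices, not values from the notes. With the default cubic degree and no intercept column, `bs(x, df = p)` returns $p$ basis columns, matching the sum over $j = 1, \dots, p$, with $\beta_0$ supplied separately.

```r
library(splines)

xg <- seq(-3, 3, length.out = 200)   # illustrative evaluation grid
p  <- 5
B  <- bs(xg, df = p)                 # 200 x p matrix; column j holds psi_j(xg)

dim(B)
matplot(xg, B, type = "l", lty = 1, xlab = "x", ylab = "basis value")

# g(x) = beta0 + sum_j beta_j * psi_j(x) for some made-up coefficients
beta0 <- 0.5
beta  <- c(1, -1, 2, 0, 1)
g <- beta0 + as.vector(B %*% beta)
```

In the density estimator the coefficients are not chosen by hand but fitted by regression on the normalized spacings.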
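# create the spline density estimator
den.splines <- function(x, p = 5) {
  library(splines)
  n <- length(x)
  x <- sort(x)
  x1 <- c(NA, x)
  x2 <- c(x, NA)
  sp <- (x2 - x1)[2:n]            # spacings D_k
  mid <- 0.5 * (x1 + x2)[2:n]     # midpoints V_k
  y <- n * sp                     # normalized spacings n * D_k
  xx <- bs(mid, df = p)           # B-spline basis evaluated at the midpoints
  # log link with variance mu^2 matches an exponential-type response
  r <- glm(y ~ xx, family = quasi(link = "log", variance = "mu^2"))
  density <- exp(-r$linear.predictors)   # f = exp(-g)
  r <- list(x = mid, density = density)
  r
}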
Consider sampling from the Gaussian mixture model (GMM)
$$0.7\,N(2,1) + 0.3\,N(-2,1)$$
# randomly sample 500 points from the given GMM
x <- ifelse(runif(500) < .7, rnorm(500, 2, 1), rnorm(500, -2, 1))
# estimate density using p = 8
r <- den.splines(x, p = 8)
# estimation
plot(r$x, r$density, type = "l", xlab = "x", ylab = "density", lwd = 4, col = "red")
# actual
lines(r$x, 0.3 * dnorm(r$x, -2, 1) + 0.7 * dnorm(r$x, 2, 1), lwd = 2, lty = 2)
legend("topleft", c("estimation", "actual GMM"), fill = c("red", "black"))
Hazard Functions
For a positive continuous random variable $X$ with density $f$ and CDF $F$, the hazard function is
$$h(x) = \frac{f(x)}{1 - F(x)}$$
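The motivation is to view $X$ as a survival time. For small $\delta > 0$, consider
$$P(x \le X \le x + \delta \mid X \ge x) = \frac{F(x+\delta) - F(x)}{1 - F(x)} \approx \delta\,\frac{f(x)}{1 - F(x)} = \delta\,h(x)$$
Therefore, $h(x)$ represents the instantaneous death rate given survival to time $x$.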
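As a quick numerical illustration, the hazard can be computed directly from a density and a CDF. The helper `hazard()` below is a hypothetical name introduced for this sketch. It is applied to an exponential distribution, whose hazard is constant and equal to its rate.

```r
# hazard(x) = f(x) / (1 - F(x)); dens and cdf are functions of x
hazard <- function(x, dens, cdf) dens(x) / (1 - cdf(x))

rate <- 2
xs <- c(0.1, 0.5, 1, 2)
h_exp <- hazard(xs,
                dens = function(x) dexp(x, rate),
                cdf  = function(x) pexp(x, rate))
h_exp   # constant hazard equal to the rate
```

A constant hazard corresponds to the memoryless property: the death rate does not depend on how long the item has already survived.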
Also, note that
$$h(x) = \frac{f(x)}{1 - F(x)} = -\frac{d}{dx}\ln(1 - F(x))$$
Therefore,
$$F(x) = 1 - \exp\left(-\int_0^x h(t)\,dt\right), \qquad f(x) = h(x)\exp\left(-\int_0^x h(t)\,dt\right)$$
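These identities can be checked numerically. The sketch below assumes a Weibull hazard with shape 2 and scale 1, that is $h(t) = 2t$ (an illustrative choice), integrates it with `integrate()`, and compares the result with R's built-in `pweibull()` and `dweibull()`.

```r
h <- function(t) 2 * t                       # Weibull(shape = 2, scale = 1) hazard

# cumulative hazard H(x) = integral of h from 0 to x
H <- function(x) sapply(x, function(u) integrate(h, 0, u)$value)

xs <- c(0.25, 0.5, 1, 1.5)
F_from_h <- 1 - exp(-H(xs))                  # F(x) = 1 - exp(-H(x))
f_from_h <- h(xs) * exp(-H(xs))              # f(x) = h(x) exp(-H(x))

# columns should agree pairwise up to numerical integration error
cbind(F_from_h, pweibull(xs, 2, 1), f_from_h, dweibull(xs, 2, 1))
```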
Here we require $\int_0^\infty h(x)\,dx = \infty$, so that $F(x) \to 1$ as $x \to \infty$ and we have a "proper" probability distribution.
The shape of the hazard function gives information not immediately apparent in $f$ or $F$: an increasing $h(x)$ indicates "new better than used", while a decreasing $h(x)$ indicates "used better than new".
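The Weibull family gives a convenient illustration of both cases, since its hazard $h(x) = (k/\lambda)(x/\lambda)^{k-1}$ is increasing for shape $k > 1$ and decreasing for $k < 1$. The function name `weibull_hazard` and the parameter values are assumptions made for this sketch.

```r
# hazard = density / survival function, using R's built-in Weibull functions
weibull_hazard <- function(x, shape, scale = 1) {
  dweibull(x, shape, scale) / pweibull(x, shape, scale, lower.tail = FALSE)
}

xs <- seq(0.1, 3, by = 0.1)
h_inc <- weibull_hazard(xs, shape = 2)     # increasing hazard: new better than used
h_dec <- weibull_hazard(xs, shape = 0.5)   # decreasing hazard: used better than new

plot(xs, h_inc, type = "l", xlab = "x", ylab = "h(x)")
lines(xs, h_dec, lty = 2)
legend("top", c("shape = 2", "shape = 0.5"), lty = c(1, 2))
```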