Log-linear Regression (Poisson Regression)

Case Study: Mating Success of Elephants

Predictor: Age: age at beginning in years (range from 27-52)
Outcome: Matings: #successful matings

Question: - What's the relationship between mating success and age? - Do males have diminished success after reaching some optimal age?

# load data
library(Sleuth3)
elmasu = case2201
Age = elmasu$Age
Matings = elmasu$Matings

plot(Age, Matings)

Why not Linear Regression

outcome is counts and small numbers
won't have a normal distribution conditional on age

Poisson distribution

Useful for counts of rare events

Poisson link function: $g(\mu)=log(\mu)$, also called a log-linear model

Interpretation of $\beta$'s: Increase $x_j$ by one unit, and holding other predictors constant, $\mu_j$ changes by a factor of $e^{\beta_j}$

Then, $\mu = e^{X\beta}$

Estimation method: MLE by IRLS algorithm

Inference: Wald procedures and likelihood ratio test (as in logistic regression)

Checking Model Adequacy

(similar to binomial logistic regression)

Linear in $\beta$'s: Plot $\log(y_i)$ vs. $x$'s to see if linear relationship is appropriate. Jitter if many $y_i = 0$ (by using $\log(y_i+k),k$ is a small positive value)
Outliers: Deviance and Pearson residuals
Correct form: Wald $(\hat\beta_j/se(\hat\beta_j)^2\sim \chi^2_{p+1})$ and LRT tests
Adequate: Deviance GOF test

Common problem: $var(Y_i) > E(Y_i)$
Solution: add an extra dispersion

Model log likelihood

\[\mathcal{L} = \prod_1^n P(Y_i=y_i)=\prod_1^n \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}\]

\[\log\mathcal{L}=\sum_1^n (-\mu_i + y_i\log(\mu_i)-\log(y_i!))\]

Since $\log(y_i)$ is a constant for $\mu_i$, we can drop it when maximizing likelihood.

fitllm = glm(Matings~Age, family=poisson)
summary(fitllm)

Call:
glm(formula = Matings ~ Age, family = poisson)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.80798  -0.86137  -0.08629   0.60087   2.17777

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.58201    0.54462  -2.905  0.00368 ** 
Age          0.06869    0.01375   4.997 5.81e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 51.012  on 39  degrees of freedom
AIC: 156.46

Number of Fisher Scoring iterations: 5

plot(Age, log(Matings + 1))
abline(fitllm$coefficients[1], fitllm$coefficients[2])

fitllm2 = glm(Matings~Age+I(Age^2), family=poisson)
summary(fitllm2)

Call:
glm(formula = Matings ~ Age + I(Age^2), family = poisson)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8470  -0.8848  -0.1122   0.6580   2.1134

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8574060  3.0356383  -0.941    0.347
Age          0.1359544  0.1580095   0.860    0.390
I(Age^2)    -0.0008595  0.0020124  -0.427    0.669

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 50.826  on 38  degrees of freedom
AIC: 158.27

Number of Fisher Scoring iterations: 5

print(AIC(fitllm))
print(BIC(fitllm))
print(AIC(fitllm2))
print(BIC(fitllm2))

[1] 156.4578
[1] 159.8849
[1] 158.2723
[1] 163.4131

yhats = predict.glm(fitllm, type="response")
rres = residuals(fitllm, type="response")
pres = residuals(fitllm, type="pearson")
dres = residuals(fitllm, type="deviance")
cbind(Matings, yhats, rres)

A matrix: 41 × 3 of type dbl
	Matings	yhats	rres
1	0	1.313503	-1.31350336
2	1	1.406903	-0.40690281
3	1	1.406903	-0.40690281
4	1	1.406903	-0.40690281
5	3	1.406903	1.59309719
6	0	1.506944	-1.50694362
7	0	1.506944	-1.50694362
8	0	1.506944	-1.50694362
9	2	1.506944	0.49305638
10	2	1.506944	0.49305638
11	2	1.506944	0.49305638
12	1	1.614098	-0.61409805
13	2	1.851807	0.14819297
14	4	1.983484	2.01651630
15	3	1.983484	1.01651630
16	3	1.983484	1.01651630
17	3	1.983484	1.01651630
18	2	1.983484	0.01651630
19	1	2.124524	-1.12452352
20	1	2.124524	-1.12452352
21	2	2.124524	-0.12452352
22	3	2.124524	0.87547648
23	5	2.437403	2.56259689
24	6	2.437403	3.56259689
25	1	2.610720	-1.61071983
26	1	2.610720	-1.61071983
27	6	2.610720	3.38928017
28	2	2.796361	-0.79636061
29	1	2.995202	-1.99520177
30	3	3.436307	-0.43630655
31	4	3.680652	0.31934757
32	0	3.942373	-3.94237304
33	2	3.942373	-1.94237304
34	3	3.942373	-0.94237304
35	4	3.942373	0.05762696
36	9	3.942373	5.05762696
37	3	4.222704	-1.22270385
38	5	4.522968	0.47703182
39	7	5.189068	1.81093216
40	2	5.558048	-3.55804753
41	9	7.315666	1.68433367

par(mfrow=c(1,2))
plot(Age, pres, main="Pearson Residuals")
plot(Age, dres, main="Deviance Residuals")

psihat = sum(pres^2 / fitllm$df.residual)
summary(fitllm, dispersion=psihat)

Call:
glm(formula = Matings ~ Age, family = poisson)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.80798  -0.86137  -0.08629   0.60087   2.17777

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.58201    0.58590  -2.700  0.00693 ** 
Age          0.06869    0.01479   4.645  3.4e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1.157334)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 51.012  on 39  degrees of freedom
AIC: 156.46

Number of Fisher Scoring iterations: 5

fitlm = lm(Matings~Age)
summary(fitlm)

Call:
lm(formula = Matings ~ Age)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.1158 -1.3087 -0.1082  0.8892  4.8842

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4.50589    1.61899  -2.783  0.00826 ** 
Age          0.20050    0.04443   4.513 5.75e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.849 on 39 degrees of freedom
Multiple R-squared:  0.343, Adjusted R-squared:  0.3262 
F-statistic: 20.36 on 1 and 39 DF,  p-value: 5.749e-05

fitlml = lm(log(Matings+1)~Age)
summary(fitlml)

Call:
lm(formula = log(Matings + 1) ~ Age)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.49087 -0.33939  0.06607  0.35376  0.81171

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.69893    0.45861  -1.524 0.135567    
Age          0.05093    0.01259   4.046 0.000238 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5237 on 39 degrees of freedom
Multiple R-squared:  0.2957,    Adjusted R-squared:  0.2776 
F-statistic: 16.37 on 1 and 39 DF,  p-value: 0.0002385

par(mfrow=c(3,2))
plot(fitllm, which=1:2, main="Poisson")
plot(fitlm, which=1:2, main="SLR")
plot(fitlml, which=1:2, main="log trans")

Deviance GOF Test

# test statistic
fitllm$deviance

51.0116278635223

# p-value
1 - pchisq(51.012, 39)

0.0942563842711481

Question: determine whether the fitted model fits as well as the saturated model

Hypotheses: $H_0:$ fitted model fits as well as saturated model
$H_a:$ Saturated model fits better

Test Statistic: $51.012\sim \chi^2_{41-2}$

p-value: 0.094

Conclusion: weak evidence that fitted model is adequate.

Wald or LRT

summary(fitllm)

Call:
glm(formula = Matings ~ Age, family = poisson)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.80798  -0.86137  -0.08629   0.60087   2.17777

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.58201    0.54462  -2.905  0.00368 ** 
Age          0.06869    0.01375   4.997 5.81e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 51.012  on 39  degrees of freedom
AIC: 156.46

Number of Fisher Scoring iterations: 5

summary(fitllm2)

Call:
glm(formula = Matings ~ Age + I(Age^2), family = poisson)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8470  -0.8848  -0.1122   0.6580   2.1134

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8574060  3.0356383  -0.941    0.347
Age          0.1359544  0.1580095   0.860    0.390
I(Age^2)    -0.0008595  0.0020124  -0.427    0.669

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 50.826  on 38  degrees of freedom
AIC: 158.27

Number of Fisher Scoring iterations: 5

Question: determine whether the mean number of successful matings tends to peak at some age and then decrease or whether the mean continues to increase with age

Wald
Hypotheses: $H_0: \beta_2 = 0, H_a:\beta_3\neq 0$

Test Statistic: $-0.427\sim N(0,1)$

p-value: $0.669$

LRT
Hypothesis: $H_0: $ the model without higher order term is better. $H_a: $ the model with higher order term is better.

Test statistic: $51.012 - 50.826 = 0.186\sim \chi^2_1$

p-value: $0.667$

Conclusion: There is strong evidence that the quadratic term does not contribute to the model. The data provides no evidence that the mean number of successful matings peak at some age

Practice Example

library(Sleuth3)
elmasu = case2201
Age = elmasu$Age
Matings = elmasu$Matings

fitllm = glm(Matings~Age, family=poisson)
summary(fitllm)

Call:
glm(formula = Matings ~ Age, family = poisson)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.80798  -0.86137  -0.08629   0.60087   2.17777

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.58201    0.54462  -2.905  0.00368 ** 
Age          0.06869    0.01375   4.997 5.81e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 51.012  on 39  degrees of freedom
AIC: 156.46

Number of Fisher Scoring iterations: 5

fitho = glm(Matings~Age + I(Age^2), family=poisson)
summary(fitho)

Call:
glm(formula = Matings ~ Age + I(Age^2), family = poisson)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8470  -0.8848  -0.1122   0.6580   2.1134

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8574060  3.0356383  -0.941    0.347
Age          0.1359544  0.1580095   0.860    0.390
I(Age^2)    -0.0008595  0.0020124  -0.427    0.669

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 75.372  on 40  degrees of freedom
Residual deviance: 50.826  on 38  degrees of freedom
AIC: 158.27

Number of Fisher Scoring iterations: 5

Part (a)

Consider the elephant mating example from lecture.

Both the binomial and the Poisson distributions provide probability models for counts. Is the binomial distribution appropriate for the number of successful matings of the male African elephants?
No, the binomial distribution has a upper bound for the number of success. (Number of success in a definite number of trials).
For the model fit in lecture, interpret the coefficient of age.
Keeping other explanatory variables constant, one year increase in age will increase the number of matings by $e^{\beta_1}-1 = e^{0.0687}-1=0.07$
Consider the plot of the number of matings versus age. The spread of the responses is larger for larger values of the mean response. Should we be concerned?
No, the model assumes the underlying distribution is Poisson, where $E(X)=var(X)=\mu$. Therefore, the spread will increase as the increase of of the mean response
From the estimated log-linear regression of the elephants' successful matings on age, what are the mean and variance of counts of successful matings (in the 8 years of the study) for the elephants who are aged 25 years at the beginning of the observation period? What are the mean and variance for elephants who are aged 45 years? For 25 years old: mean = variance = $e^{-1.582+(0.0687)25}=1.15$ For 25 years old: mean = variance = $e^{-1.582+(0.0687)45}=4.53$
While it is hypothesized that the number of matings increases with age, there may be an optimal age for matings where, for older elephants the number of matings starts to decline. One way to investigate this is to add a quadratic term for age into the model to allow the log of the mean number of matings to reach a peak. Does the inclusion in the model of $age^2$ improve the fit?
By Wald test, since $p=0.669$, we cannot reject the null hypothesis, there is no evidence that the higher order term improve the fit

Part (b)

What is the difference between a log-linear model and a linear model after the log transformation of the response?

For log-linear model, mean is $\mu$, the model is $\log(\mu)=X\beta$.

For SLR with transformation, the model is expressed in terms of the mean of the log of $Y$ with different model assumptions. - different variance - different test procedures and diagnosis

Part (c)

Why are ordinary residuals $y_i-\hat\mu_i$ not particularly useful for Poisson regression?

Because the variance in Poisson model is not constant. The residuals with larger means will have larger variances. So if an observation has a large residual it is difficult to know whether it is an outlier or an observation from a distribution with larger variance than the others. Residuals that are studentized so that they have the same variance are more useful for identifying outliers.

Part (d)

Consider the deviance goodness-of-t test.

Under what conditions is it valid for Poisson regression?
The sample size for each group is large enough.
When it is valid, what possibilities are suggested by a small p-value? The Poisson distribution is an inadequate model, the explanatory variables are inadequate, or there are some outliers.
Large p-value?
Model is correct, or there is insufficient data to detect any inadequacies.

Because the underlying distribution is Poission. The likelihood function is

\[\prod_1^n \frac{\mu_i^{y_i}e^{-\mu_i}}{y_i!}=\frac{e^{-\sum \mu_i}\prod \mu_i^{y_i}}{\prod y_i!}\]

Take log

\[=-\sum \mu_i + \sum y_i \log(mu_i) - \sum \log(y_i!)\]

Where $\mu_i = e^{X_{\cdot,i}\beta}$ (X: the ith observation)