## Looking for and dealing with collinearity in a GLM

I have a dataset with one continuous dependent variable and two categorical explanatory variables. I want to fit GLMs to the data, but I am running into what I think is collinearity: when I fit the model, the last coefficient comes back as NA. I can’t find a way to test this kind of data for collinearity, and particularly not for a GLM. I’m also not sure what the best thing to do would be if I did find collinearity!

Please help! This is for my undergrad dissertation and I know practically nothing about R or statistics.

By the way, the dependent variable is proportion data, which is why I am using the binomial family.

```
> summary(Model1data)
      Host                  Parasite  Replicate   Mortality
 1      : 1   Control          : 1    A:35      Min.   :0.0000
 MCB4865:21   Mix              :38    B:37      1st Qu.:0.0100
 MY10   :21   S.marcescens.2170:42    C:51      Median :0.1500
 MY14   :21   S.marcescens.D   :42              Mean   :0.2055
 MY17   :18                                     3rd Qu.:0.3096
 MY8    :21                                     Max.   :0.9885
 N2     :20
> glm1 <- glm(data = Model1data,
+             Mortality ~ Host + Parasite, family = binomial)
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
> summary(glm1)

Call:
glm(formula = Mortality ~ Host + Parasite, family = binomial,
    data = Model1data)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.86613  -0.19364  -0.08645   0.12684   1.11525

Coefficients: (1 not defined because of singularities)
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -6.801     30.015  -0.227   0.8207
HostMCB4865                  2.654     30.069   0.088   0.9297
HostMY10                     1.466     30.073   0.049   0.9611
HostMY14                     1.184     30.074   0.039   0.9686
HostMY17                     2.247     30.071   0.075   0.9404
HostMY8                      1.523     30.072   0.051   0.9596
HostN2                       1.955     30.071   0.065   0.9482
ParasiteMix                  4.118      1.788   2.304   0.0212 *
ParasiteS.marcescens.2170    4.165      1.782   2.337   0.0194 *
ParasiteS.marcescens.D          NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 41.499  on 122  degrees of freedom
Residual deviance: 15.220  on 114  degrees of freedom
AIC: 88.973

Number of Fisher Scoring iterations: 9
```
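A side note on the "non-integer #successes" warning in the output above: a binomial GLM expects either counts of successes and failures, or a proportion accompanied by the number of trials as weights. The sketch below illustrates the two equivalent forms on made-up data. The columns `dead` and `total` are hypothetical (the original dataset only shows `Mortality`), so this only applies if Mortality was computed as a count out of a known total.

```r
# Made-up data for illustration only: `dead` and `total` are assumed
# column names and do not come from Model1data.
set.seed(1)
toy <- data.frame(
  Parasite = factor(rep(c("Mix", "S.marcescens.2170", "S.marcescens.D"), each = 7)),
  total    = 50
)
toy$dead      <- rbinom(nrow(toy), size = toy$total, prob = 0.2)
toy$Mortality <- toy$dead / toy$total

# Form 1: two-column response of (successes, failures)
fit_counts <- glm(cbind(dead, total - dead) ~ Parasite,
                  family = binomial, data = toy)

# Form 2: proportion response, with the number of trials supplied as weights
fit_weights <- glm(Mortality ~ Parasite,
                   family = binomial, weights = total, data = toy)

# Both forms describe the same likelihood, so the coefficients should agree
coef(fit_counts)
coef(fit_weights)
```

If the proportions do not come from counts at all, a binomial family may not be the right choice, and that would be a separate question from the collinearity one.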

Also, this is (I think) the contingency table:

```
> table(Model1data$Host, Model1data$Parasite)

          Control Mix S.marcescens.2170 S.marcescens.D
  1             1   0                 0              0
  MCB4865       0   7                 7              7
  MY10          0   7                 7              7
  MY14          0   7                 7              7
  MY17          0   4                 7              7
  MY8           0   7                 7              7
  N2            0   6                 7              7
```
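One possible way to check for this kind of exact collinearity with categorical predictors is to compare the number of columns in the model matrix with its rank. The sketch below rebuilds only the Host and Parasite columns from the cell counts in the table above (the Mortality values are not needed for this check). The object names `counts`, `cells`, `design` and `X` are made up for illustration.

```r
# Rebuild the Host x Parasite design from the contingency table above.
# Cell counts are copied from the table; all object names are illustrative.
hosts     <- c("1", "MCB4865", "MY10", "MY14", "MY17", "MY8", "N2")
parasites <- c("Control", "Mix", "S.marcescens.2170", "S.marcescens.D")
counts <- matrix(c(1, 0, 0, 0,
                   0, 7, 7, 7,
                   0, 7, 7, 7,
                   0, 7, 7, 7,
                   0, 4, 7, 7,
                   0, 7, 7, 7,
                   0, 6, 7, 7),
                 nrow = 7, byrow = TRUE,
                 dimnames = list(hosts, parasites))

# One row per cell, then repeat each row by its count.
# expand.grid varies Host fastest, which matches as.vector()'s column-major order.
cells   <- expand.grid(Host = hosts, Parasite = parasites, stringsAsFactors = FALSE)
cells$n <- as.vector(counts)
design  <- cells[rep(seq_len(nrow(cells)), cells$n), c("Host", "Parasite")]
design$Host     <- factor(design$Host, levels = hosts)
design$Parasite <- factor(design$Parasite, levels = parasites)

# Same right-hand side as glm1: Host + Parasite
X      <- model.matrix(~ Host + Parasite, data = design)
n_cols <- ncol(X)      # coefficients the model asks for
rank_X <- qr(X)$rank   # coefficients that can actually be estimated

n_cols
rank_X
```

If `rank_X` is smaller than `n_cols`, the design is rank deficient, which is what "not defined because of singularities" refers to. Here the single Control row is also the only row for host `1`, so outside that row the Host dummies and the Parasite dummies each sum to one, making one column redundant. On the fitted model itself, `alias(glm1)` should list which coefficient is aliased with which others.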