## Logistic regression on binary response or ANOVA on proportions/percentages

I am analyzing the survival of seedlings in Styrofoam blocks (known as styroblocks). Each block contains cavities in which the seedlings are planted. All cavities within a block have the same dimensions (diameter and depth), but different block types have different cavity dimensions. I want to know whether cavity size influences survival. I use 5 styroblock types, each with a different cavity volume, and plant seedlings in all cavities. For the analysis, however, I only consider 15 seedlings from the center of each block. Furthermore, I plant 3 different plant varieties and repeat each variety × styroblock combination 3 times. Lastly, the whole experiment was replicated at 2 different nurseries to test for nursery effects.

Here’s an example dataset:

```
# Simulated data: 2 nurseries x 3 varieties x 5 styroblocks x 3 replicates x 15 seedlings
xx <- data.frame(
  Nursery    = rep(c("Nursery A", "Nursery B"), each = 675),
  Styroblock = rep(c("Block A", "Block B", "Block C", "Block D", "Block E"), 6, each = 45),
  Variety    = rep(c("Variety A", "Variety B", "Variety C"), 2, each = 225),
  Replicate  = rep(c(1, 2, 3), 30, each = 15),
  Survival   = sample(c(1, 0), 1350, replace = TRUE, prob = c(.85, .15))
)
```
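
As a sanity check on the layout described above, here is a small sketch in base R that rebuilds the example data and counts the seedlings in every nursery × variety × styroblock × replicate combination. The seed and the name `cell_counts` are my own additions for illustration, not part of the experiment.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

xx <- data.frame(
  Nursery    = rep(c("Nursery A", "Nursery B"), each = 675),
  Styroblock = rep(c("Block A", "Block B", "Block C", "Block D", "Block E"), 6, each = 45),
  Variety    = rep(c("Variety A", "Variety B", "Variety C"), 2, each = 225),
  Replicate  = rep(c(1, 2, 3), 30, each = 15),
  Survival   = sample(c(1, 0), 1350, replace = TRUE, prob = c(.85, .15))
)

# Cross-tabulate the four design factors; Freq is the number of seedlings per cell
cell_counts <- as.data.frame(
  table(xx[, c("Nursery", "Variety", "Styroblock", "Replicate")])
)

nrow(cell_counts)         # number of individual styroblocks in the design
unique(cell_counts$Freq)  # seedlings scored per styroblock
```

If the layout is as intended, every combination should appear exactly once with the same number of seedlings, which is what the aggregation to one value per styroblock further down relies on.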

Here’s my first option:

```
fit <- glm(Survival ~ Nursery * Variety * Styroblock, data = xx, family = binomial(link = "logit"))
summary(fit) ## at this point some model simplification should be performed
library(car)
Anova(fit, type = "II", test.statistic = "Wald")
```

However, cavities within a styroblock are spatially dependent. For my other measurements, such as height (not shown here), I therefore averaged across all 15 seedlings to get one measurement per styroblock, which avoids potential pseudo-replication. Following the same logic, I summarized dead/alive (**binary**) within each individual styroblock (that is, each nursery × variety × styroblock type × replicate combination), resulting in a **proportion** (or percentage) instead. This also reduces my sample size from 1350 to 90. See here:

```
library(plyr)
xx.sum <- ddply(xx, .(Nursery, Styroblock, Variety, Replicate), summarise,
                Survival = sum(Survival) / 15)
```

**Question 1:** Is this the correct way to proceed?

If yes, I am not sure whether I can follow up with an ANOVA on the proportions/percentages, because the variances appear to be unequal:

```
library(ggplot2)
ggplot(xx.sum, aes(x = Styroblock, y = Survival)) +
  geom_boxplot() +
  facet_grid(Nursery ~ Variety)
```
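
To put numbers on what the boxplots show, here is a rough sketch in base R (so it does not depend on plyr) that rebuilds the per-styroblock proportions and computes the variance across the 3 replicates in each nursery × variety × styroblock cell. The seed and the name `cell_var` are my additions for illustration only.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

xx <- data.frame(
  Nursery    = rep(c("Nursery A", "Nursery B"), each = 675),
  Styroblock = rep(c("Block A", "Block B", "Block C", "Block D", "Block E"), 6, each = 45),
  Variety    = rep(c("Variety A", "Variety B", "Variety C"), 2, each = 225),
  Replicate  = rep(c(1, 2, 3), 30, each = 15),
  Survival   = sample(c(1, 0), 1350, replace = TRUE, prob = c(.85, .15))
)

# Proportion surviving per styroblock (mean of a 0/1 variable = sum / 15)
xx.sum <- aggregate(Survival ~ Nursery + Styroblock + Variety + Replicate,
                    data = xx, FUN = mean)

# Variance of the 3 replicate proportions within each design cell
cell_var <- aggregate(Survival ~ Nursery + Styroblock + Variety,
                      data = xx.sum, FUN = var)

range(cell_var$Survival)  # spread of the cell variances
```

With only 3 replicates per cell these variances are very noisy, so I would not read much into any single value. Also, if seedlings within a block were independent, each proportion based on 15 seedlings would have binomial variance p(1 − p)/15, so some dependence of the spread on the mean survival is to be expected.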

One option would be to transform the proportions. I tried arcsine and square-root transformations, but this does not help.
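
For reference, the kind of transformation I mean can be sketched as follows. This assumes the combined arcsine square-root form, asin(sqrt(p)), followed by a full factorial ANOVA; the column name `Survival.asin`, the object `fit.aov`, and the seed are hypothetical names chosen for this illustration.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

xx <- data.frame(
  Nursery    = rep(c("Nursery A", "Nursery B"), each = 675),
  Styroblock = rep(c("Block A", "Block B", "Block C", "Block D", "Block E"), 6, each = 45),
  Variety    = rep(c("Variety A", "Variety B", "Variety C"), 2, each = 225),
  Replicate  = rep(c(1, 2, 3), 30, each = 15),
  Survival   = sample(c(1, 0), 1350, replace = TRUE, prob = c(.85, .15))
)

# One proportion per styroblock (90 rows)
xx.sum <- aggregate(Survival ~ Nursery + Styroblock + Variety + Replicate,
                    data = xx, FUN = mean)

# Arcsine square-root transform; maps proportions in [0, 1] to [0, pi/2]
xx.sum$Survival.asin <- asin(sqrt(xx.sum$Survival))

# Full factorial ANOVA on the transformed proportions
fit.aov <- aov(Survival.asin ~ Nursery * Variety * Styroblock, data = xx.sum)
summary(fit.aov)
```

With 90 styroblocks and 30 design cells this leaves 60 residual degrees of freedom, all coming from the 3 replicates per cell.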

**Question 2:** How would you suggest proceeding in this case, i.e., which test is most appropriate for assessing whether my predictors, or combinations of them, influence survival?

Thanks!