I’m trying to fit R glm’s to data sets where the response is zero inflated positive continuous. This is an example data set
df <- structure(list(y=c(0,0,0,0,0,0,0,0,4.09417142857143e-05,0,0,0,0,2.18718787234043e-05,0,0,0,0,0,0,0,0,0,0,0,0,0,4.60192985143918e-05,0,1.16500081901182e-05,0,1.13497040795196e-05,0,2.01959645093946e-05,0,0,2.24704801120448e-05,3.50195529889299e-05,2.26178388643899e-06,0,0,0,0,0,0,1.24748435005701e-05,0,0,0,1.14008923269261e-06,0,0,0,0,0,0,0,1.95445414576182e-05,0,0,0,0,0,0,0,0,1.72293987591653e-05,1.11122316638726e-05,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),behavior = c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,3,0,3,0,3,0,3,0,3,0,3,0,3,0,3,0,3,0,6,0,6,0,6,0,6,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,4,0,4,0,4,0,4,0,4,0,4,0,4,0,4),time = c(272,148,132,854,234,340,90,282,840,230,118,620,130,1410,1272,684,6624,1220,250,4556,4500,2254,662,334,336,1364,346,4308,372,7448,1242,9658,926,6706,494,1996,3570,13550,62134,6876,3108,4722,794,2394,21502,21048,5442,2748,920,25596,1954,7896,996,12108,2046,23034,3216,7574,9882,5560,5610,4524,6846,3450,11630,3528,3546,9544,4820,4852,13508,3726,3744,1252,1254,2514,3786,2534,14080,3882,6510,10520,72928,10052)),.Names = c("y","behavior","time"),row.names = c(NA, 84),class = "data.frame")
behavior to be a factor
df$behavior = as.factor(df$behavior)
The response is
df$time are the fixed effects (categorical and continuous, respectively).
A glimpse at the distribution of the response:
I tried using the R
tweedie package with
glm) with this code:
tweedie.fit <- tweedie.profile(y ~ behavior + time, do.plot=TRUE, data = df)
model.fit <- glm2(y ~ behavior + time, family=tweedie(var.power=out$xi.max, link.power=1-out$xi.max), data = df)
which either does not converge (in case of other data sets) or shows poor qq-plot fit:
Is there any hope to reliably fit a glm to such data? If so what am I doing wrong?
I also tried using the R
gamlss package with a log normal family where I add
.Machine$double.eps to df$y:
df$y <- df$y + .Machine$double.eps
model.fit <- gamlss(y ~ behavior + time, data = df, family = LOGNO())
but it doesn’t seem to do much better:
Help would be greatly appreciated.