## Modeling continuous abundance data with a GLM in R: how to select the correct distribution family?

I have abundance data (counts) that I have standardized by area sampled, making them continuous. I would like to explain them with my two independent variables using a GLM but I am having trouble specifying a model distribution. The data are derived from raw counts of salamanders at 40 sites standardized by area of pond sampled. The raw counts are as follows:

```
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 3 3 3 4 5 8 10 11 12 21
```

And after standardizing the counts by area sampled:

```
0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1754590 0.4491828 0.8517423 1.4341965 2.2777698 4.0467065 0.3889454 0.5935273 1.9376223 2.0924642 0.5110034 0.5418544 6.9358962 1.5491324 2.2689315 14.2278592 1.4483645 0.3947695 6.2244910 8.7240609
```
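For reproducibility, the two listings above can be entered in R roughly as follows. This is only a sketch: the vector names `counts` and `abundance` are placeholders chosen for illustration, and the summary lines simply confirm the properties described in this question (40 sites, half of them zeros, non-integer standardized values).

```r
# Raw salamander counts at the 40 sites (values copied from the listing above)
counts <- c(rep(0, 20), rep(1, 6), rep(2, 4), rep(3, 3),
            4, 5, 8, 10, 11, 12, 21)

# Counts standardized by area of pond sampled (values copied from the listing above)
abundance <- c(rep(0, 20),
               0.1754590, 0.4491828, 0.8517423, 1.4341965, 2.2777698,
               4.0467065, 0.3889454, 0.5935273, 1.9376223, 2.0924642,
               0.5110034, 0.5418544, 6.9358962, 1.5491324, 2.2689315,
               14.2278592, 1.4483645, 0.3947695, 6.2244910, 8.7240609)

# Basic checks on the response variable
n_sites   <- length(abundance)       # number of sites
prop_zero <- mean(abundance == 0)    # share of sites with no salamanders
# TRUE if any standardized value is not a whole number
non_integer <- any(abundance != round(abundance))

summary(abundance)
hist(abundance, breaks = 20, main = "Standardized abundance", xlab = "Count per unit area")
```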

It seems that the Poisson and negative binomial families are now out of the question because my data are no longer integers. My data also contain zeros, so I don’t think I can use a gamma distribution without transforming them.

I used the `fitdist` function in R (package `fitdistrplus`) to estimate parameters for two continuous distributions (exponential and normal). I then drew random samples from exponential and normal distributions with the estimated parameters, using `rexp(n, rate)` and `rnorm(n, mean, sd)` respectively. Using a two-sided KS test, `ks.test(x, y)`, I compared my data with the generated data, and the distributions were significantly different (p ≪ 0.05), ruling out my data being normal or exponential. I transformed the data with an ln(x + 1) transformation (to keep the zero values) and they still deviate significantly from the hypothesized distributions.
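The procedure described above looks roughly like the sketch below. Several details are assumptions for illustration: the vector name `abundance`, the seed, the choice of `n` equal to the number of sites, and the use of `suppressWarnings()` (the two-sample KS test warns about ties, and half of the data are tied at zero). The ln(x + 1) version is obtained by replacing `abundance` with `log1p(abundance)`.

```r
library(fitdistrplus)  # provides fitdist()

set.seed(1)  # arbitrary seed, only so the simulated samples are repeatable

# Standardized abundance values copied from the listing earlier in the question
abundance <- c(rep(0, 20),
               0.1754590, 0.4491828, 0.8517423, 1.4341965, 2.2777698,
               4.0467065, 0.3889454, 0.5935273, 1.9376223, 2.0924642,
               0.5110034, 0.5418544, 6.9358962, 1.5491324, 2.2689315,
               14.2278592, 1.4483645, 0.3947695, 6.2244910, 8.7240609)

# Estimate parameters of the two candidate distributions by maximum likelihood
fit_exp  <- fitdist(abundance, "exp")
fit_norm <- fitdist(abundance, "norm")

# Draw samples from distributions with the estimated parameters
n <- length(abundance)  # assumed sample size for the simulated data
sim_exp  <- rexp(n, rate = fit_exp$estimate[["rate"]])
sim_norm <- rnorm(n, mean = fit_norm$estimate[["mean"]],
                  sd = fit_norm$estimate[["sd"]])

# Two-sided, two-sample KS tests: observed data vs. simulated data
ks_exp  <- suppressWarnings(ks.test(abundance, sim_exp))
ks_norm <- suppressWarnings(ks.test(abundance, sim_norm))

ks_exp$p.value
ks_norm$p.value
```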

Because my data have no negative values, contain zeros, and have no upper bound, I’m not sure what distribution to specify for the GLM.

My questions are:

1) How can I determine the distribution of my response variable, and if I need to transform, can I use a transformation that turns my zeros into non-zeros or even negative values?

2) If the model family depends on the distribution of errors, how can I know their distribution without first removing the potentially explainable variation? It seems like I would need to explain some variation with my predictor variables, and then model the distribution of the remaining error. Is there a better approach to this?

Any suggestions are helpful, especially those for R or Excel. My knowledge of statistical theory is limited.

Thanks