GLM with multiple imputation or mixed model


I have a data set with repeated measures: two treatment groups, with each subject measured at 3 time points. But the data set includes missing values. In SPSS, if I use the General Linear Model it ignores the rows with missing data, so I am left with very little data. Since this is an issue, I read that a mixed-model approach can handle missing data better. But can I use the General Linear Model with multiple imputation instead?
I ask because the General Linear Model lets me plot the two groups over time and also shows the interaction between time and groups (time*groups), which I don't know how to do in a mixed model.
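For reference, this is roughly what I understand the mixed-model specification with a time-by-group interaction to look like; a minimal sketch in R (I work in SPSS, so the packages and the names dat, subject, group, time and y are only placeholders):

library(lme4)
library(emmeans)

# Long-format data: one row per subject per time point; 'time' is a factor
# with three levels (dat, subject, group, time, y are placeholder names)
fit <- lmer(y ~ group * time + (1 | subject), data = dat)

# The group:time rows in the fixed effects are the time-by-group interaction;
# a subject with a missing outcome at one time point only loses that row,
# not all of their data.
summary(fit)

# Interaction plot of estimated group means over time
emmip(fit, group ~ time)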

Write code and explain some terms in a logistic regression analysis in R


We want to do a logistic regression analysis to assess the effect of Age and CD4 on drug resistance mutations (DRM). The code that we wrote is:

logist.summary(glm(DRM ~ Age, data = Database, family = binomial),"wald")

The results are:

            log.OR OR lower.CI upper.CI p.value
(Intercept)  -0.31 0.74     0.05     9.95  0.8169
Age          -0.07 0.93     0.86     1.00  0.0525

However, we also want to test what the effect of a 20-year age difference between subjects would be, and whether it is related to DRMs. We wrote:

logist.summary(glm(DRM ~ I(Age+20), data = Database, family = binomial),"wald")

Results:

            log.OR   OR lower.CI upper.CI p.value
(Intercept)   1.17 3.22     0.05   190.62  0.5742
I(Age + 20)  -0.07 0.93     0.86     1.00  0.0525

I want to ask:

  • Is the code we wrote correct?
  • Can you help me explain the meaning of these tables? (See the sketch below.)
  • Why are the results the same for Age and Age + 20, but different for the intercept? What does the intercept mean in this case?
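logist.summary appears to be a user-written wrapper, so as a point of reference, here is a sketch of what I believe are the equivalent base R calls (assuming the wrapper reports Wald statistics):

fit <- glm(DRM ~ Age, data = Database, family = binomial)

# Log-odds-ratio scale: estimates, standard errors, Wald z tests and p-values
summary(fit)$coefficients

# Odds-ratio scale with Wald 95% confidence intervals
exp(cbind(OR = coef(fit), confint.default(fit)))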

Pseudoreplication and the methods used to explore the correlations


In my experiment I measured the growth of different trees on predefined circular plots (x, y, z, a). On each plot, all trees were measured. Each location has a single treatment assigned to it.

[Figure: statistical design]

Now I would like to explore how the growth of selected tree species depends on the different treatments. If I understand Hurlbert (1984) correctly, I have a problem with simple pseudoreplication here, so using a general linear model is not reasonable?

My question is: should I use a mixed-effects model that includes both fixed and random effects, or should I use a Generalized Linear Model or a Bayesian Generalized Linear Model? I would really appreciate any suggestions on research where similar problems were discussed.
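For concreteness, a minimal sketch of the mixed-effects option I am considering, in R with lme4 (growth, treatment, plot and the data frame trees are placeholders for my variables):

library(lme4)

# One row per measured tree; trees on the same plot are not independent,
# so 'plot' enters as a random effect instead of treating every tree as
# an independent replicate of the treatment.
fit <- lmer(growth ~ treatment + (1 | plot), data = trees)
summary(fit)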

What are good resources to learn about GLM? [duplicate]


Using the linear equation with log transformed data


If I have log-transformed axes and then produce a nice linear regression model, how do I use the equation of the line? I.e., can I use my raw data $x$ values to predict real values for $y$? Is the constant $a$ on the real scale, or do I have to use $\exp(a)$ to get the real value of $a$?
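For example, a small sketch with made-up data (the power-law form of y is only an illustration):

set.seed(1)
x <- runif(100, 1, 100)
y <- 3 * x^0.7 * exp(rnorm(100, sd = 0.2))   # made-up power-law data

# Regression on the log-log scale: log(y) = a + b * log(x)
fit <- lm(log(y) ~ log(x))
coef(fit)   # the intercept 'a' is on the log scale

# To predict y for a raw x value, back-transform the prediction
new_x <- data.frame(x = 50)
exp(predict(fit, newdata = new_x))   # exp(a + b*log(50)); note this estimates
                                     # the median of y rather than the mean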

Is glm(A~B*C*D) the same as glm(A~C*B*D)?


When I run these two in R, I get different values. I thought I should get the same values, since the two formulas just include the same main effects and interaction terms.
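A quick way to check whether the two orderings really give different fits; a sketch with simulated data (A, B, C and D are placeholders matching the question):

set.seed(1)
d <- data.frame(B = rnorm(100), C = rnorm(100), D = rnorm(100))
d$A <- 1 + d$B + d$C * d$D + rnorm(100)

m1 <- glm(A ~ B * C * D, data = d)
m2 <- glm(A ~ C * B * D, data = d)

# The two formulas contain the same terms; only their order (and the labels
# of the interactions, e.g. B:C vs C:B) differs, so the fits coincide.
all.equal(deviance(m1), deviance(m2))
all.equal(unname(fitted(m1)), unname(fitted(m2)))

# Sequential (Type I) tables such as anova(m1) do depend on the order in
# which terms are entered, which is one place the numbers can differ.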

Finding a reaction norm in R using logistic regression with binomial errors


I am trying to calculate ‘reaction norms’ for a fish species. This is essentially the length at which the probability that a fish becomes mature equals 50% for a particular age class.

I know I have to use a logistic regression model with binomial errors but I can’t work out how to calculate this from the summary outputs or plot the regression successfully!

I have a data set that has: ‘age’ classes (1, 2, 3, 4, 5, 6), ‘Lngth’ data in mm, and ‘Maturity’ data (Immature/Mature, coded 0/1).

I am running a GLM as follows:

Model <- glm(Maturity ~ Lngth, family = binomial(link = "logit"))

This, however, does not take into account the different age classes (I would really like to avoid creating whole new data sets for each age class, as I have multiple year ranges to test).

And even so, I do not understand how to interpret the summary output to get the length at which the probability of being mature equals 50%, along with the standard error of this figure.

I also can’t quite get the code right to plot this.
Ideally I’d have one plot with length along the x-axis, probability along the y-axis, and six lines/curves representing the age classes.
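A minimal sketch of the kind of model and plot I am after (assuming the variables live in a data frame called fish, which is a placeholder name, and using MASS::dose.p for the 50% length and its standard error):

library(MASS)   # for dose.p()

# Length and age class (as a factor) together, with an interaction,
# so each age class gets its own maturity curve
mod <- glm(Maturity ~ Lngth * factor(age), family = binomial(link = "logit"),
           data = fish)

# For a single age class, dose.p() gives the length at p = 0.5 and its SE
mod1 <- glm(Maturity ~ Lngth, family = binomial(link = "logit"),
            data = subset(fish, age == 1))
dose.p(mod1, p = 0.5)

# Predicted curves: one line per age class
newdat <- expand.grid(Lngth = seq(min(fish$Lngth), max(fish$Lngth), length.out = 200),
                      age = sort(unique(fish$age)))
newdat$p <- predict(mod, newdata = newdat, type = "response")

plot(p ~ Lngth, data = subset(newdat, age == 1), type = "l", ylim = c(0, 1),
     xlab = "Length (mm)", ylab = "P(mature)")
for (a in sort(unique(fish$age))[-1]) {
  lines(p ~ Lngth, data = subset(newdat, age == a), col = a)
}
abline(h = 0.5, lty = 2)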

I would really appreciate any help anyone could provide! I know this can all be achieved, but I am really struggling.

Cheers

Discordant significance of OR’s confidence interval and p-value in glm(quasibinomial) model [R]


I’m currently trying to test whether differences in the proportion of people infected by malaria (RDT positive) between clusters with high or low coverage of the control intervention are significant. Therefore I’m running a quasi-binomial GLM in R like this:

fit <- glm(cbind(RDT_pos, RDT_neg) ~ Coverage_ov75 + rur_urb + pattern, data = df, family = quasibinomial)

My explanatory variable of interest is Coverage_ov75; the others are controlling variables. Everything is fine until I test for significance. When I use summary(fit), the p-value is 0.0630, so the result is not significant at the .05 level:

Call:
glm(formula = cbind(RDT_pos, RDT_neg) ~ Coverage_ov75 + rur_urb + 
pattern, family = quasibinomial, data = df)

Deviance Residuals: 
Min       1Q   Median       3Q      Max  
-3.7111  -1.9594  -0.7794   1.0525   3.8943  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -5.1581     0.7147  -7.217 7.48e-07 ***
Coverage_ov751    -1.4263     0.7224  -1.975   0.0630 .  
rur_urb1           0.3906     0.5999   0.651   0.5227    
patternHighlands   1.3138     0.7581   1.733   0.0993 .  
patternSouth       1.6602     0.7000   2.372   0.0284 *  
patternWest        1.3084     0.7271   1.799   0.0878 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 5.131962)

Null deviance: 157.92  on 24  degrees of freedom
Residual deviance: 103.30  on 19  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

But when I use the odds ratios’ 95% confidence intervals, the result becomes significant (i.e., the 95% CI excludes 1.00), whether I use the confint or the confint.default function:

> exp(cbind(coef(fit), confint(fit)))  
Waiting for profiling to be done...
                                  2.5 %      97.5 %
(Intercept)      0.005752403 0.00115120  0.01973567
Coverage_ov751   0.240198126 0.04344154  0.83777344
rur_urb1         1.477932315 0.47495969  5.29358843
patternHighlands 3.720230839 0.83163497 18.19418962
patternSouth     5.260253952 1.37407939 23.72815126
patternWest      3.700344766 0.91283718 17.44061487
> exp(cbind(coef(fit), confint.default(fit)))  
                                    2.5 %      97.5 %
(Intercept)      0.005752403 0.001417273  0.02334774
Coverage_ov751   0.240198126 0.058304544  0.98954790
rur_urb1         1.477932315 0.456059336  4.78947312
patternHighlands 3.720230839 0.841872843 16.43967686
patternSouth     5.260253952 1.334015707 20.74208834
patternWest      3.700344766 0.889853675 15.38741905

The result from confint.default is very close to the significance threshold, but the p-value is not that close to .05, at least not as close as in the discrepant results I have observed before (or as others describe, e.g., in "Differences between conclusions from a p-value and confidence intervals").

I’m not used to quasi-binomial models, so I wonder if my procedure is correct. If it is, how should I interpret these discrepant results?
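For reference, a minimal check of where the two calculations might diverge, rebuilding the Wald interval by hand (my understanding is that summary() uses t quantiles with the residual degrees of freedom for quasi-families, while confint.default() uses normal quantiles); this reuses fit from above:

est <- coef(summary(fit))["Coverage_ov751", "Estimate"]
se  <- coef(summary(fit))["Coverage_ov751", "Std. Error"]

# Normal-quantile Wald interval (what confint.default() computes)
exp(est + c(-1, 1) * qnorm(0.975) * se)

# t-quantile Wald interval, matching the t test reported by summary()
# (df = residual degrees of freedom, here 19)
exp(est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se)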
Many thanks in advance.

So many significant explanatory variables and such a small AUC


Have you ever seen a model where almost every variable is significant and yet the AUC (area under the ROC curve) is very small? What might cause this? When I saw the model summary I thought the model would perform well, but when I make predictions for observations from the test set, the AUC turns out to be very small, which means the predictions are very poor.

Can anyone suggest an explanation for this situation, and maybe give some advice on how to improve model performance so that the AUC is higher?

My model summary

Call:
glm(formula = cliks012 ~ hours + primaryhardwaretype + browsername + 
    osname + age_group_troll + plec, family = "binomial", data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.5917  -0.1958  -0.1900  -0.1851   3.4862  

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    -6.170726   0.127305 -48.472  < 2e-16 ***
hoursEvening                   -0.016045   0.011402  -1.407 0.159380    
hoursMorning                    0.063541   0.011190   5.678 1.36e-08 ***
hoursNight                      0.122892   0.022960   5.352 8.68e-08 ***
primaryhardwaretypeMobilePhone  2.050345   0.089335  22.951  < 2e-16 ***
primaryhardwaretypeOther        0.213355   0.215761   0.989 0.322738    
primaryhardwaretypeTablet       2.066480   0.124671  16.575  < 2e-16 ***
browsernameChrome               0.032222   0.099976   0.322 0.747226    
browsernameFirefox              0.111669   0.099724   1.120 0.262805    
browsernameInternetExplorer     0.057690   0.100636   0.573 0.566473    
browsernameOther                0.187177   0.101820   1.838 0.066017 .  
browsernameSafari               0.042307   0.111451   0.380 0.704241    
osnameiOS                      -0.003356   0.109021  -0.031 0.975444    
osnameLinux                     2.132874   0.144926  14.717  < 2e-16 ***
osnameLinuxUbuntu               2.354108   0.147236  15.989  < 2e-16 ***
osnameOSX                       2.351537   0.138428  16.987  < 2e-16 ***
osnameOther                     2.110826   0.107162  19.698  < 2e-16 ***
osnameWindows7                  2.168022   0.132195  16.400  < 2e-16 ***
osnameWindows8                  2.103507   0.135373  15.539  < 2e-16 ***
osnameWindows81                 2.197786   0.132560  16.579  < 2e-16 ***
osnameWindowsVista              2.136239   0.133530  15.998  < 2e-16 ***
osnameWindowsXP                 2.121562   0.132499  16.012  < 2e-16 ***
age_group_trollOldTroll         0.008260   0.052773   0.157 0.875619    
age_group_trollWorker          -0.097919   0.015123  -6.475 9.49e-11 ***
age_group_trollYoung           -0.109535   0.021494  -5.096 3.47e-07 ***
age_group_trollYoungTroll      -0.067748   0.087009  -0.779 0.436197    
plecM                           0.045470   0.012042   3.776 0.000159 ***
plecO                           0.071644   0.024581   2.915 0.003561 ** 
plecX                          -0.109337   0.052179  -2.095 0.036135 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 490133  on 2699999  degrees of freedom
Residual deviance: 488984  on 2699971  degrees of freedom
AIC: 489042

Number of Fisher Scoring iterations: 7

And the AUC computation:

library(ROCR)
auc <- function(pred_probs, real_classes){
  pred <- prediction(pred_probs, real_classes)
  performance(pred, "auc")@y.values[[1]] 
}

preds <- predict(modelGLM_clicks012, newdata = test, type="response")
> auc(preds, test$cliks012)
[1] 0.5230328
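For context, a minimal simulation (made-up data, not the model above) showing how, at a sample size in the millions, even a predictor with a tiny effect can be highly significant while the AUC stays close to 0.5:

library(ROCR)

set.seed(1)
n <- 1e6
x <- rnorm(n)
p <- plogis(-3 + 0.02 * x)   # tiny true effect on the log-odds
y <- rbinom(n, 1, p)

fit <- glm(y ~ x, family = binomial)
summary(fit)$coefficients    # x comes out highly "significant" at this n

pred <- prediction(predict(fit, type = "response"), y)
performance(pred, "auc")@y.values[[1]]   # yet the AUC is barely above 0.5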

Interpretation of coefficients from a GLM with normal family and log link vs. a linear model with logged outcome


I was fitting a linear model where the outcome was log-transformed. The outcome is overdispersed and skewed, and logging dramatically improved model fit. For reasons that relate to the software package I’m using, I now need to fit this model within a GLM framework.

So I specify the normal family and log link. I understand the models are not equivalent, because I was formerly modeling the mean of the logged observed values, whereas with the GLM I am now modeling the log of the expected mean. In particular, this post was very helpful in this regard: Linear model with log-transformed response vs. generalized linear model with log link.

What I am still unsure about is the proper interpretation of the coefficients. To confound things further, I also have a few predictors that are logged. My plan was to calculate average marginal effects to evaluate some interactions, and this necessitated the switch to the GLM framework.

With the logged outcome, I would have simply exponentiated the coefficients and reported them as the percent change in the outcome. With a logged predictor I would have interpreted it as an elasticity, where $1.10^\beta$ gives the change in the outcome for a 10% increase in the predictor. Can I still do this with the GLM coefficients?
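For concreteness, a small sketch of the two fits I am comparing, on simulated data (the variable names and the data-generating step are placeholders):

set.seed(42)
n <- 500
x <- rexp(n)
y <- exp(1 + 0.5 * log(x) + rnorm(n, sd = 0.4))   # skewed, positive outcome

# Linear model on the logged outcome: coefficients describe E[log(y)]
fit_lm  <- lm(log(y) ~ log(x))

# Gaussian GLM with log link: coefficients describe log(E[y])
fit_glm <- glm(y ~ log(x), family = gaussian(link = "log"))

# In both cases the log(x) coefficient can be read as an elasticity,
# e.g. 1.10^coef approximates the multiplicative change in the outcome for
# a 10% increase in x; the two models estimate different targets (geometric
# vs. arithmetic mean), so the numbers need not match exactly.
1.10^coef(fit_lm)["log(x)"]
1.10^coef(fit_glm)["log(x)"]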

I have searched Stack Exchange and elsewhere for an answer to this simple question, but I can’t find one. Any advice is greatly appreciated.
