I have a problem that has puzzled me for a long time. It involves linear models and spline functions. I need to model “time since diagnosis” when some individuals never had a diagnosis. I use a Poisson model in which I have split time on age and on time since diagnosis. The data set may look like this (with millions upon millions of rows, 95% of them having no diagnosis):
SUBJECT  AGE  DIAGNOSIS  TIME_SINCE_DIAGNOSIS  EVENT/OUTCOME
1        30   A          0                     0
1        31   A          1                     0
1        32   A          2                     1
2        45   B          0                     0
2        46   B          1                     0
103      22   NON        “0 or -”              0
103      23   NON        “0 or -”              0
You get the idea. Now I want a model that looks something like this:
log(OUTCOME) = Spline(AGE) + DIAGNOSIS * Spline(TIME_SINCE_DIAGNOSIS) + offset(logrisktime)
I then want to plot the predicted risk for diagnoses A and B as a function of “time since diagnosis”, given that the diagnosis was made at age 30, and in the same plot the predicted risk for diagnosis = NON as a function of age (with years since age 30 on the x-axis). I would like to have those with a diagnosis and those with diagnosis = NON in the same model so that I can compute contrasts/linear combinations of effects.
The model and the plot are done, but my problem is that the model above also fits a spline over “time since diagnosis” for those who have no diagnosis (i.e. DIAGNOSIS = NON). So I would like a model of the following type, for which I created indicator variables for the different diagnoses:
R: factor(A) factor(B) factor(NON)
SAS: class A B NON;
log(OUTCOME) = Spline(AGE) + NON + A*Spline(TIME_SINCE_DIAGNOSIS) + B*Spline(TIME_SINCE_DIAGNOSIS) + offset(logrisktime)
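One way to write this intended model with explicit 0/1 indicators is sketched below. The notation is assumed here for illustration only: \mu is the expected number of events, T the risk time, t the time since diagnosis, f, g_A and g_B the spline functions (with g_A and g_B assumed to carry their own intercepts), and I_A, I_B, I_NON the diagnosis indicators.

```latex
\log \mu = f(\mathrm{AGE})
         + \beta_{\mathrm{NON}}\, I_{\mathrm{NON}}
         + I_A\, g_A(t)
         + I_B\, g_B(t)
         + \log T
```

Because I_A = I_B = 0 for rows with DIAGNOSIS = NON, both time-since-diagnosis terms drop out for those rows, whatever placeholder value t holds.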
Are you with me?
So I want to create a unique spline of the variable “time since diagnosis” for each level of the variable “diagnosis” (which is what diagnosis*spline(time_since_diagnosis) does), but with the exception that one level (those who had no diagnosis) does not have any spline assigned.
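To make the requested structure concrete, here is a minimal sketch of the design matrix row it implies, written in plain Python rather than R or SAS. It is an illustration under assumptions, not a tested solution for the real data: the truncated linear basis is a simple stand-in for whatever spline is actually used, the knot positions are made up, and the names truncated_linear_basis and design_row are hypothetical. It also assumes an intercept with NON as the reference level instead of an explicit NON indicator, which is an equivalent parametrization.

```python
# Hypothetical knot positions, chosen only for illustration.
AGE_KNOTS = (30.0, 50.0, 70.0)
TSD_KNOTS = (1.0, 5.0)


def truncated_linear_basis(x, knots):
    """Linear spline basis: x itself plus one hinge term max(x - k, 0) per knot."""
    x = float(x)
    return [x] + [max(x - k, 0.0) for k in knots]


def design_row(age, diagnosis, time_since_diagnosis):
    """Build one design-matrix row for the model sketched in the question.

    Columns: intercept, age spline, indicators for A and B (NON is the
    reference level), then the time-since-diagnosis basis multiplied by
    the A indicator and again by the B indicator.
    """
    is_a = 1.0 if diagnosis == "A" else 0.0
    is_b = 1.0 if diagnosis == "B" else 0.0

    # Rows without a diagnosis carry a placeholder; any finite value works
    # because the basis is multiplied by zero indicators below.
    tsd = 0.0 if time_since_diagnosis is None else time_since_diagnosis
    tsd_basis = truncated_linear_basis(tsd, TSD_KNOTS)

    row = [1.0]
    row += truncated_linear_basis(age, AGE_KNOTS)
    row += [is_a, is_b]
    row += [is_a * b for b in tsd_basis]  # spline over time, diagnosis A only
    row += [is_b * b for b in tsd_basis]  # spline over time, diagnosis B only
    return row
```

Calling design_row(22, "NON", None) and design_row(22, "NON", 7) gives identical rows, because every time-since-diagnosis column is zero when both indicators are zero. In other words, the NON level gets no spline over time since diagnosis, while A and B each get their own set of coefficients.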