Skip to contents

Multiple outputation is a procedure whereby excess observations are repeatedly randomly sampled and discarded. The method was originally developed to handle clustered data where cluster size is informative, for example when studying pups in a litter. In this case, analysis that ignores cluster size results in larger litters being over-represented in a marginal analysis. Multiple outputation circumvents this problem by randomly selecting one observation per cluster. Multiple outputation has been further adapted to handle longitudinal data subject to irregular observation; here the probability of being retained on any given outputation is inversely proportional to the visit intensity. This function creates a single outputted dataset.

Usage

outputation(data, weights, singleobs, id, time, keep.first)

Arguments

data

the original dataset on which multiple outputation is to be performed

weights

the weights to be used in the outputation, i.e. the inverse of the probability that a given observation will be selected in creating an outputted dataset. Ignored if singleobs=TRUE

singleobs

logical variable indicating whether a single observation should be retained for each subject

id

character string indicating which column of the data identifies subjects

time

character string indicating which column of the data contains the time at which the visit occurred

keep.first

logical variable indicating whether the first observation should be retained with probability 1. This is useful if the data consists of an observation at baseline followed by follow-up at stochastic time points.

Value

the outputted dataset.

References

  • Hoffman E, Sen P, Weinberg C. Within-cluster resampling. Biometrika 2001; 88:1121-1134

  • Follmann D, Proschan M, Leifer E. Multiple outputation: inference for complex clustered data by averaging analyses from independent data. Biometrics 2003; 59:420-429

  • Pullenayegum EM. Multiple outputation for the analysis of longitudinal data subject to irregular observation. Statistics in Medicine (in press).

See also

Other mo: mo()

Examples

library(nlme)
data(Phenobarb)
library(survival)
library(geepack)
library(data.table)
Phenobarb$event <- 1-as.numeric(is.na(Phenobarb$conc))
data <- Phenobarb
data <- data[data$event==1,]
data$id <- as.numeric(data$Subject)
data <- data[data$time<16*24,]
i <- iiw.weights(Surv(time.lag,time,event)~I(conc.lag>0 & conc.lag<=20) +
I(conc.lag>20 & conc.lag<=30) + I(conc.lag>30)+ cluster(id),
id="id",time="time",event="event",data=data,
invariant="id",lagvars=c("time","conc"),maxfu=16*24,lagfirst=0,first=TRUE)
data$weight <- i$iiw.weight
head(data)
#> Grouped Data: conc ~ time | Subject
#>    Subject  Wt Apgar ApgarInd  time dose conc event id weight
#> 2        1 1.4     7     >= 5   2.0   NA 17.3     1 32  1.000
#> 12       1 1.4     7     >= 5 112.5   NA 31.0     1 32  0.478
#> 14       2 1.5     9     >= 5   2.0   NA  9.7     1 38  1.000
#> 20       2 1.5     9     >= 5  63.5   NA 24.6     1 38  1.000
#> 27       2 1.5     9     >= 5 135.5   NA 33.0     1 38  0.478
#> 29       3 1.5     6     >= 5   1.5   NA 18.0     1 11  1.000
data.output1 <-   outputation(data,data$weight,singleobs=FALSE,
id="id",time="time",keep.first=FALSE)
head(data.output1)
#> Grouped Data: conc ~ time | Subject
#>    Subject  Wt Apgar ApgarInd  time dose conc event id weight
#> 2        1 1.4     7     >= 5   2.0   NA 17.3     1 32  1.000
#> 12       1 1.4     7     >= 5 112.5   NA 31.0     1 32  0.478
#> 14       2 1.5     9     >= 5   2.0   NA  9.7     1 38  1.000
#> 20       2 1.5     9     >= 5  63.5   NA 24.6     1 38  1.000
#> 27       2 1.5     9     >= 5 135.5   NA 33.0     1 38  0.478
#> 29       3 1.5     6     >= 5   1.5   NA 18.0     1 11  1.000
data.output2 <-   outputation(data,data$weight,singleobs=FALSE,
id="id",time="time",keep.first=FALSE)
head(data.output2)
#> Grouped Data: conc ~ time | Subject
#>    Subject  Wt Apgar ApgarInd  time dose conc event id weight
#> 2        1 1.4     7     >= 5   2.0   NA 17.3     1 32      1
#> 14       2 1.5     9     >= 5   2.0   NA  9.7     1 38      1
#> 20       2 1.5     9     >= 5  63.5   NA 24.6     1 38      1
#> 29       3 1.5     6     >= 5   1.5   NA 18.0     1 11      1
#> 36       3 1.5     6     >= 5  83.5   NA 23.8     1 11      1
#> 42       3 1.5     6     >= 5 134.3   NA 24.3     1 11      1
data.output3 <-   outputation(data,data$weight,singleobs=FALSE,
id="id",time="time",keep.first=FALSE)
head(data.output3)
#> Grouped Data: conc ~ time | Subject
#>    Subject  Wt Apgar ApgarInd  time dose conc event id weight
#> 2        1 1.4     7     >= 5   2.0   NA 17.3     1 32  1.000
#> 14       2 1.5     9     >= 5   2.0   NA  9.7     1 38  1.000
#> 20       2 1.5     9     >= 5  63.5   NA 24.6     1 38  1.000
#> 27       2 1.5     9     >= 5 135.5   NA 33.0     1 38  0.478
#> 29       3 1.5     6     >= 5   1.5   NA 18.0     1 11  1.000
#> 36       3 1.5     6     >= 5  83.5   NA 23.8     1 11  1.000
# Note that the outputted dataset varies with each command run; outputation is done at random