Skip to contents

The function simEvent generates simulated recurrent events or survival time (the first event time) from one stochastic process. The function simEventData provides a simple wrapper that calls simEvent internally and collects the generated survival data or recurrent events into a data frame. More examples are available in one of the package vignettes in addition to the function documentation.

Usage

simEvent(
  z = 0,
  zCoef = 1,
  rho = 1,
  rhoCoef = 1,
  rhoMax = NULL,
  origin = 0,
  endTime = 3,
  frailty = 1,
  recurrent = TRUE,
  interarrival = "rexp",
  relativeRisk = c("exponential", "linear", "excess", "none"),
  method = c("thinning", "inversion"),
  arguments = list(),
  ...
)

simEventData(nProcess = 1, z = 0, origin = 0, endTime = 3, frailty = 1, ...)

Arguments

z

Time-invariant or time-varying covariates. The default value is 0 for no covariate effect. This argument should be a numeric vector for time-invariant covariates or a function of times that returns a numeric matrix for time-varying covariates, where each row represents the covariate vector at one perticular time point.

zCoef

Time-invariant or time-varying coefficients of covariates. The default value is 1. Similar to the argument z, this argument should be a numeric vector for time-invariant coefficients or a function of times that returns a numeric matrix for time-varying coefficients, where each row represents the coefficient vector at one perticular time point. The dimension of the z and zCoef (either specified or generated) has to match with each other.

rho

Baseline rate (or intensity) function for the Poisson process. The default is 1 for a homogenous process of unit intensity. This argument can be either a non-negative numeric value for a homogenous process or a function of times for a non-homogenous process. In the latter case, the function should be able to take a vector of time points and return a numerical matrix (or vector) with each row representing the baseline hazard rate vector (or scalar value) at each time point.

rhoCoef

Coefficients of baseline rate function. The default value is 1. It can be useful when rho is a function generating spline bases.

rhoMax

A positive number representing an upper bound of the underlying rate function (excluding the frailty term but including the covariate effect) for the thinning method. If this argument is left unspecified, the function will try to determine an upper bound internally.

origin

The time origin set to be 0 by default. It should be either a numeric value less than endTime or a function that returns such a numeric value.

endTime

The end of follow-up time set to be 3 by default. Similar to origin, endTime should be either a numeric value greater than origin or a function that returns such a numeric value.

frailty

A positive number or a function for frailty effect. The default value is 1 for no frailty effect. Other positive value can be specified directly for a shared frailty effect within a cluster. Similar to z, zCoef, and rho, a function can be specified for other distribution of the frailty effect. The specified function should randomly return a positive numeric value. The functions that generate random numbers following a certain distribution from stats package can be directly used. The arguments of the function can be specified through a list named frailty in arguments. For example, if we consider Gamma distribution with mean one as the distribution of frailty effect, we may specify frailty = "rgamma". The shape and scale parameter needs to be specified through a list named frailty in arguments, such as arguments = list(frailty = list(shape = 2, scale = 0.5)).

recurrent

A logical value with default value TRUE indicating whether to generate recurrent event data or survival data.

interarrival

A function object for randomly generating (positive) interarrival time between two successive arrivals/events. The default value is "rexp" (i.e., function stats::rexp) for generating interarrival times following exponential distribution, which leads to a Poisson process. If the assumption of exponential interarrival times cannot be justified, we may consider a renewal process, (a generalization of Poisson process), in which interarrival times between events independently follows an identical distribution. A customized function can be specified in this case. It must have at least one argument named rate for the expected number of arrivals/events in unit time and returns one positive numerical value. If the function contains an argument named n, it is assumed that the function returns n interarrival times in one function call to possibly speed up the random number generation procedure. Other arguments can be specified through a named list inside arguments.

relativeRisk

Relateive risk function for incorporating the covariates and the covariate coefficients into the intensity function. The applicable choices include exponential (the default) for the regular Cox model or Andersen-Gill model, linear for linear model (including an intercept term), excess for excess model, and none for not incorporating the covariates through a relative risk function. A customized function can be specified. The specified function must have at least one argument named z for the covariate vector and another argument named zCoef for covariate coefficient vector. The function should return a numeric value for given z vector and zCoef vector. Other arguments can be specified through a named list inside arguments.

method

A character string specifying the method for generating simulated recurrent or survival data. The default method is thinning method (Lewis and Shedler 1979). Another available option is the inversion method (Cinlar 1975). When the rate function may go to infinite, the inversion method is used and a warning will be thrown out if the thinning method is initially specified.

arguments

A list that consists of named lists for specifying other arguments in the corresponding functions. For example, if a function of time named foo with two arguments, x (for time) and y, is specified for the time-varying covariates, the value of its second argument, y, can be specified by arguments = list(z = list(y = 1). A partial matching on names is not allowed to avoid possible misspecification. The input arguments will be evaluated within function simEvent, which can be useful for randomly setting function parameters for each process in function simEventData. See examples and vignettes for details.

...

Additional arguements passed from function simEventData to fucntion simEvent. For function simEvent, ... is not used.

nProcess

Number of stochastic processes. If missing, the value will be the number of row of the specified matrix z. Otherwise, a positive number should be speicified.

Value

The function simEvent returns a simEvent S4 class object and the function simEventData returns a data.frame.

Details

For each process, a time-invariant or time-varying baseline hazard rate (intensity) function of failure can be specified. Covariates and their coefficients can be specified and incorporated by the specified relative risk functions. The default is the exponential relative risk function, which corresponds to the Cox proportional hazard model (Cox 1972) for survival data or Andersen-Gill model (Andersen and Gill 1982) for recurrent events. Other relative risk function can be specified through the argument relativeRisk. In addition, a frailty effect can be considered. Conditional on predictors (or covariates) and the unobserved frailty effect, the process is by default a Poisson process, where the interarrival times between two successive arrivals/events follow exponential distribution. A general renewal process can be specified through interarrival for other distributions of the interarrival times in addition to the exponential distribution.

The thinning method (Lewis and Shedler 1979) is applied for bounded hazard rate function by default. The inversion method (Cinlar 1975) is also available for possibly unbounded but integrable rate function over the given time period. The inversion method will be used when the rate function may go to infinite and a warning will be thrown out if the thinning method is specified originally.

For the covariates z, the covariate coefficients zCoef, and the baseline hazard rate function rho, a function of time can be specified for time-varying effect. The first argument of the input function has to be the time variable (not need to be named as "time" though). Other arguments of the function can be specified through a named list in arguments, while the first argument should not be specified.

For the frailty effect frailty, the starting point origin, and the end point of the process endTime, functions that generate random numbers can be specified. An argument n = 1 will be implicitly specified if the function has an argument named n, which is designed for those common functions generating random numbers from stats package. Similar to z, zCoef, and rho, other arguments of the function can be specified through a named list in arguments.

For time-varying covariates, the function simEventData assumes covariates can be observed only at event times and censoring times. Thus, covariate values are returned only at these time points. If we want other observed covariate values to be recorded, we may write a simple wrapper function for simEvent similar to simEventData.

References

Andersen, P. K., & Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study. The annals of statistics, 10(4), 1100--1120.

Cinlar, Erhan (1975). Introduction to stochastic processes. Englewood Cliffs, NJ: Printice-Hall.

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187--220.

Lewis, P. A., & G. S. Shedler. (1979). Simulation of Nonhomogeneous Poisson Processes by Thinning. Naval Research Logistics Quarterly, 26(3), Wiley Online Library: 403--13.

Examples

library(reda)
set.seed(123)

### time-invariant covariates and coefficients
## one process
simEvent(z = c(0.5, 1), zCoef = c(1, 0))
#> 'simEvent' S4 class object:
#> [1] 0.5115827 0.8613145 1.6674270 1.6865797 1.7206733 1.9126410 2.1032295
#> [8] 2.1913383
simEvent(z = 1, zCoef = 0.5, recurrent = FALSE)
#> 'simEvent' S4 class object:
#> [1] 0.2903829

## simulated data
simEventData(z = c(0.5, 1), zCoef = c(1, 0), endTime = 2)
#>   ID      time event origin X.1 X.2
#> 1  1 0.4591666     1      0 0.5   1
#> 2  1 0.4784347     1      0 0.5   1
#> 3  1 0.8410489     1      0 0.5   1
#> 4  1 2.0000000     0      0 0.5   1
simEventData(z = cbind(rnorm(3), 1), zCoef = c(1, 0))
#>    ID      time event origin       X.1 X.2
#> 1   1 0.2079348     1      0 0.8951257   1
#> 2   1 0.3135230     1      0 0.8951257   1
#> 3   1 0.4852657     1      0 0.8951257   1
#> 4   1 3.0000000     0      0 0.8951257   1
#> 5   2 0.6963856     1      0 0.8781335   1
#> 6   2 1.8288953     1      0 0.8781335   1
#> 7   2 2.3741750     1      0 0.8781335   1
#> 8   2 2.4118209     1      0 0.8781335   1
#> 9   2 2.5390663     1      0 0.8781335   1
#> 10  2 2.9825551     1      0 0.8781335   1
#> 11  2 3.0000000     0      0 0.8781335   1
#> 12  3 0.1471895     1      0 0.8215811   1
#> 13  3 0.2723684     1      0 0.8215811   1
#> 14  3 0.9596983     1      0 0.8215811   1
#> 15  3 0.9782061     1      0 0.8215811   1
#> 16  3 1.0215776     1      0 0.8215811   1
#> 17  3 1.0649221     1      0 0.8215811   1
#> 18  3 1.1882175     1      0 0.8215811   1
#> 19  3 1.3182856     1      0 0.8215811   1
#> 20  3 1.7458978     1      0 0.8215811   1
#> 21  3 2.1522257     1      0 0.8215811   1
#> 22  3 2.8744498     1      0 0.8215811   1
#> 23  3 3.0000000     0      0 0.8215811   1
simEventData(z = matrix(rnorm(5)), zCoef = 0.5, recurrent = FALSE)
#>   ID      time event origin          X
#> 1  1 0.8635377     1      0 -0.2288958
#> 2  2 2.9133870     1      0 -0.9007918
#> 3  3 1.0878381     1      0 -0.7350262
#> 4  4 1.7275950     1      0 -1.4276858
#> 5  5 0.6040343     1      0  0.6192835


### time-varying covariates and time-varying coefficients
zFun <- function(time, intercept) {
    cbind(time / 10 + intercept, as.numeric(time > 1))
}
zCoefFun <- function(x, shift) {
    cbind(sqrt(x + shift), 1)
}
simEvent(z = zFun, zCoef = zCoefFun,
         arguments = list(z = list(intercept = 0.1),
                          zCoef = list(shift = 0.1)))
#> 'simEvent' S4 class object:
#>  [1] 0.01703579 0.69742521 1.25021917 1.26152287 1.36531462 1.68015169
#>  [7] 1.91858209 2.15412550 2.23913149 2.24511429 2.27295262 2.46185550
#> [13] 2.65875740

## same function of time for all processes
simEventData(3, z = zFun, zCoef = zCoefFun,
             arguments = list(z = list(intercept = 0.1),
                              zCoef = list(shift = 0.1)))
#>    ID      time event origin       X.1 X.2
#> 1   1 1.4109295     1      0 0.2410929   1
#> 2   1 1.5959107     1      0 0.2595911   1
#> 3   1 1.6432538     1      0 0.2643254   1
#> 4   1 1.7703764     1      0 0.2770376   1
#> 5   1 1.9455848     1      0 0.2945585   1
#> 6   1 2.0786444     1      0 0.3078644   1
#> 7   1 2.2809732     1      0 0.3280973   1
#> 8   1 2.5132731     1      0 0.3513273   1
#> 9   1 2.8100794     1      0 0.3810079   1
#> 10  1 2.8286202     1      0 0.3828620   1
#> 11  1 2.9028344     1      0 0.3902834   1
#> 12  1 3.0000000     0      0 0.4000000   1
#> 13  2 0.1832995     1      0 0.1183299   0
#> 14  2 0.4628561     1      0 0.1462856   0
#> 15  2 0.9558699     1      0 0.1955870   0
#> 16  2 1.9629071     1      0 0.2962907   1
#> 17  2 2.2029446     1      0 0.3202945   1
#> 18  2 2.3722549     1      0 0.3372255   1
#> 19  2 2.4296379     1      0 0.3429638   1
#> 20  2 2.4559745     1      0 0.3455974   1
#> 21  2 2.6248355     1      0 0.3624836   1
#> 22  2 2.6718292     1      0 0.3671829   1
#> 23  2 2.7697645     1      0 0.3769764   1
#> 24  2 2.7834718     1      0 0.3783472   1
#> 25  2 2.8068352     1      0 0.3806835   1
#> 26  2 3.0000000     0      0 0.4000000   1
#> 27  3 0.3323904     1      0 0.1332390   0
#> 28  3 1.1144789     1      0 0.2114479   1
#> 29  3 1.1855470     1      0 0.2185547   1
#> 30  3 1.3831537     1      0 0.2383154   1
#> 31  3 1.8689437     1      0 0.2868944   1
#> 32  3 2.1737468     1      0 0.3173747   1
#> 33  3 2.3238397     1      0 0.3323840   1
#> 34  3 2.3543533     1      0 0.3354353   1
#> 35  3 2.4386507     1      0 0.3438651   1
#> 36  3 2.7497916     1      0 0.3749792   1
#> 37  3 2.8391334     1      0 0.3839133   1
#> 38  3 3.0000000     0      0 0.4000000   1

## same function within one process but different between processes
## use quote function in the arguments
simDat <- simEventData(3, z = zFun, zCoef = zCoefFun,
                       arguments = list(
                           z = list(intercept = quote(rnorm(1) / 10)),
                           zCoef = list(shift = 0.1)
                       ))
## check the intercept randomly generated,
## which should be the same within each ID but different between IDs.
unique(with(simDat, cbind(ID, intercept = round(X.1 - time / 10, 6))))
#>      ID intercept
#> [1,]  1  0.087936
#> [2,]  2  0.046494
#> [3,]  3  0.131681


### non-negative time-varying baseline hazard rate function
simEvent(rho = function(timeVec) { sin(timeVec) + 1 })
#> 'simEvent' S4 class object:
#> [1] 1.336428 1.655383 2.337511 2.431350 2.887220
simEventData(3, origin = rnorm(3), endTime = rnorm(3, 5),
             rho = function(timeVec) { sin(timeVec) + 1 })
#>    ID       time event      origin X
#> 1   1  1.8825263     1  0.00729009 0
#> 2   1  1.9038402     1  0.00729009 0
#> 3   1  3.1337841     1  0.00729009 0
#> 4   1  4.2783956     0  0.00729009 0
#> 5   2  1.2323134     1  1.01755864 0
#> 6   2  1.5635354     1  1.01755864 0
#> 7   2  1.7543646     1  1.01755864 0
#> 8   2  1.7767004     1  1.01755864 0
#> 9   2  2.6526406     1  1.01755864 0
#> 10  2  3.0130011     1  1.01755864 0
#> 11  2  3.1478016     1  1.01755864 0
#> 12  2  3.1873625     1  1.01755864 0
#> 13  2  6.5192177     0  1.01755864 0
#> 14  3 -0.9590731     1 -1.18843404 0
#> 15  3 -0.3631609     1 -1.18843404 0
#> 16  3 -0.1507877     1 -1.18843404 0
#> 17  3  1.8134301     1 -1.18843404 0
#> 18  3  2.4938328     1 -1.18843404 0
#> 19  3  2.7181862     1 -1.18843404 0
#> 20  3  5.3773880     0 -1.18843404 0
## specify other arguments
simEvent(z = c(rnorm(1), rbinom(1, 1, 0.5)) / 10,
         rho = function(a, b) { sin(a + b) + 1 },
         arguments = list(rho = list(b = 0.5)))
#> 'simEvent' S4 class object:
#> [1] 0.3575603 1.0572763 1.6624542 1.8274776 2.1770491
simEventData(z = cbind(rnorm(3), rbinom(3, 1, 0.5)) / 10,
             rho = function(a, b) { sin(a + b) + 1 },
             arguments = list(rho = list(b = 0.5)))
#>    ID      time event origin         X.1 X.2
#> 1   1 0.3168247     1      0  0.08842508 0.1
#> 2   1 0.6545106     1      0  0.08842508 0.1
#> 3   1 1.1262112     1      0  0.08842508 0.1
#> 4   1 1.4285794     1      0  0.08842508 0.1
#> 5   1 1.7504449     1      0  0.08842508 0.1
#> 6   1 3.0000000     0      0  0.08842508 0.1
#> 7   2 0.4292104     1      0 -0.08294776 0.1
#> 8   2 1.1148960     1      0 -0.08294776 0.1
#> 9   2 1.2032776     1      0 -0.08294776 0.1
#> 10  2 1.2639104     1      0 -0.08294776 0.1
#> 11  2 1.4494407     1      0 -0.08294776 0.1
#> 12  2 2.0144258     1      0 -0.08294776 0.1
#> 13  2 2.0402714     1      0 -0.08294776 0.1
#> 14  2 3.0000000     0      0 -0.08294776 0.1
#> 15  3 1.1975386     1      0 -0.05735603 0.0
#> 16  3 1.2009079     1      0 -0.05735603 0.0
#> 17  3 1.9563431     1      0 -0.05735603 0.0
#> 18  3 2.4848427     1      0 -0.05735603 0.0
#> 19  3 2.9122310     1      0 -0.05735603 0.0
#> 20  3 3.0000000     0      0 -0.05735603 0.0

## quadratic B-splines with one internal knot at "time = 1"
## (using function 'bSpline' from splines2 package)
simEvent(rho = splines2::bSpline, rhoCoef = c(0.8, 0.5, 1, 0.6),
         arguments = list(rho = list(degree = 2, knots = 1,
                                     intercept = TRUE,
                                     Boundary.knots = c(0, 3))))
#> 'simEvent' S4 class object:
#> [1] 0.09089787 0.21678348 1.06873299 1.96199882


### frailty effect
## Gamma distribution with mean one
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = rgamma,
         arguments = list(frailty = list(shape = 2, scale = 0.5)))
#> 'simEvent' S4 class object:
#> [1] 0.3961255 0.5721497 0.9267213 0.9505554 1.5040362 1.5561853

## lognormal with mean zero (on the log scale)
set.seed(123)
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = "rlnorm",
         arguments = list(frailty = list(sdlog = 1)))
#> 'simEvent' S4 class object:
#> [1] 1.411910 1.445456 1.505172 1.841404 2.175221 2.329544
## or equivalently
set.seed(123)
logNorm <- function(a) exp(rnorm(n = 1, mean = 0, sd = a))
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = logNorm,
         arguments = list(frailty = list(a = 1)))
#> 'simEvent' S4 class object:
#> [1] 1.411910 1.445456 1.505172 1.841404 2.175221 2.329544


### renewal process
## interarrival times following uniform distribution
rUnif <- function(n, rate, min) runif(n, min, max = 2 / rate)
simEvent(interarrival = rUnif,
         arguments = list(interarrival = list(min = 0)))
#> 'simEvent' S4 class object:
#> [1] 1.311412 2.728473

## interarrival times following Gamma distribution with scale one
set.seed(123)
simEvent(interarrival = function(n, rate) rgamma(n, shape = 1 / rate))
#> 'simEvent' S4 class object:
#> [1] 0.1822171 1.8779682
## or equivalently
set.seed(123)
simEvent(interarrival = function(rate) rgamma(n = 1, shape = 1 / rate))
#> 'simEvent' S4 class object:
#> [1] 0.1822171 1.8779682


### relative risk functioin
set.seed(123)
simEvent(relativeRisk = "linear")
#> 'simEvent' S4 class object:
#> [1] 0.8434573 1.4200675 2.7491224 2.7806998 2.8369107
## or equivalently
rriskFun <- function(z, zCoef, intercept) {
    as.numeric(z %*% zCoef) + intercept
}
set.seed(123)
simEvent(relativeRisk = rriskFun,
         arguments = list(relativeRisk = list(intercept = 1)))
#> 'simEvent' S4 class object:
#> [1] 0.8434573 1.4200675 2.7491224 2.7806998 2.8369107