The function simEvent
generates simulated recurrent events or
survival time (the first event time) from one stochastic process. The
function simEventData
provides a simple wrapper that calls
simEvent
internally and collects the generated survival data or
recurrent events into a data frame. More examples are available in one of
the package vignettes in addition to the function documentation.
Usage
simEvent(
z = 0,
zCoef = 1,
rho = 1,
rhoCoef = 1,
rhoMax = NULL,
origin = 0,
endTime = 3,
frailty = 1,
recurrent = TRUE,
interarrival = "rexp",
relativeRisk = c("exponential", "linear", "excess", "none"),
method = c("thinning", "inversion"),
arguments = list(),
...
)
simEventData(nProcess = 1, z = 0, origin = 0, endTime = 3, frailty = 1, ...)
Arguments
- z
Time-invariant or time-varying covariates. The default value is
0
for no covariate effect. This argument should be a numeric vector for time-invariant covariates or a function of times that returns a numeric matrix for time-varying covariates, where each row represents the covariate vector at one perticular time point.- zCoef
Time-invariant or time-varying coefficients of covariates. The default value is
1
. Similar to the argumentz
, this argument should be a numeric vector for time-invariant coefficients or a function of times that returns a numeric matrix for time-varying coefficients, where each row represents the coefficient vector at one perticular time point. The dimension of thez
andzCoef
(either specified or generated) has to match with each other.- rho
Baseline rate (or intensity) function for the Poisson process. The default is
1
for a homogenous process of unit intensity. This argument can be either a non-negative numeric value for a homogenous process or a function of times for a non-homogenous process. In the latter case, the function should be able to take a vector of time points and return a numerical matrix (or vector) with each row representing the baseline hazard rate vector (or scalar value) at each time point.- rhoCoef
Coefficients of baseline rate function. The default value is
1
. It can be useful whenrho
is a function generating spline bases.- rhoMax
A positive number representing an upper bound of the underlying rate function (excluding the frailty term but including the covariate effect) for the thinning method. If this argument is left unspecified, the function will try to determine an upper bound internally.
- origin
The time origin set to be
0
by default. It should be either a numeric value less thanendTime
or a function that returns such a numeric value.- endTime
The end of follow-up time set to be
3
by default. Similar toorigin
,endTime
should be either a numeric value greater thanorigin
or a function that returns such a numeric value.- frailty
A positive number or a function for frailty effect. The default value is
1
for no frailty effect. Other positive value can be specified directly for a shared frailty effect within a cluster. Similar toz
,zCoef
, andrho
, a function can be specified for other distribution of the frailty effect. The specified function should randomly return a positive numeric value. The functions that generate random numbers following a certain distribution fromstats
package can be directly used. The arguments of the function can be specified through a list namedfrailty
inarguments
. For example, if we consider Gamma distribution with mean one as the distribution of frailty effect, we may specifyfrailty = "rgamma"
. The shape and scale parameter needs to be specified through a list namedfrailty
inarguments
, such asarguments = list(frailty = list(shape = 2, scale = 0.5))
.- recurrent
A logical value with default value
TRUE
indicating whether to generate recurrent event data or survival data.- interarrival
A function object for randomly generating (positive) interarrival time between two successive arrivals/events. The default value is
"rexp"
(i.e., functionstats::rexp
) for generating interarrival times following exponential distribution, which leads to a Poisson process. If the assumption of exponential interarrival times cannot be justified, we may consider a renewal process, (a generalization of Poisson process), in which interarrival times between events independently follows an identical distribution. A customized function can be specified in this case. It must have at least one argument namedrate
for the expected number of arrivals/events in unit time and returns one positive numerical value. If the function contains an argument namedn
, it is assumed that the function returnsn
interarrival times in one function call to possibly speed up the random number generation procedure. Other arguments can be specified through a named list insidearguments
.- relativeRisk
Relateive risk function for incorporating the covariates and the covariate coefficients into the intensity function. The applicable choices include
exponential
(the default) for the regular Cox model or Andersen-Gill model,linear
for linear model (including an intercept term),excess
for excess model, andnone
for not incorporating the covariates through a relative risk function. A customized function can be specified. The specified function must have at least one argument namedz
for the covariate vector and another argument named zCoef for covariate coefficient vector. The function should return a numeric value for givenz
vector andzCoef
vector. Other arguments can be specified through a named list insidearguments
.- method
A character string specifying the method for generating simulated recurrent or survival data. The default method is thinning method (Lewis and Shedler 1979). Another available option is the inversion method (Cinlar 1975). When the rate function may go to infinite, the inversion method is used and a warning will be thrown out if the thinning method is initially specified.
- arguments
A list that consists of named lists for specifying other arguments in the corresponding functions. For example, if a function of time named
foo
with two arguments,x
(for time) andy
, is specified for the time-varying covariates, the value of its second argument,y
, can be specified byarguments = list(z = list(y = 1)
. A partial matching on names is not allowed to avoid possible misspecification. The input arguments will be evaluated within functionsimEvent
, which can be useful for randomly setting function parameters for each process in functionsimEventData
. See examples and vignettes for details.- ...
Additional arguements passed from function
simEventData
to fucntionsimEvent
. For functionsimEvent
,...
is not used.- nProcess
Number of stochastic processes. If missing, the value will be the number of row of the specified matrix
z
. Otherwise, a positive number should be speicified.
Value
The function simEvent
returns a simEvent
S4 class object and
the function simEventData
returns a data.frame
.
Details
For each process, a time-invariant or time-varying baseline hazard rate
(intensity) function of failure can be specified. Covariates and their
coefficients can be specified and incorporated by the specified relative
risk functions. The default is the exponential relative risk function, which
corresponds to the Cox proportional hazard model (Cox 1972) for survival
data or Andersen-Gill model (Andersen and Gill 1982) for recurrent
events. Other relative risk function can be specified through the argument
relativeRisk
. In addition, a frailty effect can be considered.
Conditional on predictors (or covariates) and the unobserved frailty effect,
the process is by default a Poisson process, where the interarrival times
between two successive arrivals/events follow exponential distribution. A
general renewal process can be specified through interarrival
for
other distributions of the interarrival times in addition to the exponential
distribution.
The thinning method (Lewis and Shedler 1979) is applied for bounded hazard rate function by default. The inversion method (Cinlar 1975) is also available for possibly unbounded but integrable rate function over the given time period. The inversion method will be used when the rate function may go to infinite and a warning will be thrown out if the thinning method is specified originally.
For the covariates z
, the covariate coefficients zCoef
, and
the baseline hazard rate function rho
, a function of time can be
specified for time-varying effect. The first argument of the input function
has to be the time variable (not need to be named as "time" though). Other
arguments of the function can be specified through a named list in
arguments
, while the first argument should not be specified.
For the frailty effect frailty
, the starting point origin
, and
the end point of the process endTime
, functions that generate random
numbers can be specified. An argument n = 1
will be implicitly
specified if the function has an argument named n
, which is designed
for those common functions generating random numbers from stats
package. Similar to z
, zCoef
, and rho
, other arguments
of the function can be specified through a named list in arguments
.
For time-varying covariates, the function simEventData
assumes
covariates can be observed only at event times and censoring times. Thus,
covariate values are returned only at these time points. If we want other
observed covariate values to be recorded, we may write a simple wrapper
function for simEvent
similar to simEventData
.
References
Andersen, P. K., & Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study. The annals of statistics, 10(4), 1100--1120.
Cinlar, Erhan (1975). Introduction to stochastic processes. Englewood Cliffs, NJ: Printice-Hall.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187--220.
Lewis, P. A., & G. S. Shedler. (1979). Simulation of Nonhomogeneous Poisson Processes by Thinning. Naval Research Logistics Quarterly, 26(3), Wiley Online Library: 403--13.
Examples
library(reda)
set.seed(123)
### time-invariant covariates and coefficients
## one process
simEvent(z = c(0.5, 1), zCoef = c(1, 0))
#> 'simEvent' S4 class object:
#> [1] 0.5115827 0.8613145 1.6674270 1.6865797 1.7206733 1.9126410 2.1032295
#> [8] 2.1913383
simEvent(z = 1, zCoef = 0.5, recurrent = FALSE)
#> 'simEvent' S4 class object:
#> [1] 0.2903829
## simulated data
simEventData(z = c(0.5, 1), zCoef = c(1, 0), endTime = 2)
#> ID time event origin X.1 X.2
#> 1 1 0.4591666 1 0 0.5 1
#> 2 1 0.4784347 1 0 0.5 1
#> 3 1 0.8410489 1 0 0.5 1
#> 4 1 2.0000000 0 0 0.5 1
simEventData(z = cbind(rnorm(3), 1), zCoef = c(1, 0))
#> ID time event origin X.1 X.2
#> 1 1 0.2079348 1 0 0.8951257 1
#> 2 1 0.3135230 1 0 0.8951257 1
#> 3 1 0.4852657 1 0 0.8951257 1
#> 4 1 3.0000000 0 0 0.8951257 1
#> 5 2 0.6963856 1 0 0.8781335 1
#> 6 2 1.8288953 1 0 0.8781335 1
#> 7 2 2.3741750 1 0 0.8781335 1
#> 8 2 2.4118209 1 0 0.8781335 1
#> 9 2 2.5390663 1 0 0.8781335 1
#> 10 2 2.9825551 1 0 0.8781335 1
#> 11 2 3.0000000 0 0 0.8781335 1
#> 12 3 0.1471895 1 0 0.8215811 1
#> 13 3 0.2723684 1 0 0.8215811 1
#> 14 3 0.9596983 1 0 0.8215811 1
#> 15 3 0.9782061 1 0 0.8215811 1
#> 16 3 1.0215776 1 0 0.8215811 1
#> 17 3 1.0649221 1 0 0.8215811 1
#> 18 3 1.1882175 1 0 0.8215811 1
#> 19 3 1.3182856 1 0 0.8215811 1
#> 20 3 1.7458978 1 0 0.8215811 1
#> 21 3 2.1522257 1 0 0.8215811 1
#> 22 3 2.8744498 1 0 0.8215811 1
#> 23 3 3.0000000 0 0 0.8215811 1
simEventData(z = matrix(rnorm(5)), zCoef = 0.5, recurrent = FALSE)
#> ID time event origin X
#> 1 1 0.8635377 1 0 -0.2288958
#> 2 2 2.9133870 1 0 -0.9007918
#> 3 3 1.0878381 1 0 -0.7350262
#> 4 4 1.7275950 1 0 -1.4276858
#> 5 5 0.6040343 1 0 0.6192835
### time-varying covariates and time-varying coefficients
zFun <- function(time, intercept) {
cbind(time / 10 + intercept, as.numeric(time > 1))
}
zCoefFun <- function(x, shift) {
cbind(sqrt(x + shift), 1)
}
simEvent(z = zFun, zCoef = zCoefFun,
arguments = list(z = list(intercept = 0.1),
zCoef = list(shift = 0.1)))
#> 'simEvent' S4 class object:
#> [1] 0.01703579 0.69742521 1.25021917 1.26152287 1.36531462 1.68015169
#> [7] 1.91858209 2.15412550 2.23913149 2.24511429 2.27295262 2.46185550
#> [13] 2.65875740
## same function of time for all processes
simEventData(3, z = zFun, zCoef = zCoefFun,
arguments = list(z = list(intercept = 0.1),
zCoef = list(shift = 0.1)))
#> ID time event origin X.1 X.2
#> 1 1 1.4109295 1 0 0.2410929 1
#> 2 1 1.5959107 1 0 0.2595911 1
#> 3 1 1.6432538 1 0 0.2643254 1
#> 4 1 1.7703764 1 0 0.2770376 1
#> 5 1 1.9455848 1 0 0.2945585 1
#> 6 1 2.0786444 1 0 0.3078644 1
#> 7 1 2.2809732 1 0 0.3280973 1
#> 8 1 2.5132731 1 0 0.3513273 1
#> 9 1 2.8100794 1 0 0.3810079 1
#> 10 1 2.8286202 1 0 0.3828620 1
#> 11 1 2.9028344 1 0 0.3902834 1
#> 12 1 3.0000000 0 0 0.4000000 1
#> 13 2 0.1832995 1 0 0.1183299 0
#> 14 2 0.4628561 1 0 0.1462856 0
#> 15 2 0.9558699 1 0 0.1955870 0
#> 16 2 1.9629071 1 0 0.2962907 1
#> 17 2 2.2029446 1 0 0.3202945 1
#> 18 2 2.3722549 1 0 0.3372255 1
#> 19 2 2.4296379 1 0 0.3429638 1
#> 20 2 2.4559745 1 0 0.3455974 1
#> 21 2 2.6248355 1 0 0.3624836 1
#> 22 2 2.6718292 1 0 0.3671829 1
#> 23 2 2.7697645 1 0 0.3769764 1
#> 24 2 2.7834718 1 0 0.3783472 1
#> 25 2 2.8068352 1 0 0.3806835 1
#> 26 2 3.0000000 0 0 0.4000000 1
#> 27 3 0.3323904 1 0 0.1332390 0
#> 28 3 1.1144789 1 0 0.2114479 1
#> 29 3 1.1855470 1 0 0.2185547 1
#> 30 3 1.3831537 1 0 0.2383154 1
#> 31 3 1.8689437 1 0 0.2868944 1
#> 32 3 2.1737468 1 0 0.3173747 1
#> 33 3 2.3238397 1 0 0.3323840 1
#> 34 3 2.3543533 1 0 0.3354353 1
#> 35 3 2.4386507 1 0 0.3438651 1
#> 36 3 2.7497916 1 0 0.3749792 1
#> 37 3 2.8391334 1 0 0.3839133 1
#> 38 3 3.0000000 0 0 0.4000000 1
## same function within one process but different between processes
## use quote function in the arguments
simDat <- simEventData(3, z = zFun, zCoef = zCoefFun,
arguments = list(
z = list(intercept = quote(rnorm(1) / 10)),
zCoef = list(shift = 0.1)
))
## check the intercept randomly generated,
## which should be the same within each ID but different between IDs.
unique(with(simDat, cbind(ID, intercept = round(X.1 - time / 10, 6))))
#> ID intercept
#> [1,] 1 0.087936
#> [2,] 2 0.046494
#> [3,] 3 0.131681
### non-negative time-varying baseline hazard rate function
simEvent(rho = function(timeVec) { sin(timeVec) + 1 })
#> 'simEvent' S4 class object:
#> [1] 1.336428 1.655383 2.337511 2.431350 2.887220
simEventData(3, origin = rnorm(3), endTime = rnorm(3, 5),
rho = function(timeVec) { sin(timeVec) + 1 })
#> ID time event origin X
#> 1 1 1.8825263 1 0.00729009 0
#> 2 1 1.9038402 1 0.00729009 0
#> 3 1 3.1337841 1 0.00729009 0
#> 4 1 4.2783956 0 0.00729009 0
#> 5 2 1.2323134 1 1.01755864 0
#> 6 2 1.5635354 1 1.01755864 0
#> 7 2 1.7543646 1 1.01755864 0
#> 8 2 1.7767004 1 1.01755864 0
#> 9 2 2.6526406 1 1.01755864 0
#> 10 2 3.0130011 1 1.01755864 0
#> 11 2 3.1478016 1 1.01755864 0
#> 12 2 3.1873625 1 1.01755864 0
#> 13 2 6.5192177 0 1.01755864 0
#> 14 3 -0.9590731 1 -1.18843404 0
#> 15 3 -0.3631609 1 -1.18843404 0
#> 16 3 -0.1507877 1 -1.18843404 0
#> 17 3 1.8134301 1 -1.18843404 0
#> 18 3 2.4938328 1 -1.18843404 0
#> 19 3 2.7181862 1 -1.18843404 0
#> 20 3 5.3773880 0 -1.18843404 0
## specify other arguments
simEvent(z = c(rnorm(1), rbinom(1, 1, 0.5)) / 10,
rho = function(a, b) { sin(a + b) + 1 },
arguments = list(rho = list(b = 0.5)))
#> 'simEvent' S4 class object:
#> [1] 0.3575603 1.0572763 1.6624542 1.8274776 2.1770491
simEventData(z = cbind(rnorm(3), rbinom(3, 1, 0.5)) / 10,
rho = function(a, b) { sin(a + b) + 1 },
arguments = list(rho = list(b = 0.5)))
#> ID time event origin X.1 X.2
#> 1 1 0.3168247 1 0 0.08842508 0.1
#> 2 1 0.6545106 1 0 0.08842508 0.1
#> 3 1 1.1262112 1 0 0.08842508 0.1
#> 4 1 1.4285794 1 0 0.08842508 0.1
#> 5 1 1.7504449 1 0 0.08842508 0.1
#> 6 1 3.0000000 0 0 0.08842508 0.1
#> 7 2 0.4292104 1 0 -0.08294776 0.1
#> 8 2 1.1148960 1 0 -0.08294776 0.1
#> 9 2 1.2032776 1 0 -0.08294776 0.1
#> 10 2 1.2639104 1 0 -0.08294776 0.1
#> 11 2 1.4494407 1 0 -0.08294776 0.1
#> 12 2 2.0144258 1 0 -0.08294776 0.1
#> 13 2 2.0402714 1 0 -0.08294776 0.1
#> 14 2 3.0000000 0 0 -0.08294776 0.1
#> 15 3 1.1975386 1 0 -0.05735603 0.0
#> 16 3 1.2009079 1 0 -0.05735603 0.0
#> 17 3 1.9563431 1 0 -0.05735603 0.0
#> 18 3 2.4848427 1 0 -0.05735603 0.0
#> 19 3 2.9122310 1 0 -0.05735603 0.0
#> 20 3 3.0000000 0 0 -0.05735603 0.0
## quadratic B-splines with one internal knot at "time = 1"
## (using function 'bSpline' from splines2 package)
simEvent(rho = splines2::bSpline, rhoCoef = c(0.8, 0.5, 1, 0.6),
arguments = list(rho = list(degree = 2, knots = 1,
intercept = TRUE,
Boundary.knots = c(0, 3))))
#> 'simEvent' S4 class object:
#> [1] 0.09089787 0.21678348 1.06873299 1.96199882
### frailty effect
## Gamma distribution with mean one
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = rgamma,
arguments = list(frailty = list(shape = 2, scale = 0.5)))
#> 'simEvent' S4 class object:
#> [1] 0.3961255 0.5721497 0.9267213 0.9505554 1.5040362 1.5561853
## lognormal with mean zero (on the log scale)
set.seed(123)
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = "rlnorm",
arguments = list(frailty = list(sdlog = 1)))
#> 'simEvent' S4 class object:
#> [1] 1.411910 1.445456 1.505172 1.841404 2.175221 2.329544
## or equivalently
set.seed(123)
logNorm <- function(a) exp(rnorm(n = 1, mean = 0, sd = a))
simEvent(z = c(0.5, 1), zCoef = c(1, 0), frailty = logNorm,
arguments = list(frailty = list(a = 1)))
#> 'simEvent' S4 class object:
#> [1] 1.411910 1.445456 1.505172 1.841404 2.175221 2.329544
### renewal process
## interarrival times following uniform distribution
rUnif <- function(n, rate, min) runif(n, min, max = 2 / rate)
simEvent(interarrival = rUnif,
arguments = list(interarrival = list(min = 0)))
#> 'simEvent' S4 class object:
#> [1] 1.311412 2.728473
## interarrival times following Gamma distribution with scale one
set.seed(123)
simEvent(interarrival = function(n, rate) rgamma(n, shape = 1 / rate))
#> 'simEvent' S4 class object:
#> [1] 0.1822171 1.8779682
## or equivalently
set.seed(123)
simEvent(interarrival = function(rate) rgamma(n = 1, shape = 1 / rate))
#> 'simEvent' S4 class object:
#> [1] 0.1822171 1.8779682
### relative risk functioin
set.seed(123)
simEvent(relativeRisk = "linear")
#> 'simEvent' S4 class object:
#> [1] 0.8434573 1.4200675 2.7491224 2.7806998 2.8369107
## or equivalently
rriskFun <- function(z, zCoef, intercept) {
as.numeric(z %*% zCoef) + intercept
}
set.seed(123)
simEvent(relativeRisk = rriskFun,
arguments = list(relativeRisk = list(intercept = 1)))
#> 'simEvent' S4 class object:
#> [1] 0.8434573 1.4200675 2.7491224 2.7806998 2.8369107