Introduction to Formula Response Function Recur()
Wenjie Wang
2022-07-09
Source: vignettes/reda-Recur.Rmd
library(reda)
packageVersion("reda")
## [1] '0.5.4'
Overview
The `Recur()` function provides a flexible and widely applicable formula response interface for modeling recurrent event data, with careful data checking procedures. It combines the flexible interface of `reSurv()` (deprecated in reReg version 1.1.7) with the effective checking procedures embedded in `Survr()` (deprecated in reda version 0.5.0).
Function Interface
The function interface of `Recur()` is given below.
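One way to inspect the interface from an R session is to print the formal arguments of the function. The following is a minimal sketch, assuming that reda is installed; the name `recur_args` is introduced here only for illustration.

```r
library(reda)

## print the argument list of Recur() without calling it
args(Recur)

## collect the argument names as a character vector
recur_args <- names(formals(Recur))
recur_args
```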
A high-level introduction to each argument is as follows:
- `time`: event and censoring times
- `id`: subject’s id
- `event`: recurrent event indicator, cost, or type
- `terminal`: event indicator of terminal events
- `origin`: time origin of subjects
- `check`: how to run the data checking procedure
  - `"hard"`: throw errors if `check_Recur()` finds any issue with the data structure
  - `"soft"`: throw warnings instead
  - `"none"`: do not run the checking procedure
More details on the arguments are provided in the function documentation, available via `?Recur`.
The `Recur` Object
The function `Recur()` returns an S4-class `Recur` object representing the model response for recurrent event data. The `Recur` object mainly contains a numeric matrix (in the `.Data` slot) that serves as the model response matrix. The other slots are
- `call`: the function call that produced the object.
- `ID`: a factor storing the original subject IDs, which may originally be a character vector, a numeric vector, or a factor. It is needed to pinpoint data issues for particular subjects by their original IDs.
- `ord`: indices that sort the response matrix (by rows) increasingly by `id`, `time2`, and `- event`. Sorting is often done in the model-fitting steps, where the indices stored in this slot can be used directly.
- `rev_ord`: indices that revert the response matrix sorted by `ord` to its original ordering. This slot is provided to easily revert the sorting.
- `first_idx`: indices that indicate the first record of each subject in the sorted matrix. It helps in the data checking procedure and may be helpful in the model-fitting step, such as getting the origin time.
- `last_idx`: indices that indicate the last record of each subject in the sorted matrix. Similar to `first_idx`, it helps in the data checking procedure and may be helpful in the model-fitting step, such as locating the terminal events.
- `check`: a character string that records the `check` argument specified by the user.
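The slots described above can be read with the `@` operator. The following is a minimal sketch on toy data (two subjects with three records each); the object names `x`, `sorted`, and `restored` are illustrative and not part of the package.

```r
library(reda)

## toy data: two subjects, records given in decreasing time order
x <- Recur(6:1, id = rep(1:2, 3))

## sort the response matrix with the stored indices
sorted <- x@.Data[x@ord, ]

## `rev_ord` undoes the sorting and recovers the original row order
restored <- sorted[x@rev_ord, ]
stopifnot(all.equal(restored, x@.Data))

## first and last records of each subject in the sorted matrix
sorted[x@first_idx, ]
sorted[x@last_idx, ]

## original subject IDs and the recorded `check` option
x@ID
x@check
```

Storing both `ord` and `rev_ord` lets model-fitting code work on the sorted matrix and map results back to the input order without calling `order()` again.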
Usage
Among all the arguments, only `time` has no default value and thus must be specified by users.
When only `time` is given
- The function assumes that each time point belongs to a distinct subject.
- `id` takes its default value: `seq_along(time)`.
- `event` takes its default values: `0` (censoring) at the last record of each subject, and `1` (event) before censoring.
- Both `terminal` and `origin` take zero for all subjects by default.
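A call of the following form illustrates these defaults. This is a sketch: the object name `ex1` is chosen here for illustration, and the input `3:5` is inferred from the printed matrix that follows.

```r
library(reda)

## only `time` is given, so each time point is treated as its own subject
ex1 <- Recur(3:5)
head(ex1)
```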
time1 time2 id event terminal origin
[1,] 0 3 1 0 0 0
[2,] 0 4 2 0 0 0
[3,] 0 5 3 0 0 0
When `time` and `id` are given
- `event` takes its default values: `0` (censoring) at the last record of each subject, and `1` (event) before censoring.
- Both `terminal` and `origin` take zero for all subjects by default.
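For instance, a call along the following lines is consistent with the unsorted matrix printed next. It is a sketch with inputs inferred from that output (times `6:1` alternating between two subjects); the name `ex2` matches the object used in the sorting step further down.

```r
library(reda)

## two subjects with three records each, supplied in decreasing time order
ex2 <- Recur(6:1, id = rep(1:2, 3))
head(ex2)
```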
time1 time2 id event terminal origin
[1,] 4 6 1 0 0 0
[2,] 3 5 2 0 0 0
[3,] 2 4 1 1 0 0
[4,] 1 3 2 1 0 0
[5,] 0 2 1 1 0 0
[6,] 0 1 2 1 0 0
## sort by id, time2, and - event
head(ex2[ex2@ord, ])
time1 time2 id event terminal origin
[1,] 0 2 1 1 0 0
[2,] 2 4 1 1 0 0
[3,] 4 6 1 0 0 0
[4,] 0 1 2 1 0 0
[5,] 1 3 2 1 0 0
[6,] 3 5 2 0 0 0
- The slot `ord` stores the indices that sort the response matrix by `id`, `time2`, and `- event`.
Helper `%to%` for recurrent episodes
The function `Recur()` allows users to input recurrent episodes by `time1` and `time2`, which can be specified with the help of `%to%` (or its alias `%2%`) in `Recur()`. For example,
left <- c(1, 5, 7)
right <- c(3, 7, 9)
ex3 <- Recur(left %to% right, id = c("A1", "A1", "A2"))
head(ex3)
time1 time2 id event terminal origin
[1,] 1 3 1 1 0 1
[2,] 5 7 1 0 0 1
[3,] 7 9 2 0 0 7
Internally, the function `%to%` returns a list with elements named `"time1"` and `"time2"`. Therefore, it is equivalent to specify such a list directly.
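The equivalence can be checked with a short sketch like the one below, which compares the response matrices of the two specifications. The names `ids` and `ex4` are introduced here for illustration only.

```r
library(reda)

left <- c(1, 5, 7)
right <- c(3, 7, 9)
ids <- c("A1", "A1", "A2")

## episodes specified with the %to% helper
ex3 <- Recur(left %to% right, id = ids)

## the same episodes specified as a named list
ex4 <- Recur(list(time1 = left, time2 = right), id = ids)

## the two response matrices should agree
stopifnot(all.equal(ex3@.Data, ex4@.Data))
```

Only the `.Data` slots are compared because the `call` slot records the two different calls.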
About `origin` and `terminal`
- Both `origin` and `terminal` take a numeric vector.
- The length of the specified vector can be one, the number of subjects, or the length of `time`.
Some simple examples are given below.
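As a first sketch, a single value for `origin` and `terminal` is recycled for every subject. The object name `ex5` and the inputs are chosen to be consistent with the first matrix printed below.

```r
library(reda)

## length-one `origin` and `terminal` apply to all subjects
ex5 <- Recur(3:5, origin = 1, terminal = 1)
head(ex5)
```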
time1 time2 id event terminal origin
[1,] 1 3 1 0 1 1
[2,] 1 4 2 0 1 1
[3,] 1 5 3 0 1 1
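A second sketch supplies one value per subject instead. The inputs are inferred from the matrix printed next, and the name `ex6` matches the object compared against `ex7` further down.

```r
library(reda)

## one `origin` and one `terminal` value for each of the two subjects
ex6 <- Recur(3:5, id = c("A1", "A1", "A2"),
             origin = 1:2, terminal = c(0, 1))
head(ex6)
```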
time1 time2 id event terminal origin
[1,] 1 3 1 1 0 1
[2,] 3 4 1 0 0 1
[3,] 2 5 2 0 1 2
ex7 <- Recur(3:5, id = c("A1", "A1", "A2"),
origin = c(1, 1, 2), terminal = c(0, 0, 1))
stopifnot(all.equal(ex6, ex7, check.attributes = FALSE))
- An error is thrown if the length is inappropriate.
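As a minimal sketch, the calls below pass a length-two vector when there are ten subjects and ten time points, so none of the valid lengths match and messages of the kind shown next should result. The names `res_origin` and `res_terminal` are illustrative.

```r
library(reda)

## ten subjects by default, so a vector of length two is invalid
res_origin <- try(Recur(1:10, origin = c(1, 2)))
res_terminal <- try(Recur(1:10, terminal = c(1, 2)))
```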
Error : Invalid length for 'origin'. See '?Recur' for details.
Error : Invalid length for 'terminal'. See '?Recur' for details.
Data Checking Rules
`Recur()` internally calls `check_Recur()` to check whether the specified data fits into the recurrent event data framework under several rules if `check = "hard"` or `check = "soft"`. The existing rules and the corresponding examples are given below.
- Every subject must have one censoring not before any event time.
Error : Subjects having events at or after censoring: A1.
- Every subject must have at most one terminal event time.
Error : Subjects having multiple terminal events: A1.
- Event or censoring times cannot be missing.
Error : Missing times! Please check subject: A1.
- Event times cannot be earlier than the origin time.
Error : Event times must be >= origin. Please check subject: A1.
Error : Event times must be >= origin. Please check subject: A1.
- Recurrent episodes cannot overlap.
Error : Recurrent episodes cannot be overlapped. Please check subject: A1.
- However, recurrent episodes without events are allowed, to accommodate possible time-varying covariates and risk-free gaps.
[1] A1: (0, 1+], (2, 3], (6, 8+]
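Some of the rules above can be exercised with a sketch like the following, which captures the error message of each failing call under the default `check = "hard"`. The helper `error_message()` and the toy inputs are assumptions made for illustration; they are not part of reda.

```r
library(reda)

## hypothetical helper: return the error message of a failing call,
## or NA if the call succeeds
error_message <- function(expr) {
    tryCatch({
        expr
        NA_character_
    }, error = function(e) conditionMessage(e))
}

id <- rep("A1", 3)

## a missing time
msg_missing <- error_message(Recur(c(1, 2, NA), id = id))

## event times earlier than the origin
msg_origin <- error_message(Recur(3:5, id = id, origin = 10))

## overlapping episodes: (3, 6] and (5, 10]
msg_overlap <- error_message(Recur(c(0, 3, 5) %to% c(1, 6, 10), id = id))

c(msg_missing, msg_origin, msg_overlap)
```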
The `show()` Method
A `show()` method is added for the `Recur` object in a similar fashion to the output of the function `survival:::print.Surv()`. It internally converts the input `Recur` object to character strings representing the recurrent episodes by a dedicated `as.character()` method. For each recurrent episode,
- Censoring not due to a terminal event is indicated by a trailing `+` sign;
- Censoring due to a terminal event is indicated by a trailing `*` sign;
- Otherwise, an event happens at the end of the recurrent episode.
For concise printing, the `show()` method uses `getOption("reda.Recur.maxPrint")` to limit the maximum number of recurrent episodes printed for each process. By default, `options(reda.Recur.maxPrint = 3)` is set.
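The limit can be changed temporarily with `options()` and restored afterwards. The sketch below uses one toy subject with five episodes; the names `x`, `before`, and `old` are illustrative.

```r
library(reda)

## one subject with five recurrent episodes
x <- Recur(1:5, id = rep("A1", 5))

## printed under the current limit
before <- getOption("reda.Recur.maxPrint")
x

## raise the limit so that more episodes can be printed, then restore it
old <- options(reda.Recur.maxPrint = 10)
x
options(old)
```

Saving the value returned by `options()` and passing it back restores the previous setting, so the change does not leak into later output.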
The Valve Seats Example
We may illustrate the results of the `show()` method with the example valve seats data, where terminal events are artificially added.
set.seed(123)
term_events <- rbinom(length(unique(valveSeats$ID)), 1, 0.5)
with(valveSeats, Recur(Days, ID, No., term_events))
[1] 251: (0, 761+]
[2] 252: (0, 759*]
[3] 327: (0, 98], (98, 667*]
[4] 328: (0, 326], (326, 653], ..., (653, 667*]
[5] 329: (0, 665+]
[6] 330: (0, 84], (84, 667*]
[7] 331: (0, 87], (87, 663*]
[8] 389: (0, 646], (646, 653*]
[9] 390: (0, 92], (92, 653*]
[10] 391: (0, 651*]
[11] 392: (0, 258], (258, 328], ..., (621, 650+]
[12] 393: (0, 61], (61, 539], (539, 648*]
[13] 394: (0, 254], (254, 276], ..., (640, 644*]
[14] 395: (0, 76], (76, 538], (538, 642*]
[15] 396: (0, 635], (635, 641*]
[16] 397: (0, 349], (349, 404], ..., (561, 649+]
[17] 398: (0, 631*]
[18] 399: (0, 596*]
[19] 400: (0, 120], (120, 479], (479, 614+]
[20] 401: (0, 323], (323, 449], (449, 582+]
[21] 402: (0, 139], (139, 139], (139, 589*]
[22] 403: (0, 593+]
[23] 404: (0, 573], (573, 589*]
[24] 405: (0, 165], (165, 408], ..., (604, 606+]
[25] 406: (0, 249], (249, 594*]
[26] 407: (0, 344], (344, 497], (497, 613*]
[27] 408: (0, 265], (265, 586], (586, 595+]
[28] 409: (0, 166], (166, 206], ..., (348, 389+]
[29] 410: (0, 601+]
[30] 411: (0, 410], (410, 581], (581, 601+]
[31] 412: (0, 611+]
[32] 413: (0, 608+]
[33] 414: (0, 587*]
[34] 415: (0, 367], (367, 603+]
[35] 416: (0, 202], (202, 563], ..., (570, 585*]
[36] 417: (0, 587+]
[37] 418: (0, 578*]
[38] 419: (0, 578*]
[39] 420: (0, 586*]
[40] 421: (0, 585+]
[41] 422: (0, 582*]
On Missing Times
The updated `show()` method preserves `NA`’s when `check = "none"`. However, `NA`’s will always appear at the end of each process because times are sorted internally.
[1] 1: (0, 4], (4, 6], (6, NA+] 2: (0, 3], (3, 5], (5, NA+]