For right-censored data, the function cox_cure() trains a Cox cure
rate model (Kuk and Chen, 1992; Sy and Taylor, 2000) via an expectation
maximization (EM) algorithm; For right-censored data with missing/uncertain
event/censoring indicators, the function fits the Cox cure rate model
proposed by Wang et al. (2023).
Usage
cox_cure(
surv_formula,
cure_formula,
time,
event,
data,
subset,
contrasts = NULL,
bootstrap = 0L,
surv_mstep = cox_cure.mstep(),
cure_mstep = cox_cure.mstep(),
control = cox_cure.control(),
...
)
cox_cure.fit(
surv_x,
cure_x,
time,
event,
cure_intercept = TRUE,
bootstrap = 0L,
surv_mstep = cox_cure.mstep(),
cure_mstep = cox_cure.mstep(),
control = cox_cure.control(),
...
)
cox_cure.control(
tail_completion = c("zero", "exp", "tau-zero"),
tail_tau = Inf,
maxit = 100,
epsilon = 1e-04,
pmin = 1e-05,
save_call = TRUE,
verbose = 0,
...
)
cox_cure.mstep(
start = NULL,
offset = NULL,
maxit = 10,
epsilon = 1e-04,
standardize = TRUE,
...
)Arguments
- surv_formula
A formula object starting with
~for the model formula in survival model part. For Cox model, no intercept term is included even if an intercept is specified or implied in the model formula. A model formula with an intercept term only is not allowed.- cure_formula
A formula object starting with
~for the model formula in incidence model part. For logistic model, an intercept term is included by default and can be excluded by adding+ 0or- 1to the model formula.- time
A numeric vector for the observed survival times.
- event
A numeric vector for the event indicators, where
NA's are allowed and represent uncertain event indicators.- data
An optional data frame, list, or environment that contains the model covariates and response variables (
timeandevent) If they are not found in data, the variables are taken from the environment of the specified formula, usually the environment from which this function is called.- subset
An optional logical vector specifying a subset of observations to be used in the fitting process.
- contrasts
An optional list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors. See
contrasts.argofmodel.matrix.defaultfor details.- bootstrap
An integer representing the number of bootstrap samples for estimating standard errors of the coefficient estimates. The bootstrap procedure will not run if
bootstrap = 0by default. Ifbootstrap > 0, the specified number of bootstrap samples will be used for estimating the standard errors.- surv_mstep, cure_mstep
A named list passed to
cox_cure.mstep()specifying the control parameters for the corresponding M-steps.- control
A
cox_cure.controlobject that contains the control parameters.- ...
Other arguments passed to the control functions for backward compatibility.
- surv_x
A numeric matrix for the design matrix of the survival model component.
- cure_x
A numeric matrix for the design matrix of the cure rate model component. The design matrix should exclude an intercept term unless we want to fit a model only including the intercept term. In that case, we need further set
cure_intercept = FALSEto not standardize the intercept term.- cure_intercept
A logical value specifying whether to add an intercept term to the cure rate model component. If
TRUEby default, an intercept term is included.- tail_completion
A character string specifying the tail completion method for conditional survival function. The available methods are
"zero"for zero-tail completion after the largest event times (Sy and Taylor, 2000),"exp"for exponential-tail completion (Peng, 2003), and"tau-zero"for zero-tail completion after a specifiedtail_tau. The default method is the zero-tail completion proposed by Sy and Taylor (2000).- tail_tau
A numeric number specifying the time of zero-tail completion. It will be used only if
tail_completion = "tau-zero". A reasonable choice must be a time point between the largest event time and the largest survival time.- maxit
A positive integer specifying the maximum iteration number. The default value is
1000.- epsilon
A positive number specifying the tolerance that determines the convergence of the coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, which is measured by the ratio of the L1-norm of their difference to the sum of their L1-norms plus one.
- pmin
A positive number specifying the minimum value of probabilities for numerical stability. The default value is
1e-5.- save_call
A logical value indicating if the function call should be saved. For large datasets, saving the function call would increase the size of the returned object dramatically. We may want to set
save_call = FALSEif the original function call is not needed.- verbose
A nonnegative integer for verbose outputs, which is mainly useful for debugging.
- start
A numeric vector representing the initial values for the underlying model estimation procedure. If
standardizeisTRUE, the specified initial values will be scaled internally to match the standardized data. The default initial values depend on the specific models and based on the observed data. If inappropriate initial values (in terms of length) are specified, the default values will be used.- offset
A numeric vector specifying the offset term. The length of the specified offset term should be equal to the sample size.
- standardize
A logical value specifying if each covariate should be standardized to have mean zero and standard deviation one internally for numerically stability and fair regularization. The default value is
TRUE. The coefficient estimates will always be returned in original scales.
Value
A cox_cure object that contains the fitted ordinary Cox cure
rate model if none of the event indicators is NA. For
right-censored data with uncertain/missing event indicators, a
cox_cure_uncer object is returned.
References
Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531–541.
Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187–201.
Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227–236.
Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2023). Survival Modeling of Suicide Risk with Rare and Uncertain Diagnoses. Statistics in Biosciences, 17(1), 1–27.
Examples
library(intsurv)
### 1. Cox cure rate model
## simulate right-censored data with a cure fraction
set.seed(123)
n_obs <- 2e2
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
cure_beta <- rep(0.5, p)
b0 <- - 1
expit <- binomial()$linkinv
ncure_prob <- expit(as.numeric(b0 + x_mat %*% cure_beta))
is_cure <- 1 - rbinom(n_obs, size = 1, prob = ncure_prob)
surv_beta <- rep(0.5, p)
risk_score <- as.numeric(x_mat %*% surv_beta)
event_time <- rexp(n_obs, exp(as.numeric(x_mat %*% surv_beta)))
censor_time <- 10
event <- ifelse(event_time < censor_time & ! is_cure, 1, 0)
obs_time <- ifelse(event > 0, event_time, censor_time)
## model-fitting for the given design matrices using cox_cure.fit()
fit1 <- cox_cure.fit(x_mat, x_mat, obs_time, event, bootstrap = 30)
summary(fit1)
#> Call:
#> cox_cure.fit(surv_x = x_mat, cure_x = x_mat, time = obs_time,
#> event = event, bootstrap = 30)
#>
#> Coefficient estimates for the survival part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> x1 0.42342 1.52718 0.19042 2.2237 0.026171 *
#> x2 0.41447 1.51356 0.17478 2.3714 0.017723 *
#> x3 0.31211 1.36630 0.20307 1.5369 0.124312
#> x4 0.65472 1.92460 0.14546 4.5011 6.759e-06 ***
#> x5 0.53476 1.70704 0.18936 2.8241 0.004741 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Coefficient estimates for the cure rate part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> (Intercept) -0.82100 0.43999 0.10800 -7.6017 2.923e-14 ***
#> x1 0.27944 1.32239 0.23190 1.2050 0.2282003
#> x2 0.47635 1.61019 0.20540 2.3192 0.0203860 *
#> x3 0.44613 1.56225 0.18827 2.3696 0.0178052 *
#> x4 0.67781 1.96955 0.20098 3.3725 0.0007448 ***
#> x5 0.57691 1.78053 0.20997 2.7476 0.0060031 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Number of observations: 200
#> Number of events: 70
#> Weighted concordance index: 66.87%
#> Observed data log-likelihood: -1.955
## coefficient estimates from both model parts
coef(fit1)
#> $surv
#> x1 x2 x3 x4 x5
#> 0.4234220 0.4144655 0.3121074 0.6547156 0.5347593
#>
#> $cure
#> (Intercept) x1 x2 x3 x4 x5
#> -0.8210048 0.2794441 0.4763529 0.4461280 0.6778074 0.5769127
#>
## or a particular part
coef(fit1, "surv")
#> x1 x2 x3 x4 x5
#> 0.4234220 0.4144655 0.3121074 0.6547156 0.5347593
coef(fit1, "cure")
#> (Intercept) x1 x2 x3 x4 x5
#> -0.8210048 0.2794441 0.4763529 0.4461280 0.6778074 0.5769127
## create a toy example dataset
toy_dat <- data.frame(time = obs_time, status = event)
toy_dat$group <- cut(abs(x_mat[, 1L]), breaks = c(0, 0.5, 1, 2, Inf),
labels = LETTERS[1:4])
toy_dat <- cbind(toy_dat, as.data.frame(x_mat[, - 1L, drop = FALSE]))
## model-fitting for the given model formula using cox_cure()
fit2 <- cox_cure(
~ x3 + x4 + group,
~ group + x3 + offset(x2),
time = time,
event = status,
data = toy_dat,
subset = group != "D",
bootstrap = 30
)
summary(fit2)
#> Call:
#> cox_cure(surv_formula = ~x3 + x4 + group, cure_formula = ~group +
#> x3 + offset(x2), time = time, event = status, data = toy_dat,
#> subset = group != "D", bootstrap = 30)
#>
#> Coefficient estimates for the survival part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> x3 0.18339 1.20128 0.18365 0.9986 0.3180024
#> x4 0.70302 2.01985 0.20266 3.4690 0.0005224 ***
#> groupB -0.03514 0.96547 0.31802 -0.1105 0.9120161
#> groupC 0.19643 1.21705 0.29664 0.6622 0.5078494
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Coefficient estimates for the cure rate part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> (Intercept) -0.87548 0.41666 0.29772 -2.9406 0.003276 **
#> groupB 0.61884 1.85678 0.63116 0.9805 0.326846
#> groupC -0.40484 0.66708 0.38433 -1.0534 0.292168
#> x3 0.46988 1.59981 0.21499 2.1856 0.028843 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Number of observations: 192
#> Number of events: 66
#> Weighted concordance index: 66.65%
#> Observed data log-likelihood: -2.027
## get BIC's
BIC(fit1)
#> [1] 4.200453
BIC(fit2)
#> [1] 4.273196
BIC(fit1, fit2)
#> df BIC
#> fit1 11 4.200453
#> fit2 8 4.273196
### 2. Cox cure rate model for uncertain event status
set.seed(123)
n_obs <- 200
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
## simulate sample data
sim_dat <- simData4cure(nSubject = 200, max_censor = 10, lambda_censor = 0.1,
survMat = x_mat, cureMat = x_mat, b0 = 1)
table(sim_dat$case)
#>
#> 1 2a 2b 3a 3b 3c
#> 95 21 64 11 2 7
table(sim_dat$obs_event, useNA = "ifany")
#>
#> 0 1 <NA>
#> 85 95 20
## use formula
fit3 <- cox_cure(
~ x1 + x2 + x3,
~ z1 + z2 + z3,
time = obs_time,
event = obs_event,
data = sim_dat
)
summary(fit3)
#> Call:
#> cox_cure(surv_formula = ~x1 + x2 + x3, cure_formula = ~z1 + z2 +
#> z3, time = obs_time, event = obs_event, data = sim_dat)
#>
#> Coefficient estimates for the survival part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> x1 0.41639 1.51647 NA NA NA
#> x2 0.72573 2.06625 NA NA NA
#> x3 0.48005 1.61616 NA NA NA
#>
#> Coefficient estimates for the cure rate part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> (Intercept) 0.67726 1.96847 NA NA NA
#> z1 0.74369 2.10368 NA NA NA
#> z2 0.96486 2.62441 NA NA NA
#> z3 0.39020 1.47728 NA NA NA
#>
#> Number of observations: 200
#> Number of events: 95
#> Weighted concordance index: 70.48%
#> Observed data log-likelihood: -4.825
## use design matrix
fit4 <- cox_cure.fit(x_mat, x_mat,
time = sim_dat$obs_time,
event = sim_dat$obs_event)
summary(fit4)
#> Call:
#> cox_cure.fit(surv_x = x_mat, cure_x = x_mat, time = sim_dat$obs_time,
#> event = sim_dat$obs_event)
#>
#> Coefficient estimates for the survival part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> x1 1.07300 2.92414 NA NA NA
#> x2 1.35248 3.86701 NA NA NA
#> x3 1.00260 2.72536 NA NA NA
#> x4 1.09879 3.00055 NA NA NA
#> x5 0.99007 2.69142 NA NA NA
#>
#> Coefficient estimates for the cure rate part:
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> (Intercept) 1.04989 2.85732 NA NA NA
#> x1 1.11405 3.04667 NA NA NA
#> x2 1.02725 2.79337 NA NA NA
#> x3 0.66257 1.93978 NA NA NA
#> x4 1.00930 2.74367 NA NA NA
#> x5 1.19260 3.29563 NA NA NA
#>
#> Number of observations: 200
#> Number of events: 95
#> Weighted concordance index: 79.28%
#> Observed data log-likelihood: -4.496
## get BIC's
BIC(fit3, fit4)
#> df BIC
#> fit3 7 9.836434
#> fit4 11 9.282514