For right-censored data, fit a regular Cox cure rate model (Kuk and Chen, 1992; Sy and Taylor, 2000) via an EM algorithm. For right-censored data with uncertain event status, fit the Cox cure model proposed by Wang et al. (2020).
cox_cure( surv_formula, cure_formula, time, event, data, subset, contrasts = NULL, bootstrap = 0, firth = FALSE, surv_start = NULL, cure_start = NULL, surv_offset = NULL, cure_offset = NULL, em_max_iter = 200, em_rel_tol = 1e-05, surv_max_iter = 30, surv_rel_tol = 1e-05, cure_max_iter = 30, cure_rel_tol = 1e-05, tail_completion = c("zero", "exp", "zero-tau"), tail_tau = NULL, pmin = 1e-05, early_stop = TRUE, verbose = FALSE, ... ) cox_cure.fit( surv_x, cure_x, time, event, cure_intercept = TRUE, bootstrap = 0, firth = FALSE, surv_start = NULL, cure_start = NULL, surv_offset = NULL, cure_offset = NULL, surv_standardize = TRUE, cure_standardize = TRUE, em_max_iter = 200, em_rel_tol = 1e-05, surv_max_iter = 30, surv_rel_tol = 1e-05, cure_max_iter = 30, cure_rel_tol = 1e-05, tail_completion = c("zero", "exp", "zero-tau"), tail_tau = NULL, pmin = 1e-05, early_stop = TRUE, verbose = FALSE, ... )
surv_formula | A formula object starting with |
---|---|
cure_formula | A formula object starting with |
time | A numeric vector for the observed survival times. |
event | A numeric vector for the event indicators. |
data | An optional data frame, list, or environment that contains the
covariates and response variables ( |
subset | An optional logical vector specifying a subset of observations to be used in the fitting process. |
contrasts | An optional list, whose entries are values (numeric
matrices or character strings naming functions) to be used as
replacement values for the contrasts replacement function and whose
names are the names of columns of data containing factors. See
|
bootstrap | An integer representing the number of bootstrap samples for
estimating standard errors of the coefficient estimates. The bootstrap
procedure will not run with |
firth | A logical value indicating whether to use Firth's
bias-reduction method (Firth, 1993) in the logistic model component.
The default value is |
surv_start, cure_start | An optional numeric vector representing the
starting values for the survival model component or the incidence model
component. If |
surv_offset, cure_offset | An optional numeric vector representing the
offset term in the survival model compoent or the incidence model
component. The function will internally try to find values of the
specified variable in the |
em_max_iter | A positive integer specifying the maximum iteration
number of the EM algorithm. The default value is |
em_rel_tol | A positive number specifying the tolerance that determines
the convergence of the EM algorithm in terms of the convergence of the
covariate coefficient estimates. The tolerance is compared with the
relative change between estimates from two consecutive iterations, which
is measured by ratio of the L1-norm of their difference to the sum of
their L1-norm. The default value is |
surv_max_iter, cure_max_iter | A positive integer specifying the maximum
iteration number of the M-step routine related to the survival model
component or the incidence model component. The default value is
|
surv_rel_tol, cure_rel_tol | A positive number specifying the tolerance
that determines the convergence of the M-step related to the survival
model component or the incidence model component in terms of the
convergence of the covariate coefficient estimates. The tolerance is
compared with the relative change between estimates from two consecutive
iterations, which is measured by ratio of the L1-norm of their
difference to the sum of their L1-norm. The default value is
|
tail_completion | A character string specifying the tail completion
method for conditional survival function. The available methods are
|
tail_tau | A numeric number specifying the time of zero-tail
completion. It will be used only if |
pmin | A numeric number specifying the minimum value of probabilities
for sake of numerical stability. The default value is |
early_stop | A logical value specifying whether to stop the iteration
once the negative log-likelihood unexpectedly increases, which may
suggest convergence on likelihood, or indicate numerical issues or
implementation bugs. The default value is |
verbose | A logical value. If |
... | Other arguments for future usage. A warning will be thrown if any invalid argument is specified. |
surv_x | A numeric matrix for the design matrix of the survival model component. |
cure_x | A numeric matrix for the design matrix of the cure rate model
component. The design matrix should exclude an intercept term unless we
want to fit a model only including the intercept term. In that case, we
need further set |
cure_intercept | A logical value specifying whether to add an intercept
term to the cure rate model component. If |
surv_standardize | A logical value specifying whether to standardize
the covariates for the survival model part. If |
cure_standardize | A logical value specifying whether to standardize
the covariates for the cure rate model part. If |
cox_cure
object for regular Cox cure rate model or
cox_cure_uncer
object for Cox cure rate model with uncertain events.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27--38.
Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531--541.
Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187--201.
Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227--236.
Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2020). Suicide Risk Modeling with Uncertain Diagnostic Records. arXiv preprint arXiv:2009.02597.
cox_cure_net
for regularized Cox cure rate model with
elastic-net penalty.
library(intsurv) ### regular Cox cure rate model ====================================== ## 1. simulate right-censored data with a cure fraction set.seed(123) n_obs <- 2e2 p <- 5 x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p) colnames(x_mat) <- paste0("x", seq_len(p)) cure_beta <- rep(0.5, p) b0 <- - 1 expit <- binomial()$linkinv ncure_prob <- expit(as.numeric(b0 + x_mat %*% cure_beta)) is_cure <- 1 - rbinom(n_obs, size = 1, prob = ncure_prob) surv_beta <- rep(0.5, p) risk_score <- as.numeric(x_mat %*% surv_beta) event_time <- rexp(n_obs, exp(as.numeric(x_mat %*% surv_beta))) censor_time <- 10 event <- ifelse(event_time < censor_time & ! is_cure, 1, 0) obs_time <- ifelse(event > 0, event_time, censor_time) ## model-fitting from given design matrices fit1 <- cox_cure.fit(x_mat, x_mat, obs_time, event, bootstrap = 30) summary(fit1)#> Call: #> cox_cure.fit(surv_x = x_mat, cure_x = x_mat, time = obs_time, #> event = event, bootstrap = 30) #> #> Coefficient estimates from the survival part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> x1 0.424787 1.529264 0.148820 2.8544 0.004312 ** #> x2 0.415574 1.515241 0.251128 1.6548 0.097959 . #> x3 0.312479 1.366809 0.215197 1.4521 0.146484 #> x4 0.656209 1.927471 0.107090 6.1276 8.919e-10 *** #> x5 0.536194 1.709488 0.093791 5.7169 1.085e-08 *** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> #> Coefficient estimates from the cure rate part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> (Intercept) -0.82103 0.43998 0.11138 -7.3716 1.686e-13 *** #> x1 0.27945 1.32241 0.15172 1.8419 0.065486 . #> x2 0.47637 1.61022 0.20779 2.2925 0.021876 * #> x3 0.44615 1.56228 0.19981 2.2329 0.025558 * #> x4 0.67783 1.96960 0.21637 3.1327 0.001732 ** #> x5 0.57693 1.78057 0.20864 2.7652 0.005688 ** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> #> Number of observations: 200 #> Number of events: 70 #> Weighted concordance index: 66.87% #> Observed data log-likelihood: -390.904#> $surv #> x1 x2 x3 x4 x5 #> 0.4247867 0.4155744 0.3124786 0.6562086 0.5361937 #> #> $cure #> (Intercept) x1 x2 x3 x4 x5 #> -0.8210297 0.2794521 0.4763700 0.4461459 0.6778322 0.5769342 #>#> x1 x2 x3 x4 x5 #> 0.4247867 0.4155744 0.3124786 0.6562086 0.5361937#> (Intercept) x1 x2 x3 x4 x5 #> -0.8210297 0.2794521 0.4763700 0.4461459 0.6778322 0.5769342## weighted concordance index (C-index) fit1$model$c_index#> [1] 0.6686918## which also can be computed as follows cIndex(time = obs_time, event = event, risk_score = fit1$fitted$surv_xBeta, weight = ifelse(event > 0, 1, fit1$fitted$susceptible_prob))#> index concordant comparable tied_risk #> 0.6686918 3322.8657119 4969.2033585 0.0000000## 2. create a toy dataset toy_dat <- data.frame(time = obs_time, status = event) toy_dat$group <- cut(abs(x_mat[, 1L]), breaks = c(0, 0.5, 1, 3, Inf), labels = LETTERS[1:4]) toy_dat <- cbind(toy_dat, as.data.frame(x_mat[, - 1L, drop = FALSE])) ## model-fitting from given model formula fit2 <- cox_cure(~ x3 + x4 + group, ~ group + x3 + offset(x2), time = time, event = status, surv_offset = x2, data = toy_dat, subset = group != "D", bootstrap = 30) summary(fit2)#> Call: #> cox_cure(surv_formula = ~x3 + x4 + group, cure_formula = ~group + #> x3 + offset(x2), time = time, event = status, data = toy_dat, #> subset = group != "D", bootstrap = 30, surv_offset = x2) #> #> Coefficient estimates from the survival part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> x3 0.543309 1.721695 0.298974 1.8172 0.0691799 . #> x4 0.676139 1.966272 0.189392 3.5701 0.0003569 *** #> groupB 0.370947 1.449106 0.467467 0.7935 0.4274721 #> groupC -0.087894 0.915858 0.515648 -0.1705 0.8646542 #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> #> Coefficient estimates from the cure rate part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> (Intercept) -0.87410 0.41724 0.34504 -2.5333 0.011300 * #> groupB 0.61742 1.85414 0.44803 1.3781 0.168177 #> groupC -0.37159 0.68964 0.64444 -0.5766 0.564199 #> x3 0.46356 1.58972 0.16147 2.8710 0.004092 ** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> #> Number of observations: 199 #> Number of events: 69 #> Weighted concordance index: 65.19% #> Observed data log-likelihood: -418.193#> [1] 840.0902#> [1] 878.7316#> df BIC #> fit1 11 840.0902 #> fit2 8 878.7316### Cox cure rate model with uncertain event status ================== ## simulate sample data sim_dat <- simData4cure(nSubject = 200, max_censor = 10, lambda_censor = 0.1, survMat = x_mat, cureMat = x_mat, b0 = 1) table(sim_dat$case)#> #> 1 2a 2b 3a 3b 3c #> 98 20 67 7 4 4#> #> 0 1 <NA> #> 87 98 15## use formula fit3 <- cox_cure(~ x1 + x2 + x3, ~ z1 + z2 + z3, time = obs_time, event = obs_event, data = sim_dat) summary(fit3)#> Call: #> cox_cure(surv_formula = ~x1 + x2 + x3, cure_formula = ~z1 + z2 + #> z3, time = obs_time, event = obs_event, data = sim_dat) #> #> Coefficient estimates from the survival part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> x1 0.29775 1.34682 NA NA NA #> x2 0.50901 1.66365 NA NA NA #> x3 0.44927 1.56716 NA NA NA #> #> Coefficient estimates from the cure rate part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> (Intercept) 0.58373 1.79272 NA NA NA #> z1 0.87483 2.39846 NA NA NA #> z2 0.94459 2.57176 NA NA NA #> z3 0.79278 2.20953 NA NA NA #> #> Number of observations: 200 #> Number of events: 98 #> Weighted concordance index: 66.23% #> Observed data log-likelihood: -964.619## use design matrix fit4 <- cox_cure.fit(x_mat, x_mat, time = sim_dat$obs_time, event = sim_dat$obs_event) summary(fit4)#> Call: #> cox_cure.fit(surv_x = x_mat, cure_x = x_mat, time = sim_dat$obs_time, #> event = sim_dat$obs_event) #> #> Coefficient estimates from the survival part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> x1 0.78599 2.19457 NA NA NA #> x2 0.89238 2.44094 NA NA NA #> x3 0.90198 2.46447 NA NA NA #> x4 0.78929 2.20182 NA NA NA #> x5 0.96938 2.63631 NA NA NA #> #> Coefficient estimates from the cure rate part: #> #> coef exp(coef) se(coef) z Pr(>|z|) #> (Intercept) 0.80575 2.23837 NA NA NA #> x1 1.17734 3.24574 NA NA NA #> x2 1.07222 2.92187 NA NA NA #> x3 1.11155 3.03908 NA NA NA #> x4 0.82729 2.28711 NA NA NA #> x5 1.16752 3.21400 NA NA NA #> #> Number of observations: 200 #> Number of events: 98 #> Weighted concordance index: 75.74% #> Observed data log-likelihood: -906.926#> df BIC #> fit3 7 1966.326 #> fit4 11 1872.133