Cox Cure Rate Model

For right-censored data, fit a regular Cox cure rate model (Kuk and Chen, 1992; Sy and Taylor, 2000) via an EM algorithm. For right-censored data with uncertain event status, fit the Cox cure model proposed by Wang et al. (2020).

cox_cure(
  surv_formula,
  cure_formula,
  time,
  event,
  data,
  subset,
  contrasts = NULL,
  bootstrap = 0,
  firth = FALSE,
  surv_start = NULL,
  cure_start = NULL,
  surv_offset = NULL,
  cure_offset = NULL,
  em_max_iter = 200,
  em_rel_tol = 1e-05,
  surv_max_iter = 30,
  surv_rel_tol = 1e-05,
  cure_max_iter = 30,
  cure_rel_tol = 1e-05,
  tail_completion = c("zero", "exp", "zero-tau"),
  tail_tau = NULL,
  pmin = 1e-05,
  early_stop = TRUE,
  verbose = FALSE,
  ...
)

cox_cure.fit(
  surv_x,
  cure_x,
  time,
  event,
  cure_intercept = TRUE,
  bootstrap = 0,
  firth = FALSE,
  surv_start = NULL,
  cure_start = NULL,
  surv_offset = NULL,
  cure_offset = NULL,
  surv_standardize = TRUE,
  cure_standardize = TRUE,
  em_max_iter = 200,
  em_rel_tol = 1e-05,
  surv_max_iter = 30,
  surv_rel_tol = 1e-05,
  cure_max_iter = 30,
  cure_rel_tol = 1e-05,
  tail_completion = c("zero", "exp", "zero-tau"),
  tail_tau = NULL,
  pmin = 1e-05,
  early_stop = TRUE,
  verbose = FALSE,
  ...
)

Arguments

surv_formula	A formula object starting with `~` for the model formula in survival model part. For Cox model, no intercept term is included even if an intercept is specified or implied in the model formula. A model formula with an intercept term only is not allowed.
cure_formula	A formula object starting with `~` for the model formula in incidence model part. For logistic model, an intercept term is included by default and can be excluded by adding `+ 0` or `- 1` to the model formula.
time	A numeric vector for the observed survival times.
event	A numeric vector for the event indicators. `NA`'s are allowed and represent uncertain event status.
data	An optional data frame, list, or environment that contains the covariates and response variables (`time` and `event`), included in the model. If they are not found in data, the variables are taken from the environment of the specified formula, usually the environment from which this function is called.
subset	An optional logical vector specifying a subset of observations to be used in the fitting process.
contrasts	An optional list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors. See `contrasts.arg` of `model.matrix.default` for details.
bootstrap	An integer representing the number of bootstrap samples for estimating standard errors of the coefficient estimates. The bootstrap procedure will not run with `bootstrap = 0` by default. If `bootstrap > 0`, the specified number of bootstrap samples will be used in estimating standard errors.
firth	A logical value indicating whether to use Firth's bias-reduction method (Firth, 1993) in the logistic model component. The default value is `FALSE` for fitting a regular logistic model. Notice that this argument is experimental and only available for regular Cox cure rate model currently.
surv_start, cure_start	An optional numeric vector representing the starting values for the survival model component or the incidence model component. If `surv_start = NULL` is specified, the starting values will be obtained from fitting a regular Cox to events only. Similarly, if `cure_start = NULL` is specified, the starting values will be obtained from fitting a regular logistic model to the non-missing event indicators.
surv_offset, cure_offset	An optional numeric vector representing the offset term in the survival model compoent or the incidence model component. The function will internally try to find values of the specified variable in the `data` first. Alternatively, one or more `offset` terms can be specified in the formula (by `stats::offset()`). If more than one offset terms are specified, their sum will be used.
em_max_iter	A positive integer specifying the maximum iteration number of the EM algorithm. The default value is `200`.
em_rel_tol	A positive number specifying the tolerance that determines the convergence of the EM algorithm in terms of the convergence of the covariate coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, which is measured by ratio of the L1-norm of their difference to the sum of their L1-norm. The default value is `1e-5`.
surv_max_iter, cure_max_iter	A positive integer specifying the maximum iteration number of the M-step routine related to the survival model component or the incidence model component. The default value is `200`.
surv_rel_tol, cure_rel_tol	A positive number specifying the tolerance that determines the convergence of the M-step related to the survival model component or the incidence model component in terms of the convergence of the covariate coefficient estimates. The tolerance is compared with the relative change between estimates from two consecutive iterations, which is measured by ratio of the L1-norm of their difference to the sum of their L1-norm. The default value is `1e-5`.
tail_completion	A character string specifying the tail completion method for conditional survival function. The available methods are `"zero"` for zero-tail completion after the largest event times (Sy and Taylor, 2000), `"exp"` for exponential-tail completion (Peng, 2003), and `"zero-tau"` for zero-tail completion after a specified `tail_tau`. The default method is the zero-tail completion proposed by Sy and Taylor (2000).
tail_tau	A numeric number specifying the time of zero-tail completion. It will be used only if `tail_completion = "zero-tau"`. A reasonable choice must be a time point between the largest event time and the largest survival time.
pmin	A numeric number specifying the minimum value of probabilities for sake of numerical stability. The default value is `1e-5`.
early_stop	A logical value specifying whether to stop the iteration once the negative log-likelihood unexpectedly increases, which may suggest convergence on likelihood, or indicate numerical issues or implementation bugs. The default value is `TRUE`.
verbose	A logical value. If `TRUE`, a verbose information will be given along iterations for tracing the convergence. The default value is `FALSE`.
...	Other arguments for future usage. A warning will be thrown if any invalid argument is specified.
surv_x	A numeric matrix for the design matrix of the survival model component.
cure_x	A numeric matrix for the design matrix of the cure rate model component. The design matrix should exclude an intercept term unless we want to fit a model only including the intercept term. In that case, we need further set `cure_intercept = FALSE` to not standardize the intercept term.
cure_intercept	A logical value specifying whether to add an intercept term to the cure rate model component. If `TRUE` by default, an intercept term is included.
surv_standardize	A logical value specifying whether to standardize the covariates for the survival model part. If `FALSE`, the covariates will be standardized internally to have mean zero and standard deviation one.
cure_standardize	A logical value specifying whether to standardize the covariates for the cure rate model part. If `TRUE` (by default), the covariates will be standardized internally to have mean zero and standard deviation one.

Value

cox_cure object for regular Cox cure rate model or cox_cure_uncer object for Cox cure rate model with uncertain events.

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27--38.

Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531--541.

Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187--201.

Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227--236.

Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2020). Suicide Risk Modeling with Uncertain Diagnostic Records. arXiv preprint arXiv:2009.02597.

Examples

library(intsurv)

### regular Cox cure rate model ======================================
## 1. simulate right-censored data with a cure fraction
set.seed(123)
n_obs <- 2e2
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
cure_beta <- rep(0.5, p)
b0 <- - 1
expit <- binomial()$linkinv
ncure_prob <- expit(as.numeric(b0 + x_mat %*% cure_beta))
is_cure <- 1 - rbinom(n_obs, size = 1, prob = ncure_prob)
surv_beta <- rep(0.5, p)
risk_score <- as.numeric(x_mat %*% surv_beta)
event_time <- rexp(n_obs, exp(as.numeric(x_mat %*% surv_beta)))
censor_time <- 10
event <- ifelse(event_time < censor_time & ! is_cure, 1, 0)
obs_time <- ifelse(event > 0, event_time, censor_time)

## model-fitting from given design matrices
fit1 <- cox_cure.fit(x_mat, x_mat, obs_time, event, bootstrap = 30)
summary(fit1)
#> Call:
#> cox_cure.fit(surv_x = x_mat, cure_x = x_mat, time = obs_time, 
#>     event = event, bootstrap = 30)
#> 
#> Coefficient estimates from the survival part:
#> 
#>        coef exp(coef) se(coef)      z  Pr(>|z|)    
#> x1 0.424787  1.529264 0.148820 2.8544  0.004312 ** 
#> x2 0.415574  1.515241 0.251128 1.6548  0.097959 .  
#> x3 0.312479  1.366809 0.215197 1.4521  0.146484    
#> x4 0.656209  1.927471 0.107090 6.1276 8.919e-10 ***
#> x5 0.536194  1.709488 0.093791 5.7169 1.085e-08 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Coefficient estimates from the cure rate part:
#> 
#>                 coef exp(coef) se(coef)       z  Pr(>|z|)    
#> (Intercept) -0.82103   0.43998  0.11138 -7.3716 1.686e-13 ***
#> x1           0.27945   1.32241  0.15172  1.8419  0.065486 .  
#> x2           0.47637   1.61022  0.20779  2.2925  0.021876 *  
#> x3           0.44615   1.56228  0.19981  2.2329  0.025558 *  
#> x4           0.67783   1.96960  0.21637  3.1327  0.001732 ** 
#> x5           0.57693   1.78057  0.20864  2.7652  0.005688 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Number of observations: 200
#> Number of events: 70
#> Weighted concordance index: 66.87%
#> Observed data log-likelihood: -390.904

## coefficient estimates from both model parts
coef(fit1)
#> $surv
#>        x1        x2        x3        x4        x5 
#> 0.4247867 0.4155744 0.3124786 0.6562086 0.5361937 
#> 
#> $cure
#> (Intercept)          x1          x2          x3          x4          x5 
#>  -0.8210297   0.2794521   0.4763700   0.4461459   0.6778322   0.5769342 
#> 

## or a particular part
coef(fit1, "surv")
#>        x1        x2        x3        x4        x5 
#> 0.4247867 0.4155744 0.3124786 0.6562086 0.5361937 
coef(fit1, "cure")
#> (Intercept)          x1          x2          x3          x4          x5 
#>  -0.8210297   0.2794521   0.4763700   0.4461459   0.6778322   0.5769342 

## weighted concordance index (C-index)
fit1$model$c_index
#> [1] 0.6686918
## which also can be computed as follows
cIndex(time = obs_time, event = event,
       risk_score = fit1$fitted$surv_xBeta,
       weight = ifelse(event > 0, 1, fit1$fitted$susceptible_prob))
#>        index   concordant   comparable    tied_risk 
#>    0.6686918 3322.8657119 4969.2033585    0.0000000 

## 2. create a toy dataset
toy_dat <- data.frame(time = obs_time, status = event)
toy_dat$group <- cut(abs(x_mat[, 1L]), breaks = c(0, 0.5, 1, 3, Inf),
                     labels = LETTERS[1:4])
toy_dat <- cbind(toy_dat, as.data.frame(x_mat[, - 1L, drop = FALSE]))

## model-fitting from given model formula
fit2 <- cox_cure(~ x3 + x4 + group, ~ group + x3 + offset(x2),
                 time = time, event = status, surv_offset = x2,
                 data = toy_dat, subset = group != "D", bootstrap = 30)
summary(fit2)
#> Call:
#> cox_cure(surv_formula = ~x3 + x4 + group, cure_formula = ~group + 
#>     x3 + offset(x2), time = time, event = status, data = toy_dat, 
#>     subset = group != "D", bootstrap = 30, surv_offset = x2)
#> 
#> Coefficient estimates from the survival part:
#> 
#>             coef exp(coef)  se(coef)       z  Pr(>|z|)    
#> x3      0.543309  1.721695  0.298974  1.8172 0.0691799 .  
#> x4      0.676139  1.966272  0.189392  3.5701 0.0003569 ***
#> groupB  0.370947  1.449106  0.467467  0.7935 0.4274721    
#> groupC -0.087894  0.915858  0.515648 -0.1705 0.8646542    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Coefficient estimates from the cure rate part:
#> 
#>                 coef exp(coef) se(coef)       z Pr(>|z|)   
#> (Intercept) -0.87410   0.41724  0.34504 -2.5333 0.011300 * 
#> groupB       0.61742   1.85414  0.44803  1.3781 0.168177   
#> groupC      -0.37159   0.68964  0.64444 -0.5766 0.564199   
#> x3           0.46356   1.58972  0.16147  2.8710 0.004092 **
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Number of observations: 199
#> Number of events: 69
#> Weighted concordance index: 65.19%
#> Observed data log-likelihood: -418.193

## get BIC's
BIC(fit1)
#> [1] 840.0902
BIC(fit2)
#> [1] 878.7316
BIC(fit1, fit2)
#>      df      BIC
#> fit1 11 840.0902
#> fit2  8 878.7316


### Cox cure rate model with uncertain event status ==================
## simulate sample data
sim_dat <- simData4cure(nSubject = 200, max_censor = 10, lambda_censor = 0.1,
                        survMat = x_mat, cureMat = x_mat, b0 = 1)
table(sim_dat$case)
#> 
#>  1 2a 2b 3a 3b 3c 
#> 98 20 67  7  4  4 
table(sim_dat$obs_event, useNA = "ifany")
#> 
#>    0    1 <NA> 
#>   87   98   15 

## use formula
fit3 <- cox_cure(~ x1 + x2 + x3, ~ z1 + z2 + z3,
                 time = obs_time, event = obs_event, data = sim_dat)
summary(fit3)
#> Call:
#> cox_cure(surv_formula = ~x1 + x2 + x3, cure_formula = ~z1 + z2 + 
#>     z3, time = obs_time, event = obs_event, data = sim_dat)
#> 
#> Coefficient estimates from the survival part:
#> 
#>       coef exp(coef) se(coef)  z Pr(>|z|)
#> x1 0.29775   1.34682       NA NA       NA
#> x2 0.50901   1.66365       NA NA       NA
#> x3 0.44927   1.56716       NA NA       NA
#> 
#> Coefficient estimates from the cure rate part:
#> 
#>                coef exp(coef) se(coef)  z Pr(>|z|)
#> (Intercept) 0.58373   1.79272       NA NA       NA
#> z1          0.87483   2.39846       NA NA       NA
#> z2          0.94459   2.57176       NA NA       NA
#> z3          0.79278   2.20953       NA NA       NA
#> 
#> Number of observations: 200
#> Number of events: 98
#> Weighted concordance index: 66.23%
#> Observed data log-likelihood: -964.619

## use design matrix
fit4 <- cox_cure.fit(x_mat, x_mat, time = sim_dat$obs_time,
                     event = sim_dat$obs_event)
summary(fit4)
#> Call:
#> cox_cure.fit(surv_x = x_mat, cure_x = x_mat, time = sim_dat$obs_time, 
#>     event = sim_dat$obs_event)
#> 
#> Coefficient estimates from the survival part:
#> 
#>       coef exp(coef) se(coef)  z Pr(>|z|)
#> x1 0.78599   2.19457       NA NA       NA
#> x2 0.89238   2.44094       NA NA       NA
#> x3 0.90198   2.46447       NA NA       NA
#> x4 0.78929   2.20182       NA NA       NA
#> x5 0.96938   2.63631       NA NA       NA
#> 
#> Coefficient estimates from the cure rate part:
#> 
#>                coef exp(coef) se(coef)  z Pr(>|z|)
#> (Intercept) 0.80575   2.23837       NA NA       NA
#> x1          1.17734   3.24574       NA NA       NA
#> x2          1.07222   2.92187       NA NA       NA
#> x3          1.11155   3.03908       NA NA       NA
#> x4          0.82729   2.28711       NA NA       NA
#> x5          1.16752   3.21400       NA NA       NA
#> 
#> Number of observations: 200
#> Number of events: 98
#> Weighted concordance index: 75.74%
#> Observed data log-likelihood: -906.926

## get BIC's
BIC(fit3, fit4)
#>      df      BIC
#> fit3  7 1966.326
#> fit4 11 1872.133

Arguments

Value

References

See also

Examples