For right-censored data, the function cox_cure_net() trains a
regularized Cox cure rate model with elastic-net penalty following Masud et
al. (2018), and Zou and Hastie (2005). For right-censored data with
missing/uncertain event/censoring indicators, it fits the Cox cure rate
model proposed by Wang et al. (2023).
Usage
cox_cure_net(
surv_formula,
cure_formula,
time,
event,
data,
subset,
contrasts = NULL,
cv_nfolds = 0L,
surv_net = cox_cure_net.penalty(),
cure_net = cox_cure_net.penalty(),
surv_mstep = cox_cure.mstep(),
cure_mstep = cox_cure.mstep(),
control = cox_cure.control(),
...
)
cox_cure_net.fit(
surv_x,
cure_x,
time,
event,
cure_intercept = TRUE,
cv_nfolds = 0L,
surv_net = cox_cure_net.penalty(),
cure_net = cox_cure_net.penalty(),
surv_mstep = cox_cure.mstep(),
cure_mstep = cox_cure.mstep(),
control = cox_cure.control(),
...
)
cox_cure_net.penalty(
nlambda = 10,
lambda_min_ratio = 0.001,
alpha = 1,
lambda = NULL,
penalty_factor = NULL,
varying_active = TRUE,
...
)Arguments
- surv_formula
A formula object starting with
~for the model formula in survival model part. For Cox model, no intercept term is included even if an intercept is specified or implied in the model formula. A model formula with an intercept term only is not allowed.- cure_formula
A formula object starting with
~for the model formula in incidence model part. For logistic model, an intercept term is included by default and can be excluded by adding+ 0or- 1to the model formula.- time
A numeric vector for the observed survival times.
- event
A numeric vector for the event indicators, where
NA's are allowed and represent uncertain event indicators.- data
An optional data frame, list, or environment that contains the model covariates and response variables (
timeandevent) If they are not found in data, the variables are taken from the environment of the specified formula, usually the environment from which this function is called.- subset
An optional logical vector specifying a subset of observations to be used in the fitting process.
- contrasts
An optional list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors. See
contrasts.argofmodel.matrix.defaultfor details.- cv_nfolds
A nonnegative integer representing the number of folds in cross-validation.
- surv_net, cure_net
Optional lists or
cox_cure_net.penaltyobjects specifying the elastic penalties for survival model part and cure rate model part, respectively.- surv_mstep, cure_mstep
A named list passed to
cox_cure.mstep()specifying the control parameters for the corresponding M-steps.- control
A
cox_cure.controlobject that contains the control parameters.- ...
Other arguments passed to the control functions for backward compatibility.
- surv_x
A numeric matrix for the design matrix of the survival model component.
- cure_x
A numeric matrix for the design matrix of the cure rate model component. The design matrix should exclude an intercept term unless we want to fit a model only including the intercept term. In that case, we need further set
cure_intercept = FALSEto not standardize the intercept term.- cure_intercept
A logical value specifying whether to add an intercept term to the cure rate model component. If
TRUEby default, an intercept term is included.- nlambda
A positive integer representing the number of lambda parameters.
- lambda_min_ratio
A positive number specifying the ratio between the smallest lambda in the solution path to the large enough lambda that would result in all zero estimates with the lasso penalty.
- alpha
A positive number between 0 and 1 representing the mixing parameter in the elastic net penalty.
- lambda
A numeric vector that consists of nonnegative values representing the sequence of the lambda parameters.
- penalty_factor
A numeric vector that consists of nonnegative penalty factors (or adaptive weights) for the L1-norm of the coefficient estimates.
- varying_active
A logical value. If
TRUE(by default), the underlying coordinate-descent algorithm will be iterated over varying active sets, which can usually improve the computational efficiency when the number of predictors is large. Otherwise, an ordinary coordinate-descent will be performed.
Value
A cox_cure or cox_cure_net object that contains the
fitted ordinary or regularized Cox cure rate model if none of the event
indicators is NA. For right-censored data with uncertain/missing
event indicators, a cox_cure_uncer or cox_cure_net_uncer
is returned.
Details
The model estimation procedure follows expectation maximization (EM) algorithm. Variable selection procedure through regularization by elastic net penalty is developed based on cyclic coordinate descent and majorization-minimization (MM) algorithm.
References
Kuk, A. Y. C., & Chen, C. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika, 79(3), 531–541.
Peng, Y. (2003). Estimating baseline distribution in proportional hazards cure models. Computational Statistics & Data Analysis, 42(1-2), 187–201.
Sy, J. P., & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 56(1), 227–236.
Masud, A., Tu, W., & Yu, Z. (2018). Variable selection for mixture and promotion time cure rate models. Statistical methods in medical research, 27(7), 2185–2199.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Wang, W., Luo, C., Aseltine, R. H., Wang, F., Yan, J., & Chen, K. (2023). Survival Modeling of Suicide Risk with Rare and Uncertain Diagnoses. Statistics in Biosciences, 17(1), 1–27.
Examples
library(intsurv)
### 1. Regularized Cox cure rate model with elastic-net penalty
## simulate a toy right-censored data with a cure fraction
set.seed(123)
n_obs <- 100
p <- 10
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 5), rep(1, 5))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
max_censor = 10, survMat = x_mat,
survCoef = surv_beta, cureCoef = cure_beta,
b0 = 0.5, p1 = 1, p2 = 1, p3 = 1)
## model-fitting for the given design matrices
fit1 <- cox_cure_net.fit(x_mat, x_mat, dat$obs_time, dat$obs_event,
surv_net = list(nlambda = 10, alpha = 1),
cure_net = list(nlambda = 10, alpha = 0.8))
## model-fitting for the given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(surv_fm, cure_fm, data = dat,
time = obs_time, event = obs_event)
## summary of BIC's
BIC(fit1)
#> df surv_df cure_df BIC
#> 1 2 0 2 5.716568
#> 2 4 0 4 5.670483
#> 3 9 0 9 5.780108
#> 4 11 0 11 5.803649
#> 5 11 0 11 5.779392
#> 6 11 0 11 5.772566
#> 7 11 0 11 5.770850
#> 8 11 0 11 5.770450
#> 9 11 0 11 5.770360
#> 10 11 0 11 5.770341
#> 11 6 4 2 5.695470
#> 12 8 4 4 5.639227
#> 13 12 4 8 5.706908
#> 14 15 4 11 5.779325
#> 15 15 4 11 5.755260
#> 16 15 4 11 5.748197
#> 17 15 4 11 5.746251
#> 18 15 4 11 5.745710
#> 19 15 4 11 5.745547
#> 20 15 4 11 5.745515
#> 21 9 7 2 5.475084
#> 22 11 7 4 5.426858
#> 23 16 7 9 5.543245
#> 24 18 7 11 5.568937
#> 25 18 7 11 5.545176
#> 26 18 7 11 5.538557
#> 27 18 7 11 5.537025
#> 28 18 7 11 5.536719
#> 29 18 7 11 5.536629
#> 30 18 7 11 5.536623
#> 31 12 10 2 5.479763
#> 32 14 10 4 5.433034
#> 33 19 10 9 5.551601
#> 34 21 10 11 5.577537
#> 35 21 10 11 5.553734
#> 36 21 10 11 5.547167
#> 37 21 10 11 5.545607
#> 38 21 10 11 5.545301
#> 39 21 10 11 5.545262
#> 40 21 10 11 5.545238
#> 41 12 10 2 5.436670
#> 42 13 10 3 5.344493
#> 43 19 10 9 5.510330
#> 44 21 10 11 5.536492
#> 45 21 10 11 5.512653
#> 46 21 10 11 5.506022
#> 47 21 10 11 5.504419
#> 48 21 10 11 5.504095
#> 49 21 10 11 5.504038
#> 50 21 10 11 5.504049
#> 51 12 10 2 5.425406
#> 52 13 10 3 5.333659
#> 53 19 10 9 5.500162
#> 54 21 10 11 5.526388
#> 55 21 10 11 5.502482
#> 56 21 10 11 5.495782
#> 57 21 10 11 5.494127
#> 58 21 10 11 5.493764
#> 59 21 10 11 5.493698
#> 60 21 10 11 5.493680
#> 61 12 10 2 5.422601
#> 62 13 10 3 5.331118
#> 63 19 10 9 5.497955
#> 64 21 10 11 5.524189
#> 65 21 10 11 5.500232
#> 66 21 10 11 5.493488
#> 67 21 10 11 5.491798
#> 68 21 10 11 5.491414
#> 69 21 10 11 5.491336
#> 70 21 10 11 5.491320
#> 71 12 10 2 5.421894
#> 72 13 10 3 5.330552
#> 73 19 10 9 5.497551
#> 74 21 10 11 5.523790
#> 75 21 10 11 5.499802
#> 76 21 10 11 5.493028
#> 77 21 10 11 5.491324
#> 78 21 10 11 5.490928
#> 79 21 10 11 5.490842
#> 80 21 10 11 5.490825
#> 81 12 10 2 5.421682
#> 82 13 10 3 5.330427
#> 83 19 10 9 5.497507
#> 84 21 10 11 5.523747
#> 85 21 10 11 5.499743
#> 86 21 10 11 5.492955
#> 87 21 10 11 5.491242
#> 88 21 10 11 5.490842
#> 89 21 10 11 5.490753
#> 90 21 10 11 5.490733
#> 91 12 10 2 5.421613
#> 92 13 10 3 5.330404
#> 93 19 10 9 5.497519
#> 94 21 10 11 5.523759
#> 95 21 10 11 5.499747
#> 96 21 10 11 5.492953
#> 97 21 10 11 5.491235
#> 98 21 10 11 5.490833
#> 99 21 10 11 5.490742
#> 100 21 10 11 5.490722
BIC(fit2)
#> df surv_df cure_df BIC
#> 1 2 0 2 5.715783
#> 2 4 0 4 5.661199
#> 3 9 0 9 5.771416
#> 4 11 0 11 5.798918
#> 5 11 0 11 5.777662
#> 6 11 0 11 5.772071
#> 7 11 0 11 5.770727
#> 8 11 0 11 5.770422
#> 9 11 0 11 5.770354
#> 10 11 0 11 5.770340
#> 11 6 4 2 5.694309
#> 12 8 4 4 5.628845
#> 13 13 4 9 5.743627
#> 14 15 4 11 5.774010
#> 15 15 4 11 5.753124
#> 16 15 4 11 5.747462
#> 17 15 4 11 5.745987
#> 18 15 4 11 5.745635
#> 19 15 4 11 5.745531
#> 20 15 4 11 5.745498
#> 21 9 7 2 5.472693
#> 22 11 7 4 5.416726
#> 23 16 7 9 5.534702
#> 24 18 7 11 5.564000
#> 25 18 7 11 5.543325
#> 26 18 7 11 5.538027
#> 27 18 7 11 5.536881
#> 28 18 7 11 5.536680
#> 29 18 7 11 5.536617
#> 30 18 7 11 5.536665
#> 31 12 10 2 5.477080
#> 32 14 10 4 5.422979
#> 33 19 10 9 5.543236
#> 34 21 10 11 5.572717
#> 35 21 10 11 5.551966
#> 36 21 10 11 5.546665
#> 37 21 10 11 5.545486
#> 38 21 10 11 5.545276
#> 39 21 10 11 5.545258
#> 40 21 10 11 5.545238
#> 41 12 10 2 5.433876
#> 42 13 10 3 5.334487
#> 43 19 10 9 5.502040
#> 44 21 10 11 5.531734
#> 45 21 10 11 5.510914
#> 46 21 10 11 5.505532
#> 47 21 10 11 5.504304
#> 48 21 10 11 5.504073
#> 49 21 10 11 5.504035
#> 50 21 10 11 5.504049
#> 51 12 10 2 5.422565
#> 52 13 10 3 5.323654
#> 53 18 10 8 5.445854
#> 54 21 10 11 5.521628
#> 55 21 10 11 5.500742
#> 56 21 10 11 5.495288
#> 57 21 10 11 5.494008
#> 58 21 10 11 5.493739
#> 59 21 10 11 5.493694
#> 60 21 10 11 5.493680
#> 61 12 10 2 5.419739
#> 62 13 10 3 5.321113
#> 63 18 10 8 5.443588
#> 64 21 10 11 5.519445
#> 65 21 10 11 5.498493
#> 66 21 10 11 5.492986
#> 67 21 10 11 5.491674
#> 68 21 10 11 5.491387
#> 69 21 10 11 5.491330
#> 70 21 10 11 5.491319
#> 71 12 10 2 5.419024
#> 72 13 10 3 5.320548
#> 73 18 10 8 5.443157
#> 74 21 10 11 5.519045
#> 75 21 10 11 5.498061
#> 76 21 10 11 5.492528
#> 77 21 10 11 5.491198
#> 78 21 10 11 5.490899
#> 79 21 10 11 5.490836
#> 80 21 10 11 5.490824
#> 81 12 10 2 5.418807
#> 82 13 10 3 5.320423
#> 83 18 10 8 5.443100
#> 84 21 10 11 5.519004
#> 85 21 10 11 5.498001
#> 86 21 10 11 5.492454
#> 87 21 10 11 5.491117
#> 88 21 10 11 5.490813
#> 89 21 10 11 5.490746
#> 90 21 10 11 5.490732
#> 91 12 10 2 5.418734
#> 92 13 10 3 5.320400
#> 93 18 10 8 5.443106
#> 94 21 10 11 5.519016
#> 95 21 10 11 5.498005
#> 96 21 10 11 5.492452
#> 97 21 10 11 5.491110
#> 98 21 10 11 5.490804
#> 99 21 10 11 5.490736
#> 100 21 10 11 5.490721
BIC(fit1)[which.min(BIC(fit1)[, "BIC"]), ]
#> df surv_df cure_df BIC
#> 92 13 10 3 5.330404
BIC(fit2)[which.min(BIC(fit2)[, "BIC"]), ]
#> df surv_df cure_df BIC
#> 92 13 10 3 5.3204
## list of coefficient estimates based on BIC
coef(fit1)
#> $surv
#> x1 x2 x3 x4 x5 x6 x7
#> 0.3100716 0.0571612 -0.3093235 0.2041436 -0.2596717 1.3178665 1.1167513
#> x8 x9 x10
#> 1.0961864 1.1952490 0.8840685
#>
#> $cure
#> (Intercept) x1 x2 x3 x4 x5
#> 0.1648645 0.4712060 0.1959122 0.0000000 0.0000000 0.0000000
#> x6 x7 x8 x9 x10
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#>
coef(fit2)
#> $surv
#> x1 x2 x3 x4 x5 x6
#> 0.31004488 0.05715587 -0.30930823 0.20416898 -0.25970589 1.31807929
#> x7 x8 x9 x10
#> 1.11680333 1.09626165 1.19532136 0.88410609
#>
#> $cure
#> (Intercept) x1 x2 x3 x4 x5
#> 0.1651741 0.5174881 0.2151424 0.0000000 0.0000000 0.0000000
#> x6 x7 x8 x9 x10
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#>
### 2. regularized Cox cure model for uncertain event status
## simulate a toy data
set.seed(123)
n_obs <- 100
p <- 5
x_mat <- matrix(rnorm(n_obs * p), nrow = n_obs, ncol = p)
colnames(x_mat) <- paste0("x", seq_len(p))
surv_beta <- c(rep(0, p - 3), rep(1, 3))
cure_beta <- c(rep(1, 2), rep(0, p - 2))
dat <- simData4cure(nSubject = n_obs, lambda_censor = 0.01,
max_censor = 10, survMat = x_mat,
survCoef = surv_beta, cureCoef = cure_beta,
b0 = 0.5, p1 = 0.95, p2 = 0.95, p3 = 0.95)
## model-fitting from given design matrices
fit1 <- cox_cure_net.fit(
x_mat, x_mat,
dat$obs_time, dat$obs_event,
surv_net = list(nlambda = 5, alpha = 0.5)
)
## model-fitting from given model formula
fm <- paste(paste0("x", seq_len(p)), collapse = " + ")
surv_fm <- as.formula(sprintf("~ %s", fm))
cure_fm <- surv_fm
fit2 <- cox_cure_net(
surv_fm,
cure_fm,
data = dat,
time = obs_time,
event = obs_event,
surv_net = list(nlambda = 5, alpha = 0.9),
cure_net = list(nlambda = 5, alpha = 0.9)
)
## summary of BIC's
BIC(fit1)
#> df surv_df cure_df BIC
#> 1 2 1 1 7.509489
#> 2 4 1 3 7.416058
#> 3 6 1 5 7.401607
#> 4 7 1 6 7.391836
#> 5 7 1 6 7.376120
#> 6 7 1 6 7.372246
#> 7 7 1 6 7.371392
#> 8 7 1 6 7.371273
#> 9 7 1 6 7.371328
#> 10 7 1 6 7.371275
#> 11 5 3 2 7.090065
#> 12 6 3 3 6.941482
#> 13 8 3 5 6.935774
#> 14 9 3 6 6.926810
#> 15 9 3 6 6.910505
#> 16 9 3 6 6.906317
#> 17 9 3 6 6.905414
#> 18 9 3 6 6.905149
#> 19 9 3 6 6.905112
#> 20 9 3 6 6.905108
#> 21 7 5 2 7.077186
#> 22 8 5 3 6.925563
#> 23 10 5 5 6.920928
#> 24 11 5 6 6.912086
#> 25 11 5 6 6.895627
#> 26 11 5 6 6.891414
#> 27 11 5 6 6.890427
#> 28 11 5 6 6.890198
#> 29 11 5 6 6.890152
#> 30 11 5 6 6.890142
#> 31 7 5 2 7.070528
#> 32 8 5 3 6.918267
#> 33 10 5 5 6.913957
#> 34 11 5 6 6.905159
#> 35 11 5 6 6.888667
#> 36 11 5 6 6.884447
#> 37 11 5 6 6.883438
#> 38 11 5 6 6.883213
#> 39 11 5 6 6.883164
#> 40 11 5 6 6.883154
#> 41 7 5 2 7.070285
#> 42 8 5 3 6.917903
#> 43 10 5 5 6.913648
#> 44 11 5 6 6.904861
#> 45 11 5 6 6.888362
#> 46 11 5 6 6.884132
#> 47 11 5 6 6.883132
#> 48 11 5 6 6.882906
#> 49 11 5 6 6.882857
#> 50 11 5 6 6.882846
BIC(fit2)
#> df surv_df cure_df BIC
#> 1 2 1 1 7.481807
#> 2 7 1 6 7.406329
#> 3 7 1 6 7.344363
#> 4 7 1 6 7.341268
#> 5 7 1 6 7.349406
#> 6 4 3 1 7.002548
#> 7 9 3 6 6.910589
#> 8 9 3 6 6.851481
#> 9 9 3 6 6.848537
#> 10 9 3 6 6.848457
#> 11 7 5 2 7.073349
#> 12 11 5 6 6.948470
#> 13 11 5 6 6.888760
#> 14 11 5 6 6.885751
#> 15 11 5 6 6.885647
#> 16 7 5 2 7.070793
#> 17 11 5 6 6.945811
#> 18 11 5 6 6.886073
#> 19 11 5 6 6.883056
#> 20 11 5 6 6.882950
#> 21 7 5 2 7.070708
#> 22 11 5 6 6.945703
#> 23 11 5 6 6.885960
#> 24 11 5 6 6.882942
#> 25 11 5 6 6.882836
## list of coefficient estimates based on BIC
coef(fit1)
#> $surv
#> x1 x2 x3 x4 x5
#> -0.1003773 -0.1845998 0.9995301 1.2499012 1.2049467
#>
#> $cure
#> (Intercept) x1 x2 x3 x4 x5
#> 0.5897274 1.5713065 1.0647623 0.3845461 0.4634033 0.5414894
#>
coef(fit2)
#> $surv
#> x1 x2 x3 x4 x5
#> 0.0000000 0.0000000 0.6668049 0.8529310 0.9193001
#>
#> $cure
#> (Intercept) x1 x2 x3 x4 x5
#> 0.5785617 1.5790814 1.0498122 0.3882039 0.4665378 0.5478220
#>