Multi-category angle-based large-margin classifiers with regularization by the elastic-net or groupwise penalty.
Usage
abclass(
    x,
    y,
    intercept = TRUE,
    weight = NULL,
    loss = c("logistic", "boost", "hinge-boost", "lum"),
    control = list(),
    ...
)

abclass.control(
    lambda = NULL,
    alpha = 1,
    nlambda = 50L,
    lambda_min_ratio = NULL,
    grouped = TRUE,
    group_weight = NULL,
    group_penalty = c("lasso", "scad", "mcp"),
    dgamma = 1,
    lum_a = 1,
    lum_c = 1,
    boost_umin = -5,
    maxit = 100000L,
    epsilon = 1e-04,
    standardize = TRUE,
    varying_active_set = TRUE,
    verbose = 0L,
    ...
)
Arguments
- `x`: A numeric matrix representing the design matrix. No missing values are allowed. The coefficient estimates for constant columns will be zero; thus, one should set the argument `intercept` to `TRUE` to include an intercept term instead of adding an all-one column to `x`.
- `y`: An integer vector, a character vector, or a factor representing the response labels.
- `intercept`: A logical value indicating if an intercept should be included in the model. The default value is `TRUE`, and the intercept is excluded from regularization.
- `weight`: A numeric vector of nonnegative observation weights. Equal observation weights are used by default.
- `loss`: A character value specifying the loss function. The available options are `"logistic"` for the logistic deviance loss, `"boost"` for the exponential loss approximating boosting machines, `"hinge-boost"` for a hybrid of the SVM and AdaBoost machines, and `"lum"` for the large-margin unified machines (LUM). See Liu, et al. (2011) for details.
- `control`: A list of control parameters. See `abclass.control()` for details.
- `...`: Other control parameters passed to `abclass.control()`.
- `lambda`: A numeric vector specifying the tuning parameter lambda. If this argument is left as `NULL` (the default), a data-driven lambda sequence will be generated and used according to the specified `alpha`, `nlambda`, and `lambda_min_ratio`. A specified `lambda` will be sorted in decreasing order internally, and only the unique values will be kept.
- `alpha`: A numeric value in [0, 1] representing the mixing parameter alpha. The default value is `1.0`.
- `nlambda`: A positive integer specifying the length of the internally generated lambda sequence. This argument will be ignored if a valid `lambda` is specified. The default value is `50`.
- `lambda_min_ratio`: A positive number specifying the ratio of the smallest lambda to the largest lambda. The default value is `1e-4` if the sample size is larger than the number of predictors, and `1e-2` otherwise.
- `grouped`: A logical value. Experimental flag to apply group penalties.
- `group_weight`: A numeric vector of nonnegative values representing the adaptive penalty factors for the specified group penalty.
- `group_penalty`: A character vector specifying the name of the group penalty.
- `dgamma`: A positive number specifying the increment to the minimal gamma parameter for the group SCAD or group MCP penalty.
- `lum_a`: A positive number greater than one representing the parameter a in LUM, which will be used only if `loss = "lum"`. The default value is `1.0`.
- `lum_c`: A nonnegative number specifying the parameter c in LUM, which will be used only if `loss = "hinge-boost"` or `loss = "lum"`. The default value is `1.0`.
- `boost_umin`: A negative number for adjusting the boosting loss in the internal majorization procedure.
- `maxit`: A positive integer specifying the maximum number of iterations. The default value is `10^5`.
- `epsilon`: A positive number specifying the relative tolerance that determines convergence. The default value is `1e-4`.
- `standardize`: A logical value indicating if each column of the design matrix should be standardized internally to have mean zero and unit standard deviation. The default value is `TRUE`. Notice that the coefficient estimates are always returned on the original scale.
- `varying_active_set`: A logical value indicating if the active set should be updated after each cycle of the coordinate-majorization-descent algorithm. The default value is `TRUE`, which usually gives a more efficient estimation procedure.
- `verbose`: A nonnegative integer specifying if the estimation procedure should print out intermediate steps/results. The default value is `0` for a silent estimation procedure.
Value
The function `abclass()` returns an object of class `abclass` representing a trained classifier; the function `abclass.control()` returns an object of class `abclass.control` representing a list of control parameters.
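Based on the description above, the returned control object can be inspected like any other list; a minimal sketch (assuming the `abclass` package is installed):

```r
library(abclass)

## abclass.control() returns a classed list of control parameters
ctrl <- abclass.control(nlambda = 10)
class(ctrl)    # expected to be "abclass.control"
is.list(ctrl)  # expected to be TRUE
ctrl$nlambda   # the value passed in, i.e., 10
```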
References
Zhang, C., & Liu, Y. (2014). Multicategory Angle-Based Large-Margin Classification. Biometrika, 101(3), 625--640.
Liu, Y., Zhang, H. H., & Wu, Y. (2011). Hard or Soft Classification? Large-Margin Unified Machines. Journal of the American Statistical Association, 106(493), 166--177.
Examples
library(abclass)
set.seed(123)
## toy examples for demonstration purpose
## reference: example 1 in Zhang and Liu (2014)
ntrain <- 100 # size of training set
ntest <- 100 # size of testing set
p0 <- 5 # number of actual predictors
p1 <- 5 # number of random predictors
k <- 5 # number of categories
n <- ntrain + ntest; p <- p0 + p1
train_idx <- seq_len(ntrain)
y <- sample(k, size = n, replace = TRUE) # response
mu <- matrix(rnorm(p0 * k), nrow = k, ncol = p0) # mean vector
## normalize the mean vectors so that they lie on the unit sphere
mu <- mu / apply(mu, 1, function(a) sqrt(sum(a ^ 2)))
x0 <- t(sapply(y, function(i) rnorm(p0, mean = mu[i, ], sd = 0.25)))
x1 <- matrix(rnorm(p1 * n, sd = 0.3), nrow = n, ncol = p1)
x <- cbind(x0, x1)
train_x <- x[train_idx, ]
test_x <- x[- train_idx, ]
y <- factor(paste0("label_", y))
train_y <- y[train_idx]
test_y <- y[- train_idx]
## Regularization through the elastic-net penalty (lasso, since alpha = 1)
control1 <- abclass.control(nlambda = 5, lambda_min_ratio = 1e-3,
                            alpha = 1, grouped = FALSE)
model1 <- abclass(train_x, train_y, loss = "logistic",
                  control = control1)
pred1 <- predict(model1, test_x, s = 5)
table(test_y, pred1)
#> pred1
#> test_y label_1 label_2 label_3 label_4 label_5
#> label_1 22 0 3 0 0
#> label_2 0 15 0 4 1
#> label_3 0 0 12 0 2
#> label_4 1 0 0 16 0
#> label_5 0 1 1 0 22
mean(test_y == pred1) # accuracy
#> [1] 0.87
## groupwise regularization via group lasso
model2 <- abclass(train_x, train_y, loss = "boost",
                  grouped = TRUE, nlambda = 5)
pred2 <- predict(model2, test_x, s = 5)
table(test_y, pred2)
#> pred2
#> test_y label_1 label_2 label_3 label_4 label_5
#> label_1 24 0 1 0 0
#> label_2 0 19 0 1 0
#> label_3 0 0 13 0 1
#> label_4 1 1 0 15 0
#> label_5 0 1 1 0 22
mean(test_y == pred2) # accuracy
#> [1] 0.93
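The same data can also be fit with the other documented loss functions and group penalties. A sketch using the LUM loss with a group MCP penalty follows (it reuses `train_x`, `train_y`, and `test_x` from above; the chosen `nlambda` and solution index `s` are arbitrary, and no accuracy is shown since results depend on the simulated data):

```r
## groupwise regularization via group MCP with the LUM loss
control3 <- abclass.control(nlambda = 5, grouped = TRUE,
                            group_penalty = "mcp")
model3 <- abclass(train_x, train_y, loss = "lum",
                  control = control3)
pred3 <- predict(model3, test_x, s = 5)
mean(test_y == pred3) # accuracy
```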