GenSVM
Grid Input File Specification

This page specifies the format of the grid search training file that is parsed by read_training_from_file(). An example training file is shown below.

train: /path/to/training/dataset.txt
test: /path/to/test/dataset.txt
p: 1.0 1.5 2.0
kappa: -0.9 0.0 1.0
lambda: 64 16 4 1 0.25 0.0625 0.015625 0.00390625 0.0009765625 0.000244140625
epsilon: 1e-6
weight: 1 2
folds: 10
kernel: LINEAR
gamma: 1e-3 1e-1 1e1 1e3
coef: 1.0 2.0
degree: 2.0 3.0

Note that with a LINEAR kernel specification, the gamma, coef, and degree parameters do not need to be specified. The example above merely shows all parameters that can be specified in the grid search. Each parameter is described in more detail below. Arguments followed by an asterisk are optional.
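The layout of the example file can be illustrated with a short Python sketch. This is not the GenSVM parser (read_training_from_file() is implemented in the library itself); the function name parse_grid_file is hypothetical, and the sketch only assumes that each non-empty line has the form "key: value value ...", as in the example above.

```python
def parse_grid_file(text):
    """Parse 'key: v1 v2 ...' lines into a dict of key -> list of string tokens.

    Hypothetical helper for illustration only; not part of GenSVM.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        key, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: values', got {line!r}")
        params[key.strip()] = rest.split()
    return params


example = """\
train: /path/to/training/dataset.txt
p: 1.0 1.5 2.0
kappa: -0.9 0.0 1.0
folds: 10
kernel: LINEAR
"""

grid = parse_grid_file(example)
print(grid["p"])
print(grid["kernel"])
```

The values are kept as strings here; converting them to numbers is left to the caller, since paths and kernel names share the same line format.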

train:
The location of the training dataset file. See Default Data File Specification for the specification of a dataset file.

test:*
The location of a test dataset file. See Default Data File Specification for the specification of a dataset file. This parameter is optional. If it is specified, the train/test split will be used for training.

p:
The values of the p parameter of the algorithm to search over. The p parameter is used in the $ \ell_p $ norm over the Huber-weighted scalar misclassification errors. Note: $ 1 \leq p \leq 2 $.

kappa:
The values of the kappa parameter of the algorithm to search over. The kappa parameter is used in the Huber hinge error over the scalar misclassification errors. Note: $ \kappa > -1 $.
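
For reference, and assuming the definition given in the GenSVM paper rather than anything stated on this page, the Huber hinge error applied to a scalar misclassification error $ q $ can be written as

$$ h(q) = \begin{cases} 1 - q - \frac{\kappa + 1}{2} & \text{if } q \leq -\kappa, \\ \frac{1}{2(\kappa + 1)} (1 - q)^2 & \text{if } -\kappa < q \leq 1, \\ 0 & \text{if } q > 1. \end{cases} $$

The factor $ 1 / (2(\kappa + 1)) $ is undefined at $ \kappa = -1 $, which is consistent with the constraint $ \kappa > -1 $.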

lambda:
The values of the lambda parameter of the algorithm to search over. The lambda parameter is used in the regularization term of the loss function. Note: $ \lambda > 0 $.
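
The constraints noted for p, kappa, and lambda can be checked before a grid search is started. The following Python sketch is one way to do this; check_ranges is a hypothetical helper and is not part of GenSVM, which may perform its own validation.

```python
def check_ranges(p_values, kappa_values, lambda_values):
    """Return a list of problems; an empty list means all values are in range.

    Hypothetical helper for illustration only; not part of GenSVM.
    """
    problems = []
    for p in p_values:
        if not 1.0 <= p <= 2.0:  # 1 <= p <= 2
            problems.append(f"p={p} is outside [1, 2]")
    for kappa in kappa_values:
        if not kappa > -1.0:  # kappa > -1
            problems.append(f"kappa={kappa} is not greater than -1")
    for lam in lambda_values:
        if not lam > 0.0:  # lambda > 0
            problems.append(f"lambda={lam} is not positive")
    return problems


# Values taken from the example file at the top of this page.
ok = check_ranges([1.0, 1.5, 2.0], [-0.9, 0.0, 1.0], [64, 16, 4, 1, 0.25])
# Deliberately invalid values, one per parameter.
bad = check_ranges([2.5], [-1.0], [0.0])
print(ok)
print(bad)
```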

epsilon:
The values of the epsilon parameter of the algorithm to search over. The epsilon parameter is used as the stopping parameter in the majorization algorithm. Note that it often suffices to use only one epsilon value. Using more than one value increases the size of the grid search considerably.
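
The effect of adding epsilon values on the grid size can be made concrete with a small Python sketch. It assumes that the grid is the full Cartesian product of the listed values, so that the number of configurations is the product of the number of values per parameter; the helper grid_size is hypothetical and not part of GenSVM. The values are those of the example file with the LINEAR kernel, where the lambda values happen to be the powers of 4 from $ 4^{3} $ down to $ 4^{-6} $.

```python
from math import prod


def grid_size(value_lists):
    """Number of configurations in a full Cartesian grid (assumed layout)."""
    return prod(len(values) for values in value_lists.values())


linear_grid = {
    "p": [1.0, 1.5, 2.0],
    "kappa": [-0.9, 0.0, 1.0],
    "lambda": [4.0 ** k for k in range(3, -7, -1)],  # 64 down to 0.000244140625
    "epsilon": [1e-6],
    "weight": [1, 2],
}

# Same grid with a second, hypothetical epsilon value added.
two_epsilons = dict(linear_grid, epsilon=[1e-6, 1e-8])

print(grid_size(linear_grid))   # 3 * 3 * 10 * 1 * 2 = 180
print(grid_size(two_epsilons))  # 3 * 3 * 10 * 2 * 2 = 360
```

Under this assumption, each additional epsilon value adds another full copy of the remaining grid, and every configuration is also evaluated once per cross validation fold.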

weight:
The weight specifications for the algorithm to use. Two weight specifications are implemented: the unit weights (index = 1) and the group size correction weights (index = 2). See also gensvm_initialize_weights().

folds:
The number of cross validation folds to use.

kernel:*
Kernel to use in training. Only one kernel can be specified. See KernelType for available kernel functions. Note: if multiple kernel types are specified on this line, only the last value will be used (see the implementation of parse_kernel_str() for details). If no kernel is specified, the LINEAR kernel will be used.

gamma:*
Gamma parameters for the RBF, POLY, and SIGMOID kernels. This parameter is only optional if the LINEAR kernel is specified. See gensvm_kernel_dot_rbf(), gensvm_kernel_dot_poly(), and gensvm_kernel_dot_sigmoid() for kernel specifications.

coef:*
Coefficients for the POLY and SIGMOID kernels. This parameter is optional only if the LINEAR or RBF kernel is used. See gensvm_kernel_dot_poly() and gensvm_kernel_dot_sigmoid() for kernel specifications.

degree:*
Degrees to search over in the grid search when the POLY kernel is specified. With other kernel specifications this parameter is unnecessary. See gensvm_kernel_dot_poly() for the polynomial kernel specification.
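
The dependence of the gamma, coef, and degree parameters on the kernel type can be summarized in a lookup table. The Python sketch below encodes the rules described above; the names REQUIRED_KERNEL_PARAMS and missing_kernel_params are hypothetical and not part of GenSVM.

```python
# Kernel parameters that must be present for each kernel type,
# following the descriptions of gamma, coef, and degree above.
REQUIRED_KERNEL_PARAMS = {
    "LINEAR": (),
    "RBF": ("gamma",),
    "POLY": ("gamma", "coef", "degree"),
    "SIGMOID": ("gamma", "coef"),
}


def missing_kernel_params(kernel, specified):
    """Return the required kernel parameters that are absent from `specified`.

    Hypothetical helper for illustration only; not part of GenSVM.
    """
    return [name for name in REQUIRED_KERNEL_PARAMS[kernel] if name not in specified]


print(missing_kernel_params("LINEAR", set()))
print(missing_kernel_params("POLY", {"gamma"}))
```

An empty result means that the file specifies every kernel parameter that the chosen kernel needs.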