GenSVM
LibSVM/SVMlight Data File Specification

This page briefly describes the input file format for datasets stored in LibSVM/SVMlight format, based on the LibSVM documentation. Files in this format can be read with the function gensvm_read_data_libsvm() and can be passed to the executables with the -x flag.

The LibSVM/SVMlight file format is sparse: only the nonzero values are expected to be stored, so each value is accompanied by its index. In GenSVM, this index can be either 0-based or 1-based. The basic file format is as follows:

y_1 index1:value1 index2:value2 ...
.
.
.
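To make the sparse layout above concrete, the following is a minimal Python sketch that serializes one dense row into a single line of this format. The function name format_sparse_row, its parameters, and the choice of five decimal places are illustrative assumptions and are not part of GenSVM.

```python
def format_sparse_row(values, label=None, index_base=1):
    """Serialize one dense row as a LibSVM/SVMlight line, skipping zeros.

    Hypothetical helper for illustration only; not part of GenSVM.
    """
    if index_base not in (0, 1):
        raise ValueError("index_base must be 0 or 1")
    # The label, when given, goes in the first column.
    fields = [] if label is None else [str(label)]
    for position, value in enumerate(values):
        # Only nonzero values are stored, each paired with its index.
        if value != 0:
            fields.append(f"{position + index_base}:{value:.5f}")
    return " ".join(fields)


# The zero in the second column is dropped from the output line.
print(format_sparse_row([5.1, 0.0, 1.4, 0.2], label=1))
# prints: 1 1:5.10000 3:1.40000 4:0.20000
```

Passing index_base=0 produces 0-based indices instead; the index base used when writing a file has to match the one assumed when it is read back.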

For a training dataset, the class label y_i is expected in the first column of each line. For a test dataset the class labels can be left out of the file, in which case the file contains only index:value pairs.
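The optional label can be handled when parsing by checking whether the first token of a line contains a colon. The sketch below is a simplified Python illustration of that idea, not the parser used by gensvm_read_data_libsvm(). The function name parse_sparse_line is hypothetical, and treating labels as integers is an assumption made for the example.

```python
def parse_sparse_line(line):
    """Split one line into (label, {index: value}).

    Hypothetical helper for illustration only. The label is None when
    the line consists solely of index:value pairs.
    """
    tokens = line.split()
    label = None
    # Assumption: a leading token without a colon is the class label,
    # and class labels are integers.
    if tokens and ":" not in tokens[0]:
        label = int(tokens.pop(0))
    features = {}
    for token in tokens:
        index, value = token.split(":", 1)
        # Indices are kept exactly as written (0-based or 1-based).
        features[int(index)] = float(value)
    return label, features


# A training-style line (with label) and a test-style line (without).
print(parse_sparse_line("1 1:5.10000 2:3.50000 3:1.40000 4:0.20000"))
print(parse_sparse_line("1:4.90000 2:3.00000"))
```

The indices are returned as they appear in the file, so the caller still has to decide whether they are 0-based or 1-based before mapping them to columns.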

As an example, the first lines of the iris dataset are shown below.

1 1:5.10000 2:3.50000 3:1.40000 4:0.20000
1 1:4.90000 2:3.00000 3:1.40000 4:0.20000
1 1:4.70000 2:3.20000 3:1.30000 4:0.20000