Result Compression

Functions for compressing result directories.

The functions in this module compress the result files of datasets that are completely finished. This is useful when raw results need to be kept on disk but storage space is scarce. The user chooses the compression algorithm in the configuration file (COMPRESSION setting). All compressed files are tar archives, created with the highest available compression level.

abed.compress.compress_dataset(dset)

Compress results of a given dataset

This function compresses the results for a given dataset into a compressed tar file. For the gzip and bzip2 algorithms this is done with Python's tarfile module. For lzma, the tarfile module can only be used when ABED runs under Python 3, because the Python 2 tarfile module does not support lzma compression. When running under Python 2 on a POSIX platform, ABED instead assumes that the system tar command supports lzma compression, and compresses the result directory by calling tar.
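For the Python 2 fallback, the external tar call could look roughly like the sketch below. It is not ABED's code: the helper names build_tar_xz_command and compress_with_tar are hypothetical, and it assumes a GNU-style tar that supports -J (xz/lzma). The XZ_OPT environment variable is read by xz and is used here to select the highest preset.

```python
import os
import subprocess


def build_tar_xz_command(result_dir, archive_path):
    """Return (argv, env) for compressing result_dir with the system tar.

    Hypothetical helper; assumes tar supports -J (xz compression).
    """
    parent = os.path.dirname(os.path.abspath(result_dir))
    name = os.path.basename(os.path.normpath(result_dir))
    # -C makes the archived paths relative to the parent directory,
    # so the archive contains "<name>/..." rather than absolute paths.
    argv = ["tar", "-cJf", os.path.abspath(archive_path), "-C", parent, name]
    # xz reads extra options from XZ_OPT; -9 selects the highest preset.
    env = dict(os.environ, XZ_OPT="-9")
    return argv, env


def compress_with_tar(result_dir, archive_path):
    argv, env = build_tar_xz_command(result_dir, archive_path)
    # check_call raises CalledProcessError if tar exits with non-zero status.
    subprocess.check_call(argv, env=env)
```

In this sketch a failing tar call raises CalledProcessError; to mirror the documented behaviour, a caller would catch it and exit with SystemExit.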

In all cases, the highest compression level is used, to save as much disk space as possible.
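A minimal sketch of the tarfile-based path, assuming Python 3 and assuming the COMPRESSION setting takes the values "gzip", "bzip2" and "lzma". The helper compress_directory and the archive naming scheme are hypothetical. The tarfile keyword arguments compresslevel (gzip, bzip2) and preset (xz) select the maximum compression level.

```python
import os
import tarfile

# Hypothetical mapping from a COMPRESSION value to a tarfile write mode,
# archive extension and keyword arguments for the highest compression level.
_MODES = {
    "gzip": ("w:gz", ".tar.gz", {"compresslevel": 9}),
    "bzip2": ("w:bz2", ".tar.bz2", {"compresslevel": 9}),
    "lzma": ("w:xz", ".tar.xz", {"preset": 9}),
}


def compress_directory(result_dir, compression):
    """Write result_dir to a sibling tar archive and return its path."""
    try:
        mode, ext, kwargs = _MODES[compression]
    except KeyError:
        # Mirrors the documented behaviour for unknown algorithms.
        raise SystemExit("Unknown compression algorithm: %s" % compression)
    result_dir = os.path.normpath(result_dir)
    archive_path = result_dir + ext
    with tarfile.open(archive_path, mode, **kwargs) as tar:
        # arcname stores paths relative to the directory name itself.
        tar.add(result_dir, arcname=os.path.basename(result_dir))
    return archive_path
```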

Parameters

dset (str/tuple) – The name of the dataset. Depending on the type of experiments that were run, this is either a string or a tuple of strings.

Notes

ABED does not remove the original result directory after compression; the user must do this manually.
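Because removal is left to the user, it can be worth checking the archive before deleting anything. The sketch below is a hypothetical helper, not part of ABED: it only compares member names (not file contents) and assumes the archive stores paths relative to the parent of the result directory, as in "dset1/abc123.txt".

```python
import os
import shutil
import tarfile


def remove_if_archived(result_dir, archive_path):
    """Delete result_dir only if archive_path lists every file inside it.

    Hypothetical helper; compares member names only, not file contents.
    """
    result_dir = os.path.abspath(result_dir)
    parent = os.path.dirname(result_dir)
    expected = set()
    for root, _dirs, files in os.walk(result_dir):
        for fname in files:
            rel = os.path.relpath(os.path.join(root, fname), parent)
            # tar member names always use forward slashes.
            expected.add(rel.replace(os.sep, "/"))
    # Reading the member list raises an error if the archive is unreadable.
    with tarfile.open(archive_path) as tar:
        archived = set(tar.getnames())
    if not expected.issubset(archived):
        return False
    shutil.rmtree(result_dir)
    return True
```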

Raises

SystemExit – The program exits when an unknown compression algorithm is specified, when lzma compression is requested on an unsupported platform, or when the external tar command fails.

abed.compress.compress_results(task_dict)

Compress results for all datasets which are complete

This function iterates over all datasets defined in the settings file and collects the files in the result directory that belong to each dataset. Then, for each dataset that is complete, it calls compress_dataset(), which performs the actual compression.
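The loop described above can be sketched as follows. This is an illustration under stated assumptions, not ABED's implementation: it assumes (hypothetically) that result files are named "<hash>.<ext>" and that each command dict in task_dict stores its dataset under the key "dataset". The completeness check and the compression step are passed in as callables so the sketch stays self-contained.

```python
import os
from collections import defaultdict


def group_files_by_dataset(filenames, task_dict):
    """Group result filenames by dataset (hypothetical file layout).

    Assumes files are named '<hash>.<ext>' and that command dicts store
    their dataset under the hypothetical key 'dataset'.
    """
    groups = defaultdict(list)
    for fname in filenames:
        task_hash = os.path.splitext(os.path.basename(fname))[0]
        if task_hash in task_dict:
            groups[task_dict[task_hash]["dataset"]].append(fname)
    return groups


def compress_complete(datasets, filenames, task_dict, is_complete, compress):
    """Call compress(dset) for every dataset that is_complete() accepts."""
    groups = group_files_by_dataset(filenames, task_dict)
    done = []
    for dset in datasets:
        if is_complete(groups.get(dset, []), dset, task_dict):
            compress(dset)
            done.append(dset)
    return done
```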

Parameters

task_dict (dict) – The dict with the mappings from hashes to command dicts, as returned by tasks.init_tasks().

abed.compress.dataset_completed(dsetfiles, dset, task_dict)

Check if a given dataset is complete

This function checks whether all results for a given dataset are available on disk, by verifying that a result is present for every task hash belonging to that dataset.
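A minimal sketch of such a check, under the same hypothetical assumptions as above: result files are named "<hash>.<ext>" and command dicts store their dataset under a "dataset" key. Neither detail is confirmed by this documentation. The dataset may be a string or a tuple, since both compare by value.

```python
import os


def dataset_completed(dsetfiles, dset, task_dict):
    """Hypothetical sketch: True if every task hash of dset has a result file.

    Assumes result files are named '<hash>.<ext>' and that command dicts
    store their dataset under the key 'dataset'.
    """
    # All task hashes that belong to this dataset.
    expected = {h for h, cmd in task_dict.items() if cmd["dataset"] == dset}
    # Hashes recovered from the names of the result files on disk.
    present = {os.path.splitext(os.path.basename(f))[0] for f in dsetfiles}
    return expected.issubset(present)
```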

Parameters
  • dsetfiles (list) – Filenames of the result files belonging to this dataset.

  • dset (str/tuple) – The name of the dataset. Depending on the type of experiments that were run, this is either a string or a tuple of strings.

  • task_dict (dict) – The dict with the mappings from hashes to command dicts, as returned by tasks.init_tasks().

Returns

Whether or not all results for the given dataset are available.

Return type

bool

Raises

SystemExit – When result compression is attempted for an experiment type that does not support it (for instance in RAW mode), an error is printed and the program exits.