Result Compression
Functions for compressing result directories.
The functions in this module are used to compress result files for datasets that are completely finished. This is useful when raw results need to be kept on disk but storage space is scarce. The compression algorithm can be set by the user through the COMPRESSION setting in the configuration file. All compressed files are tar files, generated with the highest available compression level.
abed.compress.compress_dataset(dset)
Compress results of a given dataset.
This function compresses the results for a given dataset into a compressed tar file. For the gzip and bzip2 algorithms this is done through the tarfile module. For lzma, the tarfile module can only be used when ABED is run under Python 3, because lzma compression is not available in the Python 2 tarfile module. When running under Python 2 on a POSIX platform, however, ABED assumes that the external tar command supports lzma compression, and in that case the result directory is compressed with a call to tar. In all cases the highest compression level is used, to save as much disk space as possible.
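A minimal sketch of the tarfile path, assuming Python 3. The helper name `compress_directory`, the algorithm keys `"gzip"`, `"bzip2"` and `"lzma"`, and the output extensions are hypothetical illustrations, not ABED's actual code. Note that `tarfile.open()` takes `compresslevel` for gzip and bzip2 but `preset` for xz (lzma), so "highest compression" is expressed differently per algorithm.

```python
import os
import tarfile

# Hypothetical mapping from a COMPRESSION value to a tarfile write mode, the
# keyword argument selecting the strongest compression, and a file extension.
# These keys are illustrative assumptions, not ABED's own configuration values.
_FORMATS = {
    "gzip": ("w:gz", {"compresslevel": 9}, ".tar.gz"),
    "bzip2": ("w:bz2", {"compresslevel": 9}, ".tar.bz2"),
    "lzma": ("w:xz", {"preset": 9}, ".tar.xz"),
}


def compress_directory(src_dir, algorithm):
    """Write src_dir into a compressed tar archive next to it (hypothetical helper)."""
    if algorithm not in _FORMATS:
        raise SystemExit("Unknown compression algorithm: %s" % algorithm)
    mode, level_kwargs, extension = _FORMATS[algorithm]
    src_dir = src_dir.rstrip(os.sep)
    target = src_dir + extension
    # tarfile accepts compresslevel for gz/bz2 and preset for xz (Python 3).
    with tarfile.open(target, mode, **level_kwargs) as archive:
        # Store entries under the directory's own name, not its full path.
        archive.add(src_dir, arcname=os.path.basename(src_dir))
    return target
```

For example, `compress_directory("results/iris", "bzip2")` would write `results/iris.tar.bz2` and leave the original directory untouched, matching the note that removing it is left to the user.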
- Parameters
dset (str/tuple) – The name of the dataset. Depending on the type of experiments performed, this is either a string or a tuple of strings.
Notes
ABED does not remove the original result directory after compressing it, so this must be done by the user.
- Raises
SystemExit – The program exits when an unknown compression algorithm is specified, when lzma compression is requested on an unsupported platform, or when an error occurs in the external tar command.
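The backend choice and the exit conditions described above can be sketched as follows. The function names `choose_backend` and `compress_with_tar` are hypothetical. The external call assumes a tar that supports `-J` (xz), as GNU tar and bsdtar do, and uses the `XZ_OPT` environment variable to request xz's highest preset. This is an illustration under those assumptions, not ABED's own implementation.

```python
import os
import subprocess
import sys


def choose_backend(algorithm, py_major=None, os_name=None):
    """Return "tarfile" or "tar" for the given algorithm (hypothetical helper)."""
    py_major = sys.version_info[0] if py_major is None else py_major
    os_name = os.name if os_name is None else os_name
    if algorithm in ("gzip", "bzip2"):
        return "tarfile"
    if algorithm == "lzma":
        if py_major >= 3:
            return "tarfile"
        if os_name == "posix":
            # Assumption: the system tar can produce xz/lzma archives.
            return "tar"
        raise SystemExit("lzma compression is not supported on this platform")
    raise SystemExit("Unknown compression algorithm: %s" % algorithm)


def compress_with_tar(src_dir, target):
    """Call the external tar with xz at its highest preset (hypothetical)."""
    env = dict(os.environ, XZ_OPT="-9")  # xz reads extra options from XZ_OPT
    cmd = ["tar", "-cJf", target, src_dir]
    if subprocess.call(cmd, env=env) != 0:
        raise SystemExit("Error calling tar: %s" % " ".join(cmd))
```

Keeping the decision in a pure function makes the three `SystemExit` cases easy to check without actually invoking tar.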
abed.compress.compress_results(task_dict)
Compress results for all datasets that are complete.
This function iterates over all datasets defined in the settings file and collects the list of files in the result directory that belong to each dataset. Then, for each dataset that is complete, compress_dataset() is called to do the actual compression.
- Parameters
task_dict (dict) – The dict with the mappings from hashes to command dicts, as returned by tasks.init_tasks().
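The loop in compress_results() can be outlined as below. To avoid guessing ABED's internals, the sketch takes the dataset list, the result-directory listing, the file-to-dataset rule, the completeness check and the compressor as parameters. The name `compress_results_sketch` and all of its parameters are hypothetical.

```python
def compress_results_sketch(datasets, result_files, belongs_to, is_complete, compress):
    """Compress every complete dataset (hypothetical outline of compress_results).

    datasets     -- dataset names (str or tuple), as listed in the settings file
    result_files -- file names found in the result directory
    belongs_to   -- callable(filename, dset) -> bool, maps a file to its dataset
    is_complete  -- callable(dsetfiles, dset) -> bool, cf. dataset_completed()
    compress     -- callable(dset), cf. compress_dataset()
    """
    compressed = []
    for dset in datasets:
        # Collect the files that belong to this dataset.
        dsetfiles = [f for f in result_files if belongs_to(f, dset)]
        # Only finished datasets are compressed; others are skipped.
        if is_complete(dsetfiles, dset):
            compress(dset)
            compressed.append(dset)
    return compressed
```

Passing the helpers in as callables keeps the control flow visible while leaving the file-naming rule, which this page does not specify, as an explicit assumption.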
abed.compress.dataset_completed(dsetfiles, dset, task_dict)
Check if a given dataset is complete.
This function checks whether all results for a given dataset are available on disk. It does so by checking whether all hashes for the specified dataset are available.
- Parameters
dsetfiles (list) – Filenames of files relating to this dataset.
dset (str/tuple) – The name of the dataset. Depending on the type of experiments performed, this is either a string or a tuple of strings.
task_dict (dict) – The dict with the mappings from hashes to command dicts, as returned by tasks.init_tasks().
- Returns
Whether or not all results for the given dataset are available.
- Return type
bool
- Raises
SystemExit – When trying to compress results for an experiment type which does not support result compression (for instance when using RAW mode), an error is printed and the program exits.
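A minimal sketch of the hash check, under loudly stated assumptions. It assumes that each command dict in task_dict has a hypothetical `"dataset"` key, and that result files are named `<hash>.<ext>`. The `experiment_type` parameter is likewise hypothetical and only illustrates the RAW-mode exit. ABED's real file layout and dict keys may differ.

```python
import os


def dataset_completed_sketch(dsetfiles, dset, task_dict, experiment_type=None):
    """Return True if a result file exists for every hash of dset (hypothetical).

    Assumptions for illustration only: command dicts carry a "dataset" key and
    result files are named "<hash>.<ext>".
    """
    if experiment_type == "RAW":
        raise SystemExit("Result compression is not supported in RAW mode")
    # Hashes of all tasks that belong to this dataset (str or tuple compares fine).
    expected = {h for h, cmd in task_dict.items() if cmd.get("dataset") == dset}
    # Hashes recovered from the result file names on disk.
    present = {os.path.splitext(os.path.basename(f))[0] for f in dsetfiles}
    # Complete when every expected hash has a result file.
    return expected <= present
```

The subset test returns `True` for a dataset with no known hashes. A real implementation may want to treat that case separately.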