paths

Functions

Name Description
create_tar_archive Creates a compressed tar archive (.tar.gz) containing the specified files.
get_tree Generate a directory tree structure using the tree command-line utility.
glob_files Collects files from a specified directory and its subdirectories that match a list of patterns.
load_tree Load a tree structure from a string, attempting to parse it as JSON or Python literal.
run_du Calculates the total size of a directory in bytes using the du shell command.

create_tar_archive

paths.create_tar_archive(files, tar_filename)

Creates a compressed tar archive (.tar.gz) containing the specified files.

Parameters

Name Type Description Default
files list of pathlib.Path A list of file paths to include in the archive. required
tar_filename str The name of the output tar.gz file. required
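The following is a minimal sketch of how create_tar_archive could be built on the standard-library tarfile module. It is an illustration, not the module's actual implementation. One assumption to note: each file is stored under its path as given. tarfile removes any leading "/" from that path. Because of this, logs with the same name in different subdirectories do not overwrite each other.

```python
import tarfile
from pathlib import Path


def create_tar_archive(files, tar_filename):
    # Mode "w:gz" writes a gzip-compressed tar archive (.tar.gz).
    with tarfile.open(tar_filename, "w:gz") as tar:
        for file in files:
            # Assumption: keep the default arcname (the path as given);
            # tarfile strips any leading "/" from member names.
            tar.add(str(Path(file)))
```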

get_tree

paths.get_tree(pipeline_outdir, args='-aJ --du')

Generate a directory tree structure using the tree command-line utility.

Note: when -J is combined with --du, tree emits output that is not valid JSON because it contains trailing commas. Parse it with ast.literal_eval instead of json.loads (see load_tree).

Parameters

Name Type Description Default
pipeline_outdir str The path to the directory for which the tree structure will be generated. required
args str Additional arguments to pass to the tree command. The default includes hidden files (-a), formats the output as JSON (-J), and reports each directory's size as the total of its contents (--du). '-aJ --du'

Returns

Name Type Description
str The directory tree structure as a string, stripped of any leading or trailing whitespace.
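Below is a minimal sketch of get_tree that uses subprocess, assuming the tree binary is on PATH. The helper build_tree_command is hypothetical and exists only to make the argument handling easy to see and test. If tree is not installed, the call raises FileNotFoundError. If tree exits with an error, the call raises subprocess.CalledProcessError.

```python
import shlex
import subprocess


def build_tree_command(pipeline_outdir, args="-aJ --du"):
    # Hypothetical helper: split the option string as a POSIX shell would,
    # then append the directory to describe.
    return ["tree", *shlex.split(args), str(pipeline_outdir)]


def get_tree(pipeline_outdir, args="-aJ --du"):
    # Run tree and return its stdout with surrounding whitespace removed.
    result = subprocess.run(
        build_tree_command(pipeline_outdir, args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

The tree output can then be handed to load_tree for parsing.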

glob_files

paths.glob_files(
    pipeline_outdir,
    patterns=['snakemake.log', '.nextflow.log', '*.jobby*', 'master.log', 'runtime_statics*'],
)

Collects files from a specified directory and its subdirectories that match a list of patterns.

Parameters

Name Type Description Default
pipeline_outdir str The base directory to search for files. required
patterns list of str A list of glob patterns that file names are matched against. ['snakemake.log', '.nextflow.log', '*.jobby*', 'master.log', 'runtime_statics*']

Returns

Name Type Description
set of pathlib.Path A set of pathlib.Path objects representing the matched files.
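One way to implement glob_files is a sketch built on pathlib.Path.rglob, which searches the base directory and every subdirectory beneath it. Several details here are assumptions and are not confirmed by the reference above. Directories that happen to match a pattern are left out. The default is a tuple so that the signature avoids a mutable default argument.

```python
from pathlib import Path


def glob_files(
    pipeline_outdir,
    patterns=("snakemake.log", ".nextflow.log", "*.jobby*", "master.log", "runtime_statics*"),
):
    base = Path(pipeline_outdir)
    matches = set()
    for pattern in patterns:
        # rglob(pattern) behaves like glob("**/" + pattern): it matches at any depth,
        # including the top level of base.
        # Assumption: only regular files are collected, not matching directories.
        matches.update(p for p in base.rglob(pattern) if p.is_file())
    return matches
```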

load_tree

paths.load_tree(tree_str)

Load a tree structure from a string, attempting to parse it as JSON or Python literal.

Parameters

Name Type Description Default
tree_str str The string representation of the tree structure. required

Returns

Name Type Description
dict The parsed tree structure as a dictionary.
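The JSON-then-literal fallback described above can be sketched with the standard json and ast modules. This is an illustration under that reading of the description, not the module's verified source. One limitation applies to the fallback path: ast.literal_eval does not recognize JSON's true, false, and null, so that path suits only output that contains none of them.

```python
import ast
import json


def load_tree(tree_str):
    try:
        # Strict JSON is tried first.
        return json.loads(tree_str)
    except json.JSONDecodeError:
        # Fallback for output such as tree -J --du that has trailing commas.
        # Caveat: literal_eval rejects JSON's true/false/null.
        return ast.literal_eval(tree_str)
```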

run_du

paths.run_du(dirpath)

Calculates the total size of a directory in bytes using the du shell command.

Parameters

Name Type Description Default
dirpath str Path to the directory whose size is to be calculated. required

Returns

Name Type Description
int or float The size of the directory in bytes, or NaN if the size cannot be determined.

Warns

Issues a warning, rather than raising an exception, if the du command fails or its output cannot be parsed.
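Here is a minimal sketch of run_du. It assumes GNU coreutils du, where -s prints a single total and -b reports the apparent size in bytes. BSD and macOS du do not accept -b, so on those systems this sketch falls into its warning path. The helper parse_du_output is hypothetical. Any failure is converted into a warning and a NaN return value, which matches the behavior described above.

```python
import math
import subprocess
import warnings


def parse_du_output(stdout):
    # Hypothetical helper: du -s prints "<size>\t<path>"; keep the first field.
    return int(stdout.split()[0])


def run_du(dirpath):
    try:
        # Assumption: GNU du; -s summarizes, -b gives apparent size in bytes.
        result = subprocess.run(
            ["du", "-sb", str(dirpath)],
            capture_output=True,
            text=True,
            check=True,
        )
        return parse_du_output(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError) as exc:
        # du missing, du failing, or unparsable output: warn and return NaN.
        warnings.warn(f"Could not determine the size of {dirpath}: {exc}")
        return math.nan
```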