pipeline.util
Pipeline utility functions
Functions
Name | Description |
---|---|
check_python_version | Check if the current Python version meets the minimum required version. |
chmod_bins_exec | Ensure that all files in bin/ are executable. |
copy_config | Copy default config files to the current working directory. |
err | Prints any provided args to standard error. |
exists | Checks if file exists on the local filesystem. |
fatal | Prints any provided args to standard error and exits with an exit code of 1. |
get_genomes_dict | Get dictionary of genome annotation versions and the paths to the corresponding JSON files. |
get_genomes_list | Get list of genome annotations available for the current platform |
get_tmp_dir | Get default temporary directory for biowulf and frce. Allow user override. |
git_commit_hash | Gets the git commit hash of the RNA-seek repo. |
join_jsons | Joins multiple JSON files into one data structure. |
ln | Creates symlinks for files to an output directory. |
md5sum | Gets md5checksum of a file in memory-safe manner. |
permissions | Checks permissions using os.access() to see if the user is authorized to access a file/directory. |
read_config_yml | Reads a YAML configuration file and returns its contents as a dictionary. |
rename | Dynamically renames FastQ file to have one of the following extensions: .R1.fastq.gz, .R2.fastq.gz |
require | Enforces an executable is in $PATH |
safe_copy | Private function: Given a list of paths, it will recursively copy each to the target location. |
standard_input | Checks for standard input when provided or permissions using permissions(). |
which | Checks if an executable is in $PATH |
write_config_yml | Writes a configuration dictionary to a YAML file. |
check_python_version
pipeline.util.check_python_version(MIN_PYTHON=(3, 11))
Check if the current Python version meets the minimum required version.
Parameters
Name | Type | Description | Default |
---|---|---|---|
MIN_PYTHON | tuple | Minimum required Python version as a tuple (major, minor). | (3, 11) |
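A version check of this kind can be sketched with a tuple comparison. This is a minimal illustration, not the module's implementation: the name `check_python_version_sketch` and the `current` parameter are hypothetical, and what the real function does when the check fails (raise, exit, or return a flag) is not stated here.

```python
import sys


def check_python_version_sketch(min_python=(3, 11), current=None):
    """Return True when `current` is at least `min_python`.

    `current` defaults to the running interpreter's (major, minor);
    it is a hypothetical parameter added so the check can be exercised.
    """
    if current is None:
        current = sys.version_info[:2]
    # Tuples compare element by element: major first, then minor.
    return tuple(current) >= tuple(min_python)
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.11".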
chmod_bins_exec
pipeline.util.chmod_bins_exec(repo_base=repo_base)
Ensure that all files in bin/ are executable.
It appears that setuptools strips executable permissions from package_data files, yet post-install scripts are not possible with the pyproject.toml format. Without this hack, nextflow processes that call scripts in bin/ fail.
See Also
https://stackoverflow.com/questions/18409296/package-data-files-with-executable-permissions
https://github.com/pypa/setuptools/issues/2041
https://stackoverflow.com/questions/76320274/post-install-script-for-pyproject-toml-projects
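The workaround described above amounts to restoring the execute bits on every file in a directory. The following is a rough sketch under assumptions: `make_files_executable` and its `bin_dir` argument are hypothetical names, and the real function locates bin/ through `repo_base` rather than taking a path directly.

```python
import os
import stat


def make_files_executable(bin_dir):
    """Add execute bits (user, group, other) to each regular file in bin_dir.

    Returns the list of paths that were updated.
    """
    changed = []
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        mode = os.stat(path).st_mode
        # OR-ing keeps the existing read/write bits untouched.
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        changed.append(path)
    return changed
```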
copy_config
pipeline.util.copy_config(config_paths, overwrite=True, repo_base=repo_base)
Copy default config files to the current working directory.
Parameters
Name | Type | Description | Default |
---|---|---|---|
config_paths | list[str] | List of configuration paths to copy. | required |
overwrite | bool | Whether to overwrite existing files. Defaults to True. | True |
repo_base | function | Function to get the base directory of the repository. | repo_base |
err
pipeline.util.err(*message, **kwargs)
Prints any provided args to standard error. kwargs can be provided to modify print function’s behavior.
Parameters
Name | Type | Description | Default |
---|---|---|---|
message | any | Values printed to standard error. | () |
kwargs | dict | Key words to modify print function behavior. | {} |
exists
pipeline.util.exists(testpath)
Checks if file exists on the local filesystem.
Parameters
Name | Type | Description | Default |
---|---|---|---|
testpath | str | Name of file/directory to check. | required |
Returns
Type | Description |
---|---|
bool | True when file/directory exists, False when file/directory does not exist. |
fatal
pipeline.util.fatal(*message, **kwargs)
Prints any provided args to standard error and exits with an exit code of 1.
Parameters
Name | Type | Description | Default |
---|---|---|---|
message | any | Values printed to standard error. | () |
kwargs | dict | Key words to modify print function behavior. | {} |
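The relationship between err and fatal can be sketched as a thin wrapper around print(). This is an illustration only: `err_sketch` and `fatal_sketch` are hypothetical names, and the sketch assumes callers do not pass their own `file` keyword.

```python
import sys


def err_sketch(*message, **kwargs):
    """Print the positional args to standard error.

    kwargs (for example sep= or end=) are forwarded to print().
    """
    print(*message, file=sys.stderr, **kwargs)


def fatal_sketch(*message, **kwargs):
    """Print to standard error, then exit with an exit code of 1."""
    err_sketch(*message, **kwargs)
    sys.exit(1)
```

A call such as `fatal_sketch("Error:", "missing input file")` would write the message to standard error and stop the program with status 1.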
get_genomes_dict
pipeline.util.get_genomes_dict(repo_base, hpcname=get_hpcname(), error_on_warnings=False)
Get dictionary of genome annotation versions and the paths to the corresponding JSON files.
Parameters
Name | Type | Description | Default |
---|---|---|---|
repo_base | function | Function for getting the base directory of the repository. | required |
hpcname | str | Name of the HPC. Defaults to the value returned by get_hpcname(). | get_hpcname() |
error_on_warnings | bool | Flag to indicate whether to raise warnings as errors. Defaults to False. | False |
Returns: genomes_dict (dict): A dictionary containing genome names as keys and corresponding JSON file paths as values. { genome_name: json_file_path }
get_genomes_list
pipeline.util.get_genomes_list(repo_base, hpcname=get_hpcname(), error_on_warnings=False)
Get list of genome annotations available for the current platform.
Parameters
Name | Type | Description | Default |
---|---|---|---|
repo_base | str | The base directory of the repository | required |
hpcname | str | The name of the HPC. Defaults to the value returned by get_hpcname(). | get_hpcname() |
error_on_warnings | bool | Whether to raise an error on warnings. Defaults to False. | False |
Returns: genomes (list): A sorted list of genome annotations available for the current platform
get_tmp_dir
pipeline.util.get_tmp_dir(tmp_dir, outdir, hpc=get_hpcname())
Get default temporary directory for biowulf and frce. Allow user override.
Parameters
Name | Type | Description | Default |
---|---|---|---|
tmp_dir | str | User-defined temporary directory path. If provided, this path will be used as the temporary directory. | required |
outdir | str | Output directory path. | required |
hpc | str | HPC name. Defaults to the value returned by get_hpcname(). | get_hpcname() |
Returns: tmp_dir (str): The default temporary directory path based on the HPC name and user-defined path.
git_commit_hash
pipeline.util.git_commit_hash(repo_path)
Gets the git commit hash of the RNA-seek repo.
Parameters
Name | Type | Description | Default |
---|---|---|---|
repo_path | str | Path to RNA-seek git repo. | required |
Returns
Type | Description |
---|---|
str | Latest git commit hash. |
join_jsons
pipeline.util.join_jsons(templates)
Joins multiple JSON files into one data structure. Used to join multiple template JSON files to create a global config dictionary.
Parameters
Name | Type | Description | Default |
---|---|---|---|
templates | list[str] | List of template JSON files to join together. | required |
Returns
Type | Description |
---|---|
dict | Dictionary containing the contents of all the input JSON files. |
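Joining template files into one config dictionary might look like the sketch below. The name `join_jsons_sketch` is hypothetical, and the merge policy (a shallow update where later files win on duplicate top-level keys) is an assumption; the documentation does not say how key collisions are resolved.

```python
import json


def join_jsons_sketch(templates):
    """Load each JSON file and merge its top-level keys into one dict."""
    merged = {}
    for path in templates:
        with open(path, "r") as fh:
            # Assumption: shallow merge, later files override earlier ones.
            merged.update(json.load(fh))
    return merged
```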
ln
pipeline.util.ln(files, outdir)
Creates symlinks for files to an output directory.
Parameters
Name | Type | Description | Default |
---|---|---|---|
files | list[str] | List of filenames. | required |
outdir | str | Destination or output directory to create symlinks. | required |
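Symlinking a list of files into an output directory can be sketched as follows. `ln_sketch` is a hypothetical name; creating the output directory when missing and skipping links that already exist are assumptions, not documented behavior.

```python
import os


def ln_sketch(files, outdir):
    """Create one symlink per input file inside outdir; return the link paths."""
    os.makedirs(outdir, exist_ok=True)  # assumption: create outdir if needed
    links = []
    for filename in files:
        target = os.path.abspath(filename)
        link = os.path.join(outdir, os.path.basename(filename))
        # Assumption: leave an existing link or file in place.
        if not os.path.lexists(link):
            os.symlink(target, link)
        links.append(link)
    return links
```

Linking to the absolute path keeps the symlink valid regardless of the working directory it is later resolved from.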
md5sum
pipeline.util.md5sum(filename, first_block_only=False, blocksize=65536)
Gets the MD5 checksum of a file in a memory-safe manner. The file is read in blocks/chunks defined by the blocksize parameter. This is a safer alternative to reading the entire file into memory if the file is very large.
Parameters
Name | Type | Description | Default |
---|---|---|---|
filename | str | Input file on local filesystem to find md5 checksum. | required |
first_block_only | bool | Calculate md5 checksum of the first block/chunk only. | False |
blocksize | int | Blocksize of reading N chunks of data to reduce memory profile. | 65536 |
Returns
Type | Description |
---|---|
str | MD5 checksum of the file’s contents. |
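The block-wise reading described above can be sketched with hashlib. The parameter names mirror the documented signature, but `md5sum_sketch` itself is a hypothetical stand-in and may differ from the module's implementation in detail.

```python
import hashlib


def md5sum_sketch(filename, first_block_only=False, blocksize=65536):
    """Compute an MD5 hex digest while holding at most one block in memory."""
    hasher = hashlib.md5()
    with open(filename, "rb") as fh:
        while True:
            block = fh.read(blocksize)
            if not block:
                break  # end of file
            hasher.update(block)
            if first_block_only:
                break  # digest covers only the first block
    return hasher.hexdigest()
```

With first_block_only=True the digest is a quick fingerprint of the first blocksize bytes, not a checksum of the whole file.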
permissions
pipeline.util.permissions(parser, path, *args, **kwargs)
Checks permissions using os.access() to see if the user is authorized to access a file/directory. Checks for existence, readability, writability, and executability via: os.F_OK (tests existence), os.R_OK (tests read), os.W_OK (tests write), os.X_OK (tests exec).
Parameters
Name | Type | Description | Default |
---|---|---|---|
parser | argparse.ArgumentParser | Argparse parser object. | required |
path | str | Name of the path to check. | required |
Returns
Type | Description |
---|---|
str | Returns absolute path if it exists and permissions are correct. |
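The os.access() checks listed above can be sketched without the argparse dependency. `check_access` is a hypothetical helper: the real function takes a parser object, and how it reports failures is not shown here, so this sketch raises exceptions instead.

```python
import os


def check_access(path, mode=os.R_OK):
    """Return the absolute path if it exists and passes the os.access() test.

    mode is one of os.R_OK, os.W_OK, os.X_OK (or a bitwise OR of them).
    """
    if not os.access(path, os.F_OK):
        raise FileNotFoundError(f"Path does not exist: {path}")
    if not os.access(path, mode):
        raise PermissionError(f"Insufficient permissions for: {path}")
    return os.path.abspath(path)
```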
read_config_yml
pipeline.util.read_config_yml(file)
Reads a YAML configuration file and returns its contents as a dictionary.
Parameters
Name | Type | Description | Default |
---|---|---|---|
file | str | The path to the YAML file to be read. | required |
Returns
Type | Description |
---|---|
dict | The contents of the YAML file as a dictionary. |
rename
pipeline.util.rename(filename)
Dynamically renames a FastQ file to have one of the following extensions: .R1.fastq.gz, .R2.fastq.gz. To automatically rename the FastQ files, a few assumptions are made. If the extension of the FastQ file cannot be inferred, an exception is raised telling the user to fix the filename of the FastQ files.
Parameters
Name | Type | Description | Default |
---|---|---|---|
filename | str | Original name of file to be renamed. | required |
Returns
Type | Description |
---|---|
str | A renamed FastQ filename. |
require
pipeline.util.require(cmds, suggestions, path=None)
Enforces that each executable is in $PATH.
Parameters
Name | Type | Description | Default |
---|---|---|---|
cmds | list[str] | List of executable names to check. | required |
suggestions | list[str] | Name of module to suggest loading for a given index in cmds. | required |
path | list[str] | Optional list of PATHs to check. Defaults to $PATH. | None |
safe_copy
pipeline.util.safe_copy(source, target, resources=[])
Private function: Given a list of paths, it will recursively copy each to the target location. If a target path already exists, it will NOT overwrite the existing path's data.
Parameters
Name | Type | Description | Default |
---|---|---|---|
resources | list[str] | List of paths to copy over to target location. | [] |
source | str | Add a prefix PATH to each resource. | required |
target | str | Target path to copy templates and required resources. | required |
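The copy-without-overwrite behavior can be sketched with shutil. `safe_copy_sketch` is a hypothetical name, and the sketch assumes that an existing destination is skipped as a whole rather than merged file by file.

```python
import os
import shutil


def safe_copy_sketch(source, target, resources=()):
    """Copy each resource from source/ to target/, skipping existing paths."""
    copied = []
    for resource in resources:
        src = os.path.join(source, resource)
        dst = os.path.join(target, resource)
        if os.path.exists(dst):
            continue  # never overwrite data that is already there
        if os.path.isdir(src):
            shutil.copytree(src, dst)  # recursive copy of a directory
        else:
            os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```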
standard_input
pipeline.util.standard_input(parser, path, *args, **kwargs)
Checks for standard input when provided or permissions using permissions().
Parameters
Name | Type | Description | Default |
---|---|---|---|
parser | argparse.ArgumentParser | Argparse parser object. | required |
path | str | Name of the path to check. | required |
Returns
Type | Description |
---|---|
str | If path exists and user can read from location. |
which
pipeline.util.which(cmd, path=None)
Checks if an executable is in $PATH.
Parameters
Name | Type | Description | Default |
---|---|---|---|
cmd | str | Name of the executable to check. | required |
path | list | Optional list of PATHs to check. Defaults to $PATH. | None |
Returns: bool: True if the executable is in PATH, False otherwise.
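A boolean lookup like this can be sketched on top of the standard library's shutil.which. `which_sketch` is a hypothetical name, and delegating to shutil.which is an assumption; the module may walk the directories itself.

```python
import os
import shutil


def which_sketch(cmd, path=None):
    """Return True if cmd is an executable found on the search path.

    path is an optional list of directories; None falls back to $PATH.
    """
    # shutil.which expects a single os.pathsep-separated string.
    search = os.pathsep.join(path) if path is not None else None
    return shutil.which(cmd, path=search) is not None
```

A require-style check could then loop over several command names and report each one for which this returns False.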
write_config_yml
pipeline.util.write_config_yml(_config, file)
Writes a configuration dictionary to a YAML file.
Parameters
Name | Type | Description | Default |
---|---|---|---|
_config | dict | The configuration dictionary to write to the file. | required |
file | str | The path to the file where the configuration will be written. | required |