pipeline.util
Pipeline utility functions
Functions
Name | Description |
---|---|
chmod_bins_exec | Ensure that all files in bin/ are executable. |
err | Prints any provided args to standard error. |
exists | Checks if file exists on the local filesystem. |
fatal | Prints any provided args to standard error and exits with an exit code of 1. |
get_genomes_dict | Get dictionary of genome annotation versions and the paths to the corresponding JSON files. |
get_genomes_list | Get list of genome annotations available for the current platform |
get_hpcname | Get the HPC name using scontrol |
get_tmp_dir | Get default temporary directory for biowulf and frce. Allow user override. |
git_commit_hash | Gets the git commit hash of the RNA-seek repo. |
join_jsons | Joins multiple JSON files into one data structure. |
ln | Creates symlinks for files to an output directory. |
md5sum | Gets the MD5 checksum of a file in a memory-safe manner. |
permissions | Checks permissions using os.access() to see whether the user is authorized to access a file/directory. |
rename | Dynamically renames FastQ file to have one of the following extensions: .R1.fastq.gz, .R2.fastq.gz |
require | Enforces an executable is in $PATH |
safe_copy | Private function: Given a list of paths, it will recursively copy each to the target location. |
scontrol_show | Run scontrol show config and parse the output as a dictionary |
standard_input | Checks for standard input when provided or permissions using permissions(). |
which | Checks if an executable is in $PATH |
chmod_bins_exec
pipeline.util.chmod_bins_exec(repo_base=repo_base)
Ensure that all files in bin/ are executable.
It appears that setuptools strips executable permissions from package_data files, yet post-install scripts are not possible with the pyproject.toml format. Without this hack, nextflow processes that call scripts in bin/ fail.
https://stackoverflow.com/questions/18409296/package-data-files-with-executable-permissions https://github.com/pypa/setuptools/issues/2041 https://stackoverflow.com/questions/76320274/post-install-script-for-pyproject-toml-projects
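The workaround described above amounts to adding execute bits to each file in bin/ at run time. A minimal sketch of that idea follows; `make_bin_executable` is a hypothetical helper written for illustration, not the actual body of `chmod_bins_exec`, and it assumes a POSIX filesystem.

```python
import stat
from pathlib import Path


def make_bin_executable(bin_dir):
    """Add execute bits to every regular file directly inside bin_dir.

    Hypothetical helper for illustration; not the pipeline.util implementation.
    Returns the names of the files whose mode was updated.
    """
    changed = []
    for path in sorted(Path(bin_dir).iterdir()):
        if path.is_file():
            mode = path.stat().st_mode
            # Keep the existing bits and add execute for user, group, and other.
            path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            changed.append(path.name)
    return changed
```

OR-ing the new bits into the existing mode leaves read and write permissions as they were.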
err
pipeline.util.err(*message, **kwargs)
Prints any provided args to standard error. kwargs can be provided to modify the print function's behavior. @param message
exists
pipeline.util.exists(testpath)
Checks if file exists on the local filesystem. @param parser <argparse.ArgumentParser() object>: argparse parser object @param testpath
fatal
pipeline.util.fatal(*message, **kwargs)
Prints any provided args to standard error and exits with an exit code of 1. @param message
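The behavior documented for err and fatal can be sketched with the built-in print function and sys.exit. The bodies below are an assumption based only on the descriptions above, not a copy of the module's source.

```python
import sys


def err(*message, **kwargs):
    """Print the given args to standard error; kwargs are forwarded to print()."""
    print(*message, file=sys.stderr, **kwargs)


def fatal(*message, **kwargs):
    """Print the given args to standard error, then exit with status 1."""
    err(*message, **kwargs)
    sys.exit(1)
```

For example, `err("a", "b", sep="-")` writes `a-b` to standard error, and `fatal("missing input")` writes the message and then raises `SystemExit` with code 1.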
get_genomes_dict
pipeline.util.get_genomes_dict(
repo_base,
hpcname=get_hpcname(),
error_on_warnings=False,
)
Get dictionary of genome annotation versions and the paths to the corresponding JSON files.
Parameters
Name | Type | Description | Default |
---|---|---|---|
repo_base | function | Function for getting the base directory of the repository. | required |
hpcname | str | Name of the HPC. Defaults to the value returned by get_hpcname(). | get_hpcname() |
error_on_warnings | bool | Flag to indicate whether to raise warnings as errors. Defaults to False. | False |
Returns: genomes_dict (dict): A dictionary containing genome names as keys and corresponding JSON file paths as values. { genome_name: json_file_path }
get_genomes_list
pipeline.util.get_genomes_list(
repo_base,
hpcname=get_hpcname(),
error_on_warnings=False,
)
Get list of genome annotations available for the current platform
Parameters
Name | Type | Description | Default |
---|---|---|---|
repo_base | str | The base directory of the repository | required |
hpcname | str | The name of the HPC. Defaults to the value returned by get_hpcname(). | get_hpcname() |
error_on_warnings | bool | Whether to raise an error on warnings. Defaults to False. | False |
Returns: genomes (list): A sorted list of genome annotations available for the current platform
get_hpcname
pipeline.util.get_hpcname()
Get the HPC name using scontrol
Returns
Name | Type | Description |
---|---|---|
hpcname | str | The HPC name (biowulf, frce, or an empty string) |
get_tmp_dir
pipeline.util.get_tmp_dir(tmp_dir, outdir, hpc=get_hpcname())
Get default temporary directory for biowulf and frce. Allow user override.
Parameters
Name | Type | Description | Default |
---|---|---|---|
tmp_dir | str |
User-defined temporary directory path. If provided, this path will be used as the temporary directory. | required |
outdir | str |
Output directory path. | required |
hpc | str |
HPC name. Defaults to the value returned by get_hpcname(). | get_hpcname() |
Returns: tmp_dir (str): The default temporary directory path based on the HPC name and user-defined path.
git_commit_hash
pipeline.util.git_commit_hash(repo_path)
Gets the git commit hash of the RNA-seek repo. @param repo_path
join_jsons
pipeline.util.join_jsons(templates)
Joins multiple JSON files into one data structure. Used to join multiple template JSON files to create a global config dictionary. @param templates <list[str]>: List of template JSON files to join together @return aggregated
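One plausible way to aggregate several template files is to load each and merge its top-level keys into a single dictionary. The sketch below assumes a shallow merge in which later files override earlier ones; the real join_jsons may key or merge the data differently, and `join_json_files` is a hypothetical name.

```python
import json


def join_json_files(paths):
    """Merge the top-level keys of several JSON files into one dict.

    Hypothetical helper: assumes each file holds a JSON object and that
    later files should override keys from earlier ones.
    """
    merged = {}
    for path in paths:
        with open(path) as handle:
            merged.update(json.load(handle))
    return merged
```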
ln
pipeline.util.ln(files, outdir)
Creates symlinks for files to an output directory. @param files list[
md5sum
pipeline.util.md5sum(filename, first_block_only=False, blocksize=65536)
Gets the MD5 checksum of a file in a memory-safe manner. The file is read in blocks/chunks defined by the blocksize parameter. This is a safer option than reading the entire file into memory if the file is very large. @param filename
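Block-wise hashing of this kind can be sketched with hashlib. The function name `md5sum_chunked` is hypothetical, and the handling of `first_block_only` (hash only the first block, then stop) is an assumption inferred from the parameter name.

```python
import hashlib


def md5sum_chunked(filename, first_block_only=False, blocksize=65536):
    """Return the hex MD5 digest of a file, reading blocksize bytes at a time.

    Hypothetical sketch; first_block_only is assumed to mean
    "hash only the first block".
    """
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
            if first_block_only:
                break
    return digest.hexdigest()
```

Only one block is held in memory at a time, so peak memory use is bounded by blocksize rather than by the file size.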
permissions
pipeline.util.permissions(parser, path, *args, **kwargs)
Checks permissions using os.access() to see whether the user is authorized to access a file/directory. Checks for existence, readability, writability and executability via: os.F_OK (tests existence), os.R_OK (tests read), os.W_OK (tests write), os.X_OK (tests exec). @param parser <argparse.ArgumentParser() object>: Argparse parser object @param path
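The os.access() checks listed above can be combined as in the sketch below. `check_access` is a hypothetical helper that returns a boolean; how the real function reports failures through the argparse parser is not shown here.

```python
import os


def check_access(path, *modes):
    """Return True only if path exists and every requested mode is granted.

    Hypothetical helper; modes are os.R_OK, os.W_OK, and/or os.X_OK.
    """
    if not os.access(path, os.F_OK):
        return False  # path does not exist
    return all(os.access(path, mode) for mode in modes)
```

For instance, `check_access("reads.fastq.gz", os.R_OK)` is True only when the file exists and is readable by the current user.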
rename
pipeline.util.rename(filename)
Dynamically renames a FastQ file to have one of the following extensions: .R1.fastq.gz, .R2.fastq.gz. To automatically rename the FastQ files, a few assumptions are made. If the extension of the FastQ file cannot be inferred, an exception is raised telling the user to fix the filename of the FastQ files. @param filename
require
pipeline.util.require(cmds, suggestions, path=None)
Enforces an executable is in $PATH @param cmds list[
safe_copy
pipeline.util.safe_copy(source, target, resources=[])
Private function: Given a list of paths, it will recursively copy each to the target location. If a target path already exists, it will NOT overwrite the existing path's data. @param resources <list[str]>: List of paths to copy over to target location @param source
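A copy that never overwrites existing data can be sketched as follows. `copy_if_absent` is a hypothetical helper; it assumes each entry of `resources` is a path relative to `source`, which the truncated parameter description above does not confirm.

```python
import os
import shutil


def copy_if_absent(source, target, resources):
    """Copy each resource from source into target unless it already exists.

    Hypothetical helper: resources are assumed to be paths relative to source.
    Returns the list of resources that were actually copied.
    """
    copied = []
    for resource in resources:
        src = os.path.join(source, resource)
        dst = os.path.join(target, resource)
        if os.path.exists(dst):
            continue  # never overwrite existing data
        if os.path.isdir(src):
            shutil.copytree(src, dst)  # recursive copy for directories
        else:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
        copied.append(resource)
    return copied
```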
scontrol_show
pipeline.util.scontrol_show()
Run scontrol show config and parse the output as a dictionary
Returns
Name | Type | Description |
---|---|---|
scontrol_dict | dict | dictionary containing the output of scontrol show config |
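The output of scontrol show config consists mostly of `Key = Value` lines, so the parsing step can be sketched as below. `parse_scontrol_config` is a hypothetical helper that works on already-captured text; the sample values are made up for illustration, and the real function's handling of edge cases may differ.

```python
def parse_scontrol_config(text):
    """Parse 'Key = Value' lines into a dict, skipping lines without '='.

    Hypothetical helper operating on captured command output.
    """
    config = {}
    for line in text.splitlines():
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


# Made-up sample resembling the command's output format.
sample = """Configuration data as of 2024-01-01T00:00:00
ClusterName             = example
SlurmctldPort           = 6817
"""
parsed = parse_scontrol_config(sample)
```

On a Slurm system, the text could be captured with `subprocess.run(["scontrol", "show", "config"], capture_output=True, text=True).stdout` before being passed to the parser.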
standard_input
pipeline.util.standard_input(parser, path, *args, **kwargs)
Checks for standard input when provided or permissions using permissions(). @param parser <argparse.ArgumentParser() object>: Argparse parser object @param path
which
pipeline.util.which(cmd, path=None)
Checks if an executable is in $PATH @param cmd : Optional list of PATHs to check [default: $PATH] @return
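A check of this kind can be sketched with the standard library's shutil.which. `is_on_path` is a hypothetical helper, not the module's own implementation; note that shutil.which expects `path` as a single string joined with os.pathsep rather than a Python list.

```python
import shutil


def is_on_path(cmd, path=None):
    """Return True if cmd resolves to an executable on path (default: $PATH).

    Hypothetical helper built on shutil.which; path, when given, is an
    os.pathsep-separated string of directories.
    """
    return shutil.which(cmd, path=path) is not None
```

A require-style wrapper could loop over a list of commands and report those for which this check returns False.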