pipeline.util
Pipeline utility functions
Functions
| Name | Description |
|---|---|
| check_python_version | Check if the current Python version meets the minimum required version. |
| chmod_bins_exec | Ensure that all files in bin/ are executable. |
| copy_config | Copies default configuration files to the specified output directory. |
| err | Prints any provided args to standard error. |
| exists | Checks if file exists on the local filesystem. |
| fatal | Prints any provided args to standard error and exits with an exit code of 1. |
| get_genomes_dict | Get dictionary of genome annotation versions and the paths to the corresponding JSON files. |
| get_genomes_list | Get list of genome annotations available for the current platform. |
| get_tmp_dir | Get default temporary directory for biowulf and frce. Allow user override. |
| git_commit_hash | Gets the git commit hash of the RNA-seek repo. |
| join_jsons | Joins multiple JSON files into one data structure. |
| ln | Creates symlinks for files to an output directory. |
| md5sum | Gets the MD5 checksum of a file in a memory-safe manner. |
| permissions | Checks permissions using os.access() to see if the user is authorized to access a file/directory. |
| read_config_yml | Reads a YAML configuration file and returns its contents as a dictionary. |
| rename | Dynamically renames a FastQ file to have one of the following extensions: .R1.fastq.gz or .R2.fastq.gz. |
| require | Enforces that an executable is in $PATH. |
| safe_copy | Private function: given a list of paths, recursively copies each to the target location. |
| standard_input | Checks for standard input when provided or permissions using permissions(). |
| which | Checks if an executable is in $PATH. |
| write_config_yml | Writes a configuration dictionary to a YAML file. |
check_python_version
pipeline.util.check_python_version(MIN_PYTHON=(3, 11))
Check if the current Python version meets the minimum required version.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| MIN_PYTHON | tuple | Minimum required Python version as a tuple (major, minor). | (3, 11) |
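A version check of this kind can be sketched with a tuple comparison against `sys.version_info`. The helper name `meets_min_python` and its `current` parameter are hypothetical, added here only so the comparison can be exercised; the actual function may report or exit differently.

```python
import sys


def meets_min_python(min_python=(3, 11), current=None):
    """Return True if `current` is at least `min_python`.

    Hypothetical sketch: `current` defaults to the running interpreter's
    (major, minor) version and exists only to make the check testable.
    """
    if current is None:
        current = sys.version_info[:2]
    # Tuples compare element by element: major first, then minor.
    return tuple(current) >= tuple(min_python)
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.11".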
chmod_bins_exec
pipeline.util.chmod_bins_exec(repo_base=repo_base)
Ensure that all files in bin/ are executable.
It appears that setuptools strips executable permissions from package_data files, yet post-install scripts are not possible with the pyproject.toml format. Without this hack, nextflow processes that call scripts in bin/ fail.
See Also
https://stackoverflow.com/questions/18409296/package-data-files-with-executable-permissions
https://github.com/pypa/setuptools/issues/2041
https://stackoverflow.com/questions/76320274/post-install-script-for-pyproject-toml-projects
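As an illustration of the workaround described above, the following minimal sketch adds execute bits to every regular file in a directory. The name `make_files_executable` and its `bin_dir` argument are hypothetical; the real function locates bin/ through `repo_base` and may set different permission bits.

```python
import stat
from pathlib import Path


def make_files_executable(bin_dir):
    """Add user/group/other execute bits to each regular file in bin_dir.

    Hypothetical sketch, not the pipeline's implementation.
    Returns the list of paths whose mode was updated.
    """
    changed = []
    for path in sorted(Path(bin_dir).iterdir()):
        if path.is_file():
            mode = path.stat().st_mode
            # Keep the existing bits and OR in the three execute bits.
            path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            changed.append(path)
    return changed
```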
copy_config
pipeline.util.copy_config(
config_paths,
outdir,
overwrite=True,
repo_base=repo_base,
)
Copies default configuration files to the specified output directory.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| config_paths | list | A list of paths to the local configuration files. | required |
| outdir | pathlib.Path | The output directory where the configuration files will be copied. | required |
| overwrite | bool | Whether to overwrite existing files and directories. Defaults to True. | True |
Raises
| Name | Type | Description |
|---|---|---|
| | FileNotFoundError | If a specified configuration file or directory does not exist. |
err
pipeline.util.err(*message, **kwargs)
Prints any provided args to standard error. kwargs can be provided to modify the print function’s behavior.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| message | any | Values printed to standard error. | () |
| kwargs | dict | Keyword arguments that modify the print function’s behavior. | {} |
exists
pipeline.util.exists(testpath)
Checks if file exists on the local filesystem.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| testpath | str | Name of file/directory to check. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | bool | True when file/directory exists, False when file/directory does not exist. |
fatal
pipeline.util.fatal(*message, **kwargs)
Prints any provided args to standard error and exits with an exit code of 1.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| message | any | Values printed to standard error. | () |
| kwargs | dict | Keyword arguments that modify the print function’s behavior. | {} |
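The behavior documented for err and fatal can be approximated in a few lines. This is a minimal sketch under the assumption that both simply wrap `print`; the `_sketch` suffixes mark the names as hypothetical stand-ins, not the module's code.

```python
import sys


def err_sketch(*message, **kwargs):
    """Print the given values to standard error.

    kwargs are forwarded to print() (for example sep= or end=).
    """
    print(*message, file=sys.stderr, **kwargs)


def fatal_sketch(*message, **kwargs):
    """Print the given values to standard error, then exit with status 1."""
    err_sketch(*message, **kwargs)
    sys.exit(1)
```

For example, `err_sketch("a", "b", sep="-")` writes `a-b` to standard error, while `fatal_sketch("bad input")` raises `SystemExit` with code 1 after printing.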
get_genomes_dict
pipeline.util.get_genomes_dict(
repo_base,
hpcname=get_hpcname(),
error_on_warnings=False,
)
Get dictionary of genome annotation versions and the paths to the corresponding JSON files.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| repo_base | function | Function for getting the base directory of the repository. | required |
| hpcname | str | Name of the HPC. Defaults to the value returned by get_hpcname(). | get_hpcname() |
| error_on_warnings | bool | Flag to indicate whether to raise warnings as errors. Defaults to False. | False |
Returns: genomes_dict (dict): A dictionary containing genome names as keys and corresponding JSON file paths as values. { genome_name: json_file_path }
get_genomes_list
pipeline.util.get_genomes_list(
repo_base,
hpcname=get_hpcname(),
error_on_warnings=False,
)
Get list of genome annotations available for the current platform.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| repo_base | str | The base directory of the repository | required |
| hpcname | str | The name of the HPC. Defaults to the value returned by get_hpcname(). | get_hpcname() |
| error_on_warnings | bool | Whether to raise an error on warnings. Defaults to False. | False |
Returns: genomes (list): A sorted list of genome annotations available for the current platform
get_tmp_dir
pipeline.util.get_tmp_dir(tmp_dir, outdir, hpc=get_hpcname())
Get default temporary directory for biowulf and frce. Allow user override.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| tmp_dir | str | User-defined temporary directory path. If provided, this path will be used as the temporary directory. | required |
| outdir | str | Output directory path. | required |
| hpc | str | HPC name. Defaults to the value returned by get_hpcname(). | get_hpcname() |
Returns: tmp_dir (str): The default temporary directory path based on the HPC name and user-defined path.
git_commit_hash
pipeline.util.git_commit_hash(repo_path)
Gets the git commit hash of the RNA-seek repo.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| repo_path | str | Path to RNA-seek git repo. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | str | Latest git commit hash. |
join_jsons
pipeline.util.join_jsons(templates)
Joins multiple JSON files into one data structure. Used to join multiple template JSON files to create a global config dictionary.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| templates | list[str] | List of template JSON files to join together. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | dict | Dictionary containing the contents of all the input JSON files. |
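One way to join several JSON files into a single dictionary is a shallow merge, sketched below. The name `join_json_files` is hypothetical, and two assumptions are made that the reference does not confirm: each file holds a top-level JSON object, and on a key collision the later file wins.

```python
import json


def join_json_files(paths):
    """Shallow-merge the top-level objects of several JSON files.

    Hypothetical sketch: assumes every file contains a JSON object and
    lets later files override earlier ones when keys collide.
    """
    merged = {}
    for path in paths:
        with open(path, "r") as fh:
            merged.update(json.load(fh))
    return merged
```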
ln
pipeline.util.ln(files, outdir)
Creates symlinks for files to an output directory.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| files | list[str] | List of filenames. | required |
| outdir | str | Destination or output directory to create symlinks. | required |
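Symlinking a list of files into an output directory might look like the sketch below. The helper name `symlink_into` is hypothetical, and the choices to create the directory, link to absolute paths, and skip links that already exist are assumptions rather than documented behavior.

```python
import os


def symlink_into(files, outdir):
    """Create a symlink in outdir for each file, named after its basename.

    Hypothetical sketch: existing links are left untouched.
    Returns the list of link paths.
    """
    os.makedirs(outdir, exist_ok=True)
    links = []
    for source in files:
        link = os.path.join(outdir, os.path.basename(source))
        # lexists() is True even for a dangling symlink, unlike exists().
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(source), link)
        links.append(link)
    return links
```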
md5sum
pipeline.util.md5sum(filename, first_block_only=False, blocksize=65536)
Gets the MD5 checksum of a file in a memory-safe manner. The file is read in blocks/chunks whose size is set by the blocksize parameter. This is safer than reading the entire file into memory when the file is very large.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | Input file on local filesystem to find md5 checksum. | required |
| first_block_only | bool | Calculate md5 checksum of the first block/chunk only. | False |
| blocksize | int | Size of each block/chunk read from the file, used to limit memory use. | 65536 |
Returns
| Name | Type | Description |
|---|---|---|
| | str | MD5 checksum of the file’s contents. |
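The block-wise reading described above can be sketched with `hashlib`. The name `md5sum_sketch` is a hypothetical stand-in that mirrors the documented signature; returning a hexadecimal digest is an assumption.

```python
import hashlib


def md5sum_sketch(filename, first_block_only=False, blocksize=65536):
    """Compute an MD5 hex digest by reading the file one block at a time.

    Hypothetical sketch: only `blocksize` bytes are held in memory at once.
    """
    hasher = hashlib.md5()
    with open(filename, "rb") as fh:
        while True:
            block = fh.read(blocksize)
            if not block:
                break  # end of file
            hasher.update(block)
            if first_block_only:
                break  # digest covers the first block only
    return hasher.hexdigest()
```

Feeding the hasher incrementally gives the same digest as hashing the whole file at once, so the block size affects memory use but not the result (unless `first_block_only` is set).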
permissions
pipeline.util.permissions(parser, path, *args, **kwargs)
Checks permissions using os.access() to see if the user is authorized to access a file/directory. *args and **kwargs are passed to os.access().
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| parser | argparse.ArgumentParser | Argparse parser object. | required |
| path | str | Name of the path to check. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | str | Returns absolute path if it exists and permissions are correct. |
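A check of this shape can be sketched as follows. The name `check_access` is hypothetical, and reporting failures through `parser.error()` (which prints a usage message and exits with status 2) is an assumption about how the parser argument is used.

```python
import os


def check_access(parser, path, *args, **kwargs):
    """Return the absolute path if it exists and os.access() succeeds.

    Hypothetical sketch: extra args (for example os.R_OK) are forwarded
    to os.access(); failures are reported via parser.error().
    """
    if not os.path.exists(path):
        parser.error(f"Path '{path}' does not exist.")
    if not os.access(path, *args, **kwargs):
        parser.error(f"Path '{path}' exists but is not accessible.")
    return os.path.abspath(path)
```

A typical call under these assumptions would be `check_access(parser, "samples.tsv", os.R_OK)`, where `parser` is an `argparse.ArgumentParser` instance.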
read_config_yml
pipeline.util.read_config_yml(file)
Reads a YAML configuration file and returns its contents as a dictionary.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| file | str | The path to the YAML file to be read. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | dict | The contents of the YAML file as a dictionary. |
rename
pipeline.util.rename(filename)
Dynamically renames a FastQ file to have one of the following extensions: .R1.fastq.gz or .R2.fastq.gz. To rename the FastQ files automatically, a few assumptions are made. If the extension of a FastQ file cannot be inferred, an exception is raised telling the user to fix the filename.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | Original name of file to be renamed. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | str | A renamed FastQ filename. |
require
pipeline.util.require(cmds, suggestions, path=None)
Enforces that an executable is in $PATH.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cmds | list[str] | List of executable names to check. | required |
| suggestions | list[str] | Names of modules to suggest loading; each entry corresponds to the executable at the same index in cmds. | required |
| path | list[str] | Optional list of PATHs to check. Defaults to $PATH. | None |
safe_copy
pipeline.util.safe_copy(source, target, resources=[])
Private function: given a list of paths, recursively copies each to the target location. If a target path already exists, its existing data is NOT overwritten.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| resources | list[str] | List of paths to copy over to target location. | [] |
| source | str | Prefix path added to each resource. | required |
| target | str | Target path to copy templates and required resources. | required |
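A non-overwriting recursive copy can be sketched as below. The name `copy_without_overwrite` is hypothetical, and skipping a resource entirely when its destination already exists is an assumption; the real function may instead merge directory contents.

```python
import os
import shutil


def copy_without_overwrite(source, target, resources=()):
    """Copy each resource from source/ to target/, skipping existing paths.

    Hypothetical sketch: directories are copied recursively, and a
    destination that already exists is left exactly as it is.
    """
    for resource in resources:
        src = os.path.join(source, resource)
        dst = os.path.join(target, resource)
        if os.path.exists(dst):
            continue  # never overwrite existing data
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
```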
standard_input
pipeline.util.standard_input(parser, path, *args, **kwargs)
Checks for standard input when provided or permissions using permissions().
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| parser | argparse.ArgumentParser | Argparse parser object. | required |
| path | str | Name of the path to check. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | str | The path, if it exists and the user can read from it. |
which
pipeline.util.which(cmd, path=None)
Checks if an executable is in $PATH.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cmd | str | Name of the executable to check. | required |
| path | list | Optional list of PATHs to check. Defaults to $PATH. | None |
Returns: bool: True if the executable is in PATH, False otherwise.
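A PATH lookup of this kind can be sketched by scanning each directory for an executable file with the given name. The name `is_on_path` is hypothetical; the standard library's `shutil.which` performs a similar search and returns the matching path instead of a boolean.

```python
import os


def is_on_path(cmd, path=None):
    """Return True if an executable named cmd is found in any PATH directory.

    Hypothetical sketch: `path` is a list of directories and defaults to
    the entries of the $PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath).split(os.pathsep)
    for directory in path:
        candidate = os.path.join(directory, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False
```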
write_config_yml
pipeline.util.write_config_yml(_config, file)
Writes a configuration dictionary to a YAML file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| _config | dict | The configuration dictionary to write to the file. | required |
| file | str | The path to the file where the configuration will be written. | required |