SPOOKER 👻

This command is designed to be used in the OnComplete/OnSuccess/OnError handlers of Snakemake and Nextflow pipelines. It collects metadata about the pipeline run, bundles it into a tarball, and saves it to a common location for later retrieval.

Run spooker --help for more information.

See spooker for the main function.
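
For example, a Snakemake workflow could call the spooker function from its onsuccess/onerror handlers. This is a minimal sketch, assuming the module is importable as spooker (as in the signatures below); the pipeline name, version, and path values are illustrative placeholders, not values provided by spooker.

```python
# Snakefile (sketch): the workflow's rules are omitted here.
# Pipeline name/version/path values are illustrative placeholders.
from pathlib import Path

import spooker


def run_spooker():
    spooker.spooker(
        pipeline_outdir=Path("results"),       # pipeline output directory
        pipeline_name="my_pipeline",           # placeholder pipeline name
        pipeline_version="1.0.0",              # placeholder version string
        pipeline_path="/path/to/my_pipeline",  # placeholder pipeline location
    )


onsuccess:
    run_spooker()

onerror:
    run_spooker()
```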

Functions

| Name | Description |
| --- | --- |
| cli | spooker 👻 |
| get_spooker_dict | Generates a metadata dictionary summarizing the state and logs of a pipeline run. |
| spooker | Processes a pipeline output directory to generate metadata, tree JSON, and SLURM job log JSON, then stages the file on an HPC cluster. |

cli

spooker.cli(outdir, name, version, path, debug)

spooker 👻

This command is designed to be used in the OnComplete/OnSuccess/OnError handlers of Snakemake and Nextflow pipelines. It collects metadata about the pipeline run, bundles it into a tarball, and saves it to a common location for later retrieval.

get_spooker_dict

spooker.get_spooker_dict(
    pipeline_outdir,
    pipeline_name,
    pipeline_version,
    pipeline_path,
)

Generates a metadata dictionary summarizing the state and logs of a pipeline run.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_outdir | pathlib.Path | Path to the pipeline output directory. | required |
| pipeline_name | str | Name of the pipeline. | required |
| pipeline_version | str | Version of the pipeline. | required |
| pipeline_path | str | Path to the pipeline definition or script. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | dict | A dictionary containing: "outdir_tree" (string representation of the output directory tree), "pipeline_metadata" (metadata about the pipeline run), "jobby" (JSON-formatted job log records), "master_job_log" (contents of the main job log file), and "failed_jobs" (logs of failed jobs). |
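
For illustration, a minimal sketch of building and inspecting this dictionary directly, assuming the module is importable as spooker and using placeholder pipeline details:

```python
from pathlib import Path

import spooker

# Build the metadata dictionary for a finished pipeline run.
# The name, version, and path values are illustrative placeholders.
meta = spooker.get_spooker_dict(
    pipeline_outdir=Path("results"),
    pipeline_name="my_pipeline",
    pipeline_version="1.0.0",
    pipeline_path="/path/to/my_pipeline",
)

# Keys documented above: outdir_tree, pipeline_metadata, jobby,
# master_job_log, failed_jobs.
print(sorted(meta.keys()))
```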

spooker

spooker.spooker(
    pipeline_outdir,
    pipeline_name,
    pipeline_version,
    pipeline_path,
    clean=True,
    debug=False,
)

Processes a pipeline output directory to generate metadata, tree JSON, and SLURM job log JSON, then stages the file on an HPC cluster.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_outdir | pathlib.Path | Path to the pipeline output directory. | required |
| pipeline_name | str | Name of the pipeline being processed. | required |
| pipeline_version | str | Version of the pipeline being processed. | required |
| pipeline_path | str | Path to the pipeline source code or configuration. | required |
| clean | bool | Whether to delete the generated metadata file after staging. Defaults to True. | True |
| debug | bool | Whether to enable debug mode for the HPC cluster. Defaults to False. | False |

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | pathlib.Path | Path to the staged metadata file on the HPC cluster. |

Raises

| Name | Type | Description |
| --- | --- | --- |
|  | FileNotFoundError | If the pipeline output directory does not exist. |

Notes

  • The function collects metadata, generates a tree JSON representation of the pipeline directory, and extracts job log information.
  • The metadata is written to a compressed JSON file and staged on an HPC cluster.
  • If clean is True, the local metadata file is deleted after staging.
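
A minimal usage sketch, assuming the module is importable as spooker; the pipeline name, version, and path are placeholders, and clean=False keeps the local metadata file around for inspection:

```python
from pathlib import Path

import spooker

# Generate and stage the run metadata, keeping the local copy.
# Pipeline name/version/path are illustrative placeholders.
staged = spooker.spooker(
    pipeline_outdir=Path("results"),
    pipeline_name="my_pipeline",
    pipeline_version="1.0.0",
    pipeline_path="/path/to/my_pipeline",
    clean=False,  # keep the generated metadata file after staging
    debug=True,   # enable debug mode for the HPC cluster
)
print(f"Metadata staged at: {staged}")
```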