SPOOKER 👻

This command is designed to be used in the OnComplete/OnSuccess/OnError handlers of Snakemake and Nextflow pipelines. It collects metadata about the pipeline run, bundles it into a tarball, and saves it to a common location for later retrieval.

Run spooker --help for more information.

See spooker for the main function.
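
For example, a Snakemake workflow could call the spooker function from its onsuccess/onerror handlers. This is a minimal sketch, assuming the module is importable as spooker (as in the signatures below); the pipeline name, version, and path values are illustrative placeholders, not values provided by spooker.

```python
# Snakefile (sketch): the workflow's rules are omitted here.
# Pipeline name/version/path values are illustrative placeholders.
from pathlib import Path

import spooker


def run_spooker():
    spooker.spooker(
        pipeline_outdir=Path("results"),       # pipeline output directory
        pipeline_name="my_pipeline",           # placeholder pipeline name
        pipeline_version="1.0.0",              # placeholder version string
        pipeline_path="/path/to/my_pipeline",  # placeholder pipeline location
    )


onsuccess:
    run_spooker()

onerror:
    run_spooker()
```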

Functions

| Name | Description |
| --- | --- |
| cli | spooker 👻 |
| get_spooker_dict | Generates a metadata dictionary summarizing the state and logs of a pipeline run. |
| spooker | Processes a pipeline output directory to generate metadata, tree JSON, and SLURM job log JSON, then stages the file on an HPC cluster. |

cli

spooker.cli(outdir, name, version, path, debug)

spooker 👻

This command is designed to be used in the OnComplete/OnSuccess/OnError handlers of Snakemake and Nextflow pipelines. It collects metadata about the pipeline run, bundles it into a tarball, and saves it to a common location for later retrieval.

get_spooker_dict

spooker.get_spooker_dict(
    pipeline_outdir,
    pipeline_name,
    pipeline_version,
    pipeline_path,
)

Generates a metadata dictionary summarizing the state and logs of a pipeline run.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_outdir | pathlib.Path | Path to the pipeline output directory. | required |
| pipeline_name | str | Name of the pipeline. | required |
| pipeline_version | str | Version of the pipeline. | required |
| pipeline_path | str | Path to the pipeline definition or script. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | dict | A dictionary containing: "outdir_tree" (string representation of the output directory tree), "pipeline_metadata" (metadata about the pipeline run), "jobby" (JSON-formatted job log records), "master_job_log" (contents of the main job log file), and "failed_jobs" (logs of failed jobs). |
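
For illustration, a minimal sketch of building and inspecting this dictionary directly, assuming the module is importable as spooker and using placeholder pipeline details:

```python
from pathlib import Path

import spooker

# Build the metadata dictionary for a finished pipeline run.
# The name, version, and path values are illustrative placeholders.
meta = spooker.get_spooker_dict(
    pipeline_outdir=Path("results"),
    pipeline_name="my_pipeline",
    pipeline_version="1.0.0",
    pipeline_path="/path/to/my_pipeline",
)

# Keys documented above: outdir_tree, pipeline_metadata, jobby,
# master_job_log, failed_jobs.
print(sorted(meta.keys()))
```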

spooker

spooker.spooker(
    pipeline_outdir,
    pipeline_name,
    pipeline_version,
    pipeline_path,
    clean=True,
    debug=False,
)

Processes a pipeline output directory to generate metadata, tree JSON, and SLURM job log JSON, then stages the file on an HPC cluster.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_outdir | pathlib.Path | Path to the pipeline output directory. | required |
| pipeline_name | str | Name of the pipeline being processed. | required |
| pipeline_version | str | Version of the pipeline being processed. | required |
| pipeline_path | str | Path to the pipeline source code or configuration. | required |
| clean | bool | Whether to delete the generated metadata file after staging. Defaults to True. | True |
| debug | bool | Whether to enable debug mode for the HPC cluster. Defaults to False. | False |

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | pathlib.Path | Path to the staged metadata file on the HPC cluster. |

Raises

| Name | Type | Description |
| --- | --- | --- |
|  | FileNotFoundError | If the pipeline output directory does not exist. |

Notes

  • The function collects metadata, generates a tree JSON representation of the pipeline directory, and extracts job log information.
  • The metadata is written to a compressed JSON file and staged on an HPC cluster.
  • If clean is True, the local metadata file is deleted after staging.
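
A minimal usage sketch, assuming the module is importable as spooker; the pipeline name, version, and path are placeholders, and clean=False keeps the local metadata file around for inspection:

```python
from pathlib import Path

import spooker

# Generate and stage the run metadata, keeping the local copy.
# Pipeline name/version/path are illustrative placeholders.
staged = spooker.spooker(
    pipeline_outdir=Path("results"),
    pipeline_name="my_pipeline",
    pipeline_version="1.0.0",
    pipeline_path="/path/to/my_pipeline",
    clean=False,  # keep the generated metadata file after staging
    debug=True,   # enable debug mode for the HPC cluster
)
print(f"Metadata staged at: {staged}")
```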