Pipeline user guide#
This guide walks through running a Sequana pipeline end-to-end: install, configure, run, inspect. The same skeleton applies to every pipeline.
Install a pipeline#
Each pipeline is an independent PyPI package. Install it inside a Python 3.10+ environment alongside Sequana:
pip install sequana_fastqc --upgrade
Verify:
sequana_fastqc --help
Initialise the working directory#
Pipelines come with a config file, a snakefile, and a runner script. The initialisation command copies them into a working directory of your choice (default: the pipeline name):
sequana_fastqc --input-directory my_data --working-directory test1
cd test1
Inside test1 you will typically find:
config.yaml— pipeline parameters (input/output, tools, resources).<name>.rules— the Snakefile.<name>.sh— convenience launcher (snakemake with sensible defaults).apptainers.yaml— container URIs used when--use-apptaineris set.
Common CLI options#
Every Sequana pipeline understands these options (with sensible defaults):
--input-directoryWhere to look for input FASTQ files. Default:
..--input-patternGlob to select files. Default:
*fastq.gz. Use*/*fastq.gzif samples sit in sub-directories.--input-readtagPattern used to detect paired-end reads. Default:
_R[12]_.--working-directoryWhere the pipeline files get copied. Use
--forceto overwrite.--run-mode {local,slurm}Run locally or generate a SLURM-aware launcher. Auto-detected when
sbatchis on the path.--use-apptainerPull and execute every rule inside the matching apptainer image.
--depsPrint external dependencies and check whether they are installed.
Use sequana_<name> --help to discover pipeline-specific flags.
Edit the configuration#
config.yaml is plain YAML. The most common fields are at the top:
input_directory: /abs/path/to/data
input_readtag: _R[12]_
input_pattern: '*fastq.gz'
Every tool used by the pipeline has its own section (cutadapt:,
bwa:, coverage: …). The defaults are tuned for typical datasets; tweak
them for unusual cases (short genomes, very deep coverage, single-end data,
…).
Run the pipeline#
Two equivalent ways:
sh <pipeline>.sh
or directly through snakemake:
snakemake -s <pipeline>.rules -j 4 -p
-j N sets the number of parallel jobs, -p prints shell commands.
On a SLURM cluster, the generated <pipeline>.sh already includes the
cluster profile.
When the run is complete, the HTML report is at:
./summary.html
Clean up#
To remove temporary files but keep the report:
make clean
Tips#
Always run
--depsonce after installing a new pipeline.Re-run with
--forceto overwrite an existing working directory.For long runs on a cluster, prefer
--use-apptainer— it pins the tool versions and eliminates conda-env clashes.See Pipelines for the full pipeline catalogue and Tutorial for end-to-end examples.