Glossary

Apptainer

Container runtime (formerly known as Singularity) used by Sequana pipelines. Images are pulled on demand when the --use-apptainer option is given.

BAI

Index file accompanying a BAM file (standard extension .bai). It allows fast access to the alignments that overlap a given region.

BAM

Binary version of the SAM alignment format.

BED

Tab-separated format describing genomic intervals (chrom, start, end, plus optional fields). Used for coverage reporting.
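
As an illustration, the three mandatory columns can be read with a few lines of Python. This is a minimal sketch, not Sequana's own parser: the names BedInterval and parse_bed_line are hypothetical, and it relies on the BED convention that start is 0-based and end is exclusive.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    """Hypothetical container for the three mandatory BED columns."""

    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        # With half-open coordinates the length is simply end - start.
        return self.end - self.start


def parse_bed_line(line: str) -> BedInterval:
    """Parse one BED line, ignoring any optional fields after column 3."""
    fields = line.rstrip("\n").split("\t")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


interval = parse_bed_line("chr1\t100\t250\tfeature_A\n")
print(interval.chrom, interval.start, interval.end, interval.length)
```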

CIGAR

Compact string describing how a read aligns to a reference (matches, insertions, deletions, soft-clips, …).
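
A CIGAR string is a sequence of (length, operation) pairs. The sketch below splits one into its operations and computes how many reference bases the alignment spans. The function names are hypothetical; the operation letters (M, I, D, N, S, H, P, =, X) are those of the SAM specification.

```python
import re

# One CIGAR operation: a run length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return the CIGAR string as a list of (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered by the alignment.

    Only M, D, N, = and X consume the reference; insertions and
    clipped bases do not.
    """
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


ops = parse_cigar("5S30M2I10M1D8M")
span = reference_span("5S30M2I10M1D8M")
print(ops, span)
```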

Conda environment

Isolated Python + binary environment managed by conda/mamba. Recommended for installing Sequana to avoid clashing with system packages.

DEG

Differentially Expressed Gene. Output of RNA-seq differential expression analyses (see sequana.rnadiff).

DSRC

A compression tool dedicated to FASTQ files.

FASTA

Plain-text format for nucleotide or protein sequences. Each record starts with a > header line, followed by one or more sequence lines.
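
The record layout can be illustrated with a small reader. This is a sketch for illustration only (the function name read_fasta is hypothetical); it keeps the first word of each header as the record name and joins the sequence lines that follow.

```python
def read_fasta(lines):
    """Return a dict mapping record names to sequences.

    `lines` is any iterable of text lines, for example an open file.
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the first word after '>' as the name.
            words = line[1:].split()
            name = words[0] if words else ""
            chunks[name] = []
        elif name is not None:
            # Sequence line belonging to the current record.
            chunks[name].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


example = [">seq1 some description", "ACGT", "TTGA", ">seq2", "GGCC"]
records = read_fasta(example)
print(records)
```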

FASTQ

Plain-text format combining a sequence with per-base quality scores. Typically gzipped (.fastq.gz).
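
Each FASTQ record usually spans four lines: an @ header, the sequence, a + separator and the quality string. The sketch below reads such records from a gzipped stream using only the standard library. It assumes four-line records and the Phred+33 quality encoding used by current Illumina instruments; the function name read_fastq is hypothetical.

```python
import gzip
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a text handle.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        # Phred+33: quality score = ASCII code of the character minus 33.
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


# In-memory stand-in for a small .fastq.gz file.
data = gzip.compress(b"@read1\nACGT\n+\nII5!\n")
with gzip.open(io.BytesIO(data), "rt") as handle:
    fastq_records = list(read_fastq(handle))
print(fastq_records)
```

For a file on disk, the same function can be given the handle returned by gzip.open(path, "rt").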

GFF

General Feature Format. Tab-separated annotation file describing genes, exons and other features along a sequence. See also GFF3.

GTF

Gene Transfer Format. Tab-separated annotation similar to GFF but with stricter attribute conventions.

JSON

Human-readable data serialisation language commonly used for configuration. See https://en.wikipedia.org/wiki/JSON.

k-mer

Substring of fixed length k from a sequence. Used in classification (Kraken), assembly (de Bruijn graphs), and quality control.
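
A sequence of length n contains n - k + 1 overlapping k-mers. The following sketch counts them with a sliding window; count_kmers is a hypothetical helper written for this illustration, not a Sequana function.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in `sequence`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGA", 3)
print(counts.most_common())
```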

Module

A directory that contains a Snakemake rule and an associated README. Especially relevant for the Sequana pipelines. See Developer guide.

MultiQC

Aggregates outputs of many bioinformatics tools (FastQC, samtools, cutadapt …) into a single interactive HTML report. Sequana ships plugins for several of its pipelines.

NGS

Next-generation sequencing. Catch-all term for high-throughput sequencing technologies (Illumina, PacBio, ONT …).

Rule

Smallest unit of a Snakemake workflow: declares input, output, and the shell/Python code that turns one into the other.

SAM

Sequence Alignment/Map. Tab-separated text format describing read alignments to a reference.

Sample sheet

Tabular description of an Illumina run (sample IDs, indices, adapters). Parsed by sequana.iem.SampleSheet.

Snakefile

A file that contains one or several Snakemake rules.

Snakemake

Python-based workflow engine used by every Sequana pipeline.

Taxon

A taxonomic unit (species, genus, …) referenced by an NCBI taxid. The sequana.taxonomy module loads the NCBI taxonomy dump.

VCF

Variant Call Format. Tab-separated text format describing sequence variants (SNPs, indels, …). Used by the variant calling pipeline.

Wrapper

Reusable Snakemake rule shipped by sequana-wrappers and consumed by every pipeline via Snakemake's native wrapper: directive.

YAML

Human-readable data serialisation language commonly used for configuration files. Every Sequana pipeline configuration is YAML.