Glossary#

Apptainer#: Container runtime used by Sequana pipelines (formerly known as Singularity). Images are pulled on demand via --use-apptainer.
BAI#: Index file accompanying a BAM file (standard extension .bai).
BAM#: Binary version of the SAM alignment format.
BED#: Tab-separated format describing genomic intervals (chrom, start, end, plus optional fields). Used for coverage reporting.
CIGAR#: Compact string describing how a read aligns to a reference (matches, insertions, deletions, soft-clips, …).
Conda environment#: Isolated Python + binary environment managed by conda/mamba. Recommended for installing Sequana to avoid clashing with system packages.
DEG#: Differentially Expressed Gene. Output of RNA-seq differential expression analyses (see sequana.rnadiff).
DSRC#: A compression tool dedicated to FastQ files.
FASTA#: Plain-text format for nucleotide or protein sequences. Each record starts with a > header line, followed by one or more sequence lines.
FASTQ#: Plain-text format combining a sequence with per-base quality scores. Typically gzipped (.fastq.gz).
GFF#: General Feature Format. Tab-separated annotation file describing genes, exons and other features along a sequence. See also GFF3.
GTF#: Gene Transfer Format. Tab-separated annotation similar to GFF but with stricter attribute conventions.
JSON#: Human-readable data serialisation format commonly used for configuration and data exchange. See https://en.wikipedia.org/wiki/JSON.
k-mer#: Substring of fixed length k from a sequence. Used in classification (Kraken), assembly (de Bruijn graphs), and quality control.
Module#: A directory that contains a Snakemake rule and an associated README. Especially relevant for the Sequana pipelines. See Developer guide.
MultiQC#: Aggregates outputs of many bioinformatics tools (FastQC, samtools, cutadapt …) into a single interactive HTML report. Sequana ships plugins for several of its pipelines.
NGS#: Next-generation sequencing. Catch-all term for high-throughput sequencing technologies (Illumina, PacBio, ONT …).
Rule#: Smallest unit of a Snakemake workflow: declares input, output, and the shell/Python code that turns one into the other.
SAM#: Sequence Alignment/Map. Tab-separated text format describing read alignments to a reference.
Sample sheet#: Tabular description of an Illumina run (sample IDs, indices, adapters). Parsed by sequana.iem.SampleSheet.
Snakefile#: A file that contains one or several Snakemake rules.
Snakemake#: Python-based workflow engine used by every Sequana pipeline.
Taxon#: A taxonomic unit (species, genus, …) referenced by an NCBI taxid. The sequana.taxonomy module loads the NCBI taxonomy dump.
VCF#: Variant Call Format, used by the variant calling pipeline.
Wrapper#: Reusable Snakemake rule shipped by sequana-wrappers and consumed by every pipeline via Snakemake's native wrapper: directive.
YAML#: Human-readable data serialisation language commonly used for configuration files. Every Sequana pipeline configuration is YAML.
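
Three of the terms defined above (CIGAR, FASTA and k-mer) are easier to grasp with a small example. The following is a minimal, standard-library-only sketch. The helper names parse_cigar, count_kmers and read_fasta are hypothetical, written for illustration here, and are not part of the Sequana API.

```python
import re
from collections import Counter
from io import StringIO

# Operation letters defined by the SAM specification.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs.

    Illustrative helper: rejects anything that is not a plain
    sequence of <number><operation> tokens (including "*").
    """
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # range() is empty when k exceeds the sequence length.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle.

    Sequence lines belonging to one record are joined together.
    Lines appearing before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    print(parse_cigar("3S10M2I5D"))
    print(count_kmers("ACGTACG", 3))
    demo = StringIO(">seq1 demo\nACGT\nAC\n>seq2\nGG\n")
    print(list(read_fasta(demo)))
```

For real data, dedicated libraries that handle compression, indexing and malformed records are a better choice than these helpers.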