Glossary
- Apptainer
Container runtime used by Sequana pipelines (formerly known as Singularity). Images are pulled on demand via --use-apptainer.
- BAI
Index file accompanying a coordinate-sorted BAM file, conventionally stored next to it with the .bai extension (for example sample.bam.bai).
- BAM
Binary Alignment Map. Compressed binary version of the SAM alignment format.
- BED
Tab-separated format describing genomic intervals (chrom, start, end, plus optional fields). Coordinates are 0-based and half-open, so end - start gives the interval length. Used for coverage reporting.
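A minimal sketch of reading the three mandatory BED columns, showing why half-open coordinates need no +1 correction. The names BedInterval and parse_bed_line are hypothetical illustrations, not part of Sequana's API.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        # Half-open interval: the length is simply end - start
        return self.end - self.start


def parse_bed_line(line: str) -> BedInterval:
    """Hypothetical helper: keep only the three mandatory BED columns."""
    fields = line.rstrip("\n").split("\t")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


iv = parse_bed_line("chr1\t99\t200\tgene_A\n")
print(iv.chrom, iv.start, iv.end, iv.length)  # chr1 99 200 101
```

Here start=99 corresponds to 1-based position 100, and the interval covers positions 100 to 200 inclusive, i.e. 101 bases.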
- CIGAR
Compact string describing how a read aligns to a reference (matches, insertions, deletions, soft-clips, …).
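A short sketch of decoding a CIGAR string into (length, operator) pairs and deriving how many reference and read bases an alignment covers. The operator sets follow the SAM specification; the function names are hypothetical, and in practice a library such as pysam handles this for you.

```python
import re

# A CIGAR string is a run of <length><operator> pairs, e.g. "5S10M2I3M1D4M"
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that advance along the reference
CONSUMES_REF = set("MDN=X")
# Operators that consume bases of the read (hard clips H and padding P do not)
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operator) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject input containing characters the pattern did not match (e.g. "*")
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"invalid or unavailable CIGAR: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


def query_length(cigar: str) -> int:
    """Read length implied by the CIGAR (should match the SEQ length)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


print(reference_span("5S10M2I3M1D4M"))  # 18
print(query_length("5S10M2I3M1D4M"))    # 24
```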
- Conda environment
Isolated Python + binary environment managed by conda/mamba. Recommended for installing Sequana to avoid clashing with system packages.
- DEG
Differentially Expressed Gene. Output of RNA-seq differential expression analyses (see sequana.rnadiff).
- DSRC
A compression tool dedicated to FASTQ files.
- FASTA
Plain-text format for nucleotide or protein sequences. Each record starts with a header line beginning with >, followed by one or more sequence lines.
- FASTQ
Plain-text format combining a sequence with per-base quality scores. Each record spans four lines: an @ header, the sequence, a + separator line and the quality string. Typically gzipped (.fastq.gz).
- GFF
General Feature Format. Tab-separated annotation file describing genes, exons and other features along a sequence. See also GFF3.
- GTF
Gene Transfer Format. Tab-separated annotation similar to GFF but with stricter attribute conventions.
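The practical difference between GFF3 and GTF mostly shows up in the ninth (attributes) column. The sketch below parses both styles; the helper names are hypothetical, repeated keys simply keep their last value, and values containing quoted semicolons are not handled.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict[str, str]:
    """GFF3 column 9: tag=value pairs separated by ';', with %XX escapes."""
    attrs = {}
    for item in column9.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = unquote(value)  # e.g. '%3B' -> ';'
    return attrs


def parse_gtf_attributes(column9: str) -> dict[str, str]:
    """GTF column 9: key "value" pairs, each terminated by ';'."""
    attrs = {}
    for item in column9.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def feature_length(line: str) -> int:
    """Both formats use 1-based, inclusive start/end (columns 4 and 5)."""
    fields = line.rstrip("\n").split("\t")
    return int(fields[4]) - int(fields[3]) + 1


gff3 = "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abcA"
print(parse_gff3_attributes(gff3.split("\t")[8]))  # {'ID': 'gene1', 'Name': 'abcA'}
print(feature_length(gff3))  # 101
```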
- JSON
Human-readable data serialisation language commonly used for configuration. See https://en.wikipedia.org/wiki/JSON.
- k-mer
Substring of fixed length k from a sequence. Used in classification (Kraken), assembly (de Bruijn graphs), and quality control.
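A minimal sketch of k-mer enumeration: a sequence of length L yields L - k + 1 overlapping k-mers. Many counting tools also collapse each k-mer with its reverse complement into a "canonical" form so both strands are counted together; the canonical helper below illustrates that idea and is not taken from any specific tool.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(sequence: str, k: int):
    """Yield every overlapping substring of length k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    for i in range(len(sequence) - k + 1):
        yield sequence[i : i + k]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


counts = Counter(kmers("ACGTACGT", 3))
print(counts)  # ACG and CGT twice, GTA and TAC once
print(Counter(canonical(k) for k in kmers("ACGTACGT", 3)))  # ACG: 4, GTA: 2
```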
- Module
A directory that contains a Snakemake rule and an associated README. Mostly relevant to the Sequana pipelines; see the Developer guide.
- MultiQC
Aggregates outputs of many bioinformatics tools (FastQC, samtools, cutadapt …) into a single interactive HTML report. Sequana ships plugins for several of its pipelines.
- NGS
Next-generation sequencing. Catch-all term for high-throughput sequencing technologies (Illumina, PacBio, ONT …).
- Rule
Smallest unit of a Snakemake workflow: declares input, output, and the shell/Python code that turns one into the other.
- SAM
Sequence Alignment Map. Tab-separated format describing read alignments to a reference.
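Each SAM alignment line has 11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields; header lines start with @ and should be skipped. This sketch parses one alignment line and tests a few FLAG bits (values from the SAM specification). The function names are hypothetical; for real files a library such as pysam is the safer choice.

```python
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def parse_sam_line(line: str) -> dict:
    """Parse one alignment line (not a @ header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    record = dict(zip(SAM_FIELDS, fields[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])  # pos is 1-based
    record["tags"] = fields[11:]  # optional fields, kept as raw strings
    return record


def is_primary_mapped(record: dict) -> bool:
    """True for a mapped, primary (not secondary/supplementary) alignment."""
    return not record["flag"] & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY)


rec = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
strand = "-" if rec["flag"] & FLAG_REVERSE else "+"
print(rec["rname"], rec["pos"], strand, is_primary_mapped(rec))  # chr1 100 - True
```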
- Sample sheet
Tabular description of an Illumina run (sample IDs, indices, adapters). Parsed by sequana.iem.SampleSheet.
- Snakefile
A file that contains one or several Snakemake rules.
- Snakemake
Python-based workflow engine used by every Sequana pipeline.
- Taxon
A taxonomic unit (species, genus, …) referenced by an NCBI taxid. Sequana's sequana.taxonomy module loads the NCBI taxonomy dump.
- VCF
Variant Call Format. Tab-separated text format listing sequence variants (SNVs, indels, …) relative to a reference. Used by the variant calling pipeline.
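A VCF data line starts with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by per-sample genotype columns. The sketch below parses those fixed columns and gives each ALT allele a rough type; the helpers are hypothetical, and symbolic alleles such as <DEL> or * would need extra handling.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the 8 fixed columns of a VCF data line (not a # header line)."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for item in info.split(";"):
        if item and item != ".":
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True  # flag keys carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


def variant_type(ref: str, alt: str) -> str:
    """Rough classification by allele length (symbolic alleles not handled)."""
    if len(ref) == len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "indel"


var = parse_vcf_line("chr1\t100\t.\tA\tG,AT\t50\tPASS\tDP=30;DB")
print([variant_type(var["ref"], a) for a in var["alt"]])  # ['SNV', 'indel']
```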
- Wrapper
Reusable implementation of a Snakemake rule, shipped by sequana-wrappers and consumed by every pipeline via Snakemake's native wrapper: directive.
- YAML
Human-readable data serialisation language commonly used for configuration files. Every Sequana pipeline configuration is YAML.