Glossary ======== .. glossary:: :sorted: Apptainer Container runtime used by Sequana pipelines (formerly known as Singularity). Images are pulled on demand via ``--use-apptainer``. BAI Index file accompanying a :term:`BAM` file (non-standard extension ``.bai``). BAM Binary version of the :term:`SAM` alignment format. BED Tab-separated format describing genomic intervals (chrom, start, end, plus optional fields). Used for coverage reporting. CIGAR Compact string describing how a read aligns to a reference (matches, insertions, deletions, soft-clips, …). Conda environment Isolated Python + binary environment managed by ``conda``/``mamba``. Recommended for installing Sequana to avoid clashing with system packages. DEG Differentially Expressed Gene. Output of RNA-seq differential expression analyses (see :class:`sequana.rnadiff`). DSRC A compression tool dedicated to FastQ files. FASTA Plain-text format for nucleotide or protein sequences. Each record starts with a ``>`` header line, followed by one or more sequence lines. FASTQ Plain-text format combining a sequence with per-base quality scores. Typically gzipped (``.fastq.gz``). GFF General Feature Format. Tab-separated annotation file describing genes, exons and other features along a sequence. See also GFF3. GTF Gene Transfer Format. Tab-separated annotation similar to :term:`GFF` but with stricter attribute conventions. JSON Human-readable data serialisation language commonly used for configuration. See https://en.wikipedia.org/wiki/JSON. k-mer Substring of fixed length *k* from a sequence. Used in classification (Kraken), assembly (de Bruijn graphs), and quality control. MultiQC Aggregates outputs of many bioinformatics tools (FastQC, samtools, cutadapt …) into a single interactive HTML report. Sequana ships plugins for several of its pipelines. Module A directory that contains a Snakemake rule and an associated README. Especially relevant for the Sequana pipelines. See :ref:`developers`. NGS Next-generation sequencing. Catch-all term for high-throughput sequencing technologies (Illumina, PacBio, ONT …). Rule Smallest unit of a Snakemake workflow: declares input, output, and the shell/Python code that turns one into the other. SAM Sequence Alignment Map. Tab-separated format describing read alignments to a reference. Sample sheet Tabular description of an Illumina run (sample IDs, indices, adapters). Parsed by :class:`sequana.iem.SampleSheet`. Snakefile A file that contains one or several Snakemake rules. Snakemake Python-based workflow engine used by every Sequana pipeline. Taxon A taxonomic unit (species, genus, …) referenced by an NCBI taxid. Sequana's :mod:`sequana.taxonomy` loads the NCBI taxonomy dump. VCF Variant Call Format, used by the variant calling pipeline. Wrapper Reusable Snakemake rule shipped by `sequana-wrappers `_ and consumed by every pipeline via Snakemake's native ``wrapper:`` directive. YAML Human-readable data serialisation language commonly used for configuration files. Every Sequana pipeline configuration is YAML.