.. _tutorial:
Tutorial
==========
Hands-on examples for a few representative Sequana pipelines and standalones.
Each section assumes you have followed the :ref:`installation`. For a complete
list of pipelines and the matching install commands, see :ref:`pipelines`.
.. contents::
:local:
:depth: 2
The ``sequana`` CLI
-------------------
The top-level ``sequana`` command groups ~30 helper sub-commands (FASTQ/FASTA
utilities, GFF/GTF fixers, enrichment tools, summary, telomark, etc.). List
them with::
sequana --help
For shell completion (bash, zsh, fish), follow the
`Click instructions `_,
e.g. for bash::
_SEQUANA_COMPLETE=bash_source sequana > ~/.sequana-complete.bash
echo '. ~/.sequana-complete.bash' >> ~/.bashrc
fastqc pipeline
---------------
We will run the
`sequana_fastqc `_ pipeline on a pair of
FastQ files. The data is a Measles virus sequencing run (HiSeq, PCRFree
adapters, ~10% adapter content, index ``GTGAAA``). For testing, download:
- :download:`R1 <../sequana/resources/data/Hm2_GTGAAA_L005_R1_001.fastq.gz>`
- :download:`R2 <../sequana/resources/data/Hm2_GTGAAA_L005_R2_001.fastq.gz>`)
(1500 reads each.) Then::
pip install sequana_fastqc --upgrade
sequana_fastqc --input-directory .
cd fastqc
sh fastqc.sh
Open ``summary.html`` in your browser.
Taxonomy (standalone)
---------------------
Quick taxonomy classification of a FastQ file uses ``sequana_taxonomy``
(see :ref:`standalone_sequana_taxonomy`).
Download a toy Kraken database (100 FASTA files, measles + a few viruses)::
from sequana import KrakenDownload, sequana_config_path
kd = KrakenDownload()
kd.download("toydb")
database_path = sequana_config_path + "/kraken_toydb"
Then either drive it from Python::
from sequana import KrakenPipeline
kp = KrakenPipeline(["R1.fastq.gz", "R2.fastq.gz"],
database="~/.config/sequana/kraken_toydb")
kp.run()
…or from the shell::
sequana_taxonomy --file1 Test_R1.cutadapt.fastq.gz \
--file2 Test_R2.cutadapt.fastq.gz \
--database
Open ``taxonomy/kraken.html`` (Krona pie chart). A reference rendering is
available `here <_static/krona.html>`_.
Variant calling pipeline
------------------------
The variant calling pipeline performs mapping, calling and annotation
(snpEff + coverage). See :ref:`pipeline_vc` for full details.
Install::
pip install sequana_variant_calling --upgrade
Fetch a reference + GenBank annotation. Using the Measles accession
``K01711.1``::
from sequana.snpeff import download_fasta_and_genbank
download_fasta_and_genbank("K01711", "measles")
Then run::
sequana_variant_calling --input-directory . \
--reference measles.fa \
--annotation measles.gbk
cd variant_calling
sh variant_calling.sh
When the pipeline finishes, ``index.html`` shows the multiqc summary, and each
sample has its own ``report_SAMPLENAME/summary.html``.
About the configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The reference and annotation paths are set when initiating the pipeline. Open
``config.yaml`` to tune the rest::
annotation_file: measles.gbk
reference_file: measles.fa
.. warning:: ``mark_duplicates`` outputs can be huge and require scratch space
on a cluster.
.. warning:: In the ``coverage`` section, reduce the window size for short
genomes.
De-novo assembly
----------------
::
pip install sequana_denovo --upgrade
sequana_denovo --input-directory . --working-directory denovo_test
Edit ``denovo_test/config.yaml`` before running. The
``digital_normalisation`` section is the main memory knob — for tests set
``max-tablesize`` to ``3e6``.
RNA-seq
-------
Full reference: :ref:`pipeline_rnaseq`. Quick recipe (single-end yeast data,
HiSeq2500)::
pip install sequana_rnaseq --upgrade
Get the genome + GFF for `Saccer3
`_::
mkdir Saccer3 && cd Saccer3
wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/chromFa.tar.gz
tar -xvzf chromFa.tar.gz
cat *.fa > Saccer3.fa
wget http://downloads.yeastgenome.org/curation/chromosomal_feature/saccharomyces_cerevisiae.gff -O Saccer3.gff
rm -f chr*
cd ..
.. warning:: All reference files must share the same prefix
(``Saccer3`` here) and live in the same directory.
.. warning:: The counting step expects GFF only (GTF/SAF coming).
Run::
sequana_rnaseq --genome-directory Saccer3 --aligner bowtie2
cd rnaseq
sh rnaseq.sh
On a SLURM cluster the same pipeline can be run with::
sbatch sh rnaseq.sh --profile slurm
(see the pipeline README for the cluster profile setup.)
Apptainer containers
--------------------
Every Sequana pipeline ships an ``apptainers.yaml`` that points at containers
maintained by the `damona `_ project. Pipelines
pull them automatically when invoked with ``--use-apptainer``::
sequana_fastqc --input-directory . --use-apptainer
This is the recommended way to avoid conda/system tool clashes.