Tutorial#

Hands-on examples for a few representative Sequana pipelines and standalones. Each section assumes you have followed the Installation. For a complete list of pipelines and the matching install commands, see Pipelines.

The `sequana` CLI #

The top-level sequana command groups ~30 helper sub-commands (FASTQ/FASTA utilities, GFF/GTF fixers, enrichment tools, summary, telomark, etc.). List them with:

sequana --help

For shell completion (bash, zsh, fish), follow the Click instructions, e.g. for bash:

_SEQUANA_COMPLETE=bash_source sequana > ~/.sequana-complete.bash
echo '. ~/.sequana-complete.bash' >> ~/.bashrc

fastqc pipeline #

We will run the sequana_fastqc pipeline on a pair of FastQ files. The data is a Measles virus sequencing run (HiSeq, PCRFree adapters, ~10% adapter content, index GTGAAA). For testing, download:

R1
R2)

(1500 reads each.) Then:

pip install sequana_fastqc --upgrade
sequana_fastqc --input-directory .
cd fastqc
sh fastqc.sh

Open summary.html in your browser.

Taxonomy (standalone)#

Quick taxonomy classification of a FastQ file uses sequana_taxonomy (see sequana_taxonomy).

Download a toy Kraken database (100 FASTA files, measles + a few viruses):

from sequana import KrakenDownload, sequana_config_path
kd = KrakenDownload()
kd.download("toydb")
database_path = sequana_config_path + "/kraken_toydb"

Then either drive it from Python:

from sequana import KrakenPipeline
kp = KrakenPipeline(["R1.fastq.gz", "R2.fastq.gz"],
                    database="~/.config/sequana/kraken_toydb")
kp.run()

…or from the shell:

sequana_taxonomy --file1 Test_R1.cutadapt.fastq.gz \
                 --file2 Test_R2.cutadapt.fastq.gz \
                 --database <database_path>

Open taxonomy/kraken.html (Krona pie chart). A reference rendering is available here.

Variant calling pipeline #

The variant calling pipeline performs mapping, calling and annotation (snpEff + coverage). See Variant Calling for full details.

Install:

pip install sequana_variant_calling --upgrade

Fetch a reference + GenBank annotation. Using the Measles accession K01711.1:

from sequana.snpeff import download_fasta_and_genbank
download_fasta_and_genbank("K01711", "measles")

Then run:

sequana_variant_calling --input-directory . \
                        --reference measles.fa \
                        --annotation measles.gbk
cd variant_calling
sh variant_calling.sh

When the pipeline finishes, index.html shows the multiqc summary, and each sample has its own report_SAMPLENAME/summary.html.

About the configuration file #

The reference and annotation paths are set when initiating the pipeline. Open config.yaml to tune the rest:

annotation_file: measles.gbk
reference_file: measles.fa

Warning

mark_duplicates outputs can be huge and require scratch space on a cluster.

Warning

In the coverage section, reduce the window size for short genomes.

De-novo assembly #

pip install sequana_denovo --upgrade
sequana_denovo --input-directory . --working-directory denovo_test

Edit denovo_test/config.yaml before running. The digital_normalisation section is the main memory knob — for tests set max-tablesize to 3e6.

RNA-seq #

Full reference: RNA-seq. Quick recipe (single-end yeast data, HiSeq2500):

pip install sequana_rnaseq --upgrade

Get the genome + GFF for Saccer3:

mkdir Saccer3 && cd Saccer3
wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/chromFa.tar.gz
tar -xvzf chromFa.tar.gz
cat *.fa > Saccer3.fa
wget http://downloads.yeastgenome.org/curation/chromosomal_feature/saccharomyces_cerevisiae.gff -O Saccer3.gff
rm -f chr*
cd ..

Warning

All reference files must share the same prefix (Saccer3 here) and live in the same directory.

Warning

The counting step expects GFF only (GTF/SAF coming).

Run:

sequana_rnaseq --genome-directory Saccer3 --aligner bowtie2
cd rnaseq
sh rnaseq.sh

On a SLURM cluster the same pipeline can be run with:

sbatch sh rnaseq.sh --profile slurm

(see the pipeline README for the cluster profile setup.)

Apptainer containers #

Every Sequana pipeline ships an apptainers.yaml that points at containers maintained by the damona project. Pipelines pull them automatically when invoked with --use-apptainer:

sequana_fastqc --input-directory . --use-apptainer

This is the recommended way to avoid conda/system tool clashes.