Sequana documentation

Current version: 0.16.11, Mar 18, 2024

SEQUANA

Supported Python versions: 3.8, 3.9, 3.10, 3.11. Installable from Bioconda and PyPI.

How to cite:

Citations are important for us to carry on development. For the Sequana library (including the pipelines), please cite:

Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

For the genome coverage tool (sequana_coverage): Desvillechabrol et al, 2018: detection and characterization of genomic variations using running median and mixture models. GigaScience, 7(12), 2018. https://doi.org/10.1093/gigascience/giy110

For Sequanix: Desvillechabrol et al. Sequanix: A Dynamic Graphical Interface for Snakemake Workflows. Bioinformatics, bty034, https://doi.org/10.1093/bioinformatics/bty034. Also available on bioRxiv (DOI: https://doi.org/10.1101/162701).

Sequana includes a set of pipelines for NGS (next-generation sequencing), covering quality control, variant calling, coverage, taxonomy, and transcriptomics. We also ship Sequanix, a graphical user interface for Snakemake pipelines.

Pipelines and tools available in the Sequana project

Name/GitHub        Description                                                     PyPI package             Apptainer
sequana_pipetools  Create and manage Sequana pipelines                             sequana-pipetools        Not required
sequana-wrappers   Set of wrappers to build pipelines                              Not on PyPI              Not required
demultiplex        Demultiplex your raw data                                       sequana-demultiplex      License restriction
denovo             De novo assembly of sequencing data                             sequana-denovo           Yes
fastqc             Sequencing quality control                                      sequana-fastqc           Yes
LORA               Map sequences on target genome                                  sequana-lora             Yes
mapper             Map sequences on target genome                                  sequana-mapper           Yes
nanomerge          Merge barcoded (or unbarcoded) nanopore FastQ files and report  sequana-nanomerge        Yes
pacbio_qc          PacBio quality control                                          sequana-pacbio-qc        Yes
ribofinder         Find ribosomal content                                          sequana-ribofinder       Yes
rnaseq             RNA-seq analysis                                                sequana-rnaseq           Yes
variant_calling    Variant calling                                                 sequana-variant-calling  Yes
multicov           Coverage (mapping)                                              sequana-multicov         Yes
laa                Long read amplicon analysis                                     sequana-laa              Yes
revcomp            Reverse complement of sequence data                             sequana-revcomp          Yes
downsampling       Downsample sequencing data                                      sequana-downsampling     Not required
depletion          Remove/select reads mapping to a reference                      sequana-depletion        -

Pipelines not yet released

Name/GitHub        Description                                                     PyPI package
trf                Find repeats                                                    sequana-trf
multitax           Taxonomy analysis                                               sequana-multitax

Please see the documentation for the up-to-date status of each pipeline.

Contributors

Maintaining Sequana would not have been possible without users and contributors. Each contribution has been an encouragement to pursue this project. Thanks to all:

https://contrib.rocks/image?repo=sequana/sequana

Changelog

Version

Description

0.17.0

  • viz submodules: remove easydev and cleanup scipy imports

  • remove the substractor utility (use sequana_depletion pipeline instead)

  • remove the unused get_max_gc_correlation function from bedtools.

  • Major change in VCF reader (freebayes). Got rid of freebayes_bcf_filter, which was redundant with freebayes_vcf_filter; replace scipy fisher test with own implementation. Remove useless VCF code.

  • Fixes rnadiff HTML report

  • speedup kegg enrichment using multiprocess

  • Allow sequana_taxonomy to download toydb and viruses_masking DBs from zenodo

0.16.9

  • Major fix on PCA and add batch effect plots in RNAdiff analysis

  • fix headers of the count matrix and DESeq2 output files, which were missing the index (no impact on the analysis; this only matters when opening the CSV files in Excel)

  • Taxonomy revisited to save taxonomy.dat in gzipped CSV format.

0.16.8

  • update IEM for more testing

  • better handling of error in RNADiff

  • Add new methods for ribodesigner

0.16.7

  • Stable release (fix doc), deprecated.

0.16.6

  • Refactor IEM to make it more robust with more tests.

0.16.5

  • refactor to use pyproject instead of setuptools

  • remove pkg_resources (future deprecation)

  • remove unused requirements (cookiecutter, adjusttext, docutils, mock, psutil, pykwalify)

  • cleanup resources (e.g. moving canvas/bar.py into viz)

0.16.4

  • hot fixes on RNAdiff reports and enrichments

0.16.3

0.16.2

  • save coverage PNG image (regression)

  • Update taxonomy/coverage standalone (regression) and more tests

0.16.1

  • hotfix missing module

0.16.0

  • add mpileup module

  • homogenization enrichment + fixup rnadiff

  • Complete refactoring of sequana coverage module. Allow sequana_coverage to handle small eukaryotes in a more memory efficient way.

  • use click for the sequana_taxonomy, sequana_coverage and sequana rnadiff commands

  • Small fixup on homer, idr and phantom modules (for chipseq pipeline)

0.15.4

  • add plot for rnaseq/rnadiff

0.15.3

  • add sequana.viz.plotly module. use tqdm in bamtools module

  • KEGG API changed. We updated sequana to use a headless server and keep the annotated and colored pathways feature.

  • Various improvements on KEGG enrichment including saving pathways, addition of a --comparison option in the sequana sub-command, plotly plots, etc

0.15.2

  • ribodesigner can now accept an input fasta with no GFF assuming the fasta already contains the rRNA sequences

  • Fix IEM module when dealing with double indexing

  • Fix anchors in HTML reports (rnadiff module)

  • refactorise compare module to take several rnadiff results as input

  • enrichment improvements (export KEGG and GO as csv files)

0.15.1

  • Fix creation of images directory in modules report

  • add missing test related to gff

  • Fix #804

0.15.0

  • add logo in reports

  • RNADiff reports can now use shrinkage or not (optional)

  • remove useless rules now in sequana-wrappers

  • update main README to add LORA in list of pipelines

  • Log2FC values are now shrunken log2FC values in the volcano plot and report table. "NotShrinked" columns for Log2FC and Log2FCSE prior to shrinkage are displayed in the report table.

0.14.6

  • add fasta_and_gff_annotation module to correct fasta and gff given a vcf file.

  • add macs3 module to read output of macs3 peak detector.

  • add idr module to read results of idr analysis

  • add phantom module to compute phantom peaks

  • add homer module to read annotation files from annotatePeaks

0.14.5

0.14.4

  • hotfix bug on kegg colorised pathways

  • Fix the hover_name in rnadiff volcano plot to include the index/attribute.

  • pin snakemake to be >=7.16

0.14.3

  • new fisher metric in variant calling

  • ability to use several features in rnaseq/rnadiff

  • pin several libraries due to regression during installs

0.14.2

  • Update ribodesigner

0.14.1

  • Kegg enrichment: add gene list 'all' and fix incomplete annotation case

  • New uniprot module for GO term enrichment and enrichment refactorisation (transparent for users)

0.14.0

  • pinned click>=8.1.0 due to API change (autocomplete)

  • moved tests around to decrease packaging from 16 to 4Mb

  • ribodesigner: new plots, clustering and notebook

0.13.X

  • Remove useless standalones or moved to main sequana command

  • Move sequana_lane_merging into a subcommand (sequana lane_merging)

  • General cleanup of documentation, test and links to pipelines

  • add new ribodesigner subcommand

0.12.X

What is Sequana?

Sequana is a versatile tool that provides:

  1. A Python library dedicated to NGS analysis (e.g., tools to visualise standard NGS formats).

  2. A set of pipelines dedicated to NGS in the form of Snakefiles (Makefile-like files with Python syntax, based on the Snakemake framework).

  3. Original tools to help in the creation of such pipelines including HTML reports.

  4. Standalone applications:
    1. sequana_coverage eases the extraction of genomic regions of interest and genome coverage information

    2. sequana_taxonomy performs a quick taxonomy of your FastQ. This requires dedicated databases to be downloaded.

    3. Sequanix: a GUI for Snakemake workflows (hence Sequana pipelines as well)
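
The sequana_coverage publication cited above mentions a running median as one of its building blocks. As a rough illustration of that idea only, the sketch below normalises a list of per-base depths by their local running median. It is not sequana_coverage's actual code: the function names, the window handling at the edges, and the depth values are all assumptions made for this example.

```python
from statistics import median


def running_median(values, window):
    """Return the running median of `values` using a centred window.

    `window` must be a positive odd integer; near the edges the window
    is simply truncated (an assumption made for this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def normalise(values, window):
    """Divide each depth by its local running median (0.0 where the median is 0)."""
    trend = running_median(values, window)
    return [v / m if m else 0.0 for v, m in zip(values, trend)]


# Hypothetical per-base depths with one spike at index 4
depth = [10, 11, 10, 12, 40, 11, 10, 9, 10]
ratios = normalise(depth, window=5)

# Index of the position that deviates most from its local trend
outlier = ratios.index(max(ratios))
print(outlier, round(ratios[outlier], 2))
```

Positions whose ratio is far from 1 stand out from the local trend. How such positions are then scored is not covered here.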

Sequana pipelines cover a wide range of analyses. Since March 2020, each pipeline has lived in its own dedicated GitHub repository. You may find pipelines for NGS quality control (e.g. adapter removal, phiX removal, trimming of low-quality bases), variant calling, characterisation of the genome coverage, taxonomic classification, de novo assembly, RNA-seq, etc. See the Pipelines section for more information.

Sequana can be used by developers to create new pipelines and by users in the form of applications ready for production. Moreover, Sequanix can be used to set the parameters of pipelines and execute them easily with a graphical user interface.

To join the project, please let us know on GitHub.

Installation

conda install sequana
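
After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata. This is a minimal sketch using only the standard library; it assumes the distribution is registered under the name "sequana" (the name used on PyPI), and the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="sequana"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found:
    print(f"sequana {found}")
else:
    print("sequana is not installed in this environment")
```

The same helper can be pointed at a pipeline package by passing its distribution name instead of the default.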

Examples

Visit our example gallery to use the Python library

NGS pipelines

Learn about available Snakemake pipelines

Standalone applications

Standalone applications including Sequanix (GUI for snakemake) and the sequana_coverage tool.
