1. Installation

1.1. Installation using Conda

If you have not installed Sequana yet, be aware that it relies on many dependencies that need to be compiled, which is time consuming and requires a working C compiler. For example, we use Matplotlib and Pandas, both of which require compilation. Besides, many pipelines rely on third-party software such as BWA or samtools, which are not Python libraries. Using conda simplifies this process because it installs pre-compiled packages.

1.1.1. Install conda executable

In practice, we use Anaconda. We recommend installing the conda executable via the manual installer (download it from https://continuum.io/downloads). You may have the choice between Python 2 and Python 3; we recommend Python 3.

1.1.2. Add conda channels

When you want to install a new package, you have to use this type of syntax:

conda install ipython

where ipython is the package you wish to install. By default, conda looks for packages on the official Anaconda channel. However, many other channels are available. We will use the bioconda channel. To enable it, type these commands (once and for all):

conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda

Warning

It is important to add them in this order, as mentioned on the bioconda webpage (https://bioconda.github.io/).
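To see why the order matters, here is a minimal Python sketch of what the four commands above do to the channel list. It assumes that each `conda config --add channels X` call places X at the top of the list, so the channel added last gets the highest priority. The function name `channel_priority` is ours, for illustration only.

```python
def channel_priority(add_order):
    """Return channels from highest to lowest priority.

    Assumption: every `conda config --add channels X` call puts X at the
    top of the channel list, so later additions outrank earlier ones.
    """
    channels = []
    for name in add_order:
        if name in channels:
            channels.remove(name)  # re-adding a channel moves it to the top
        channels.insert(0, name)
    return channels


# Same order as the four commands above.
added = ["conda-forge", "defaults", "r", "bioconda"]
print(channel_priority(added))
# → ['bioconda', 'r', 'defaults', 'conda-forge']
```

On a real installation, `conda config --get channels` lists the configured channels so that you can compare them with this result.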

1.1.3. Create an environment

Once conda is installed, open a new shell. Although this is not strictly required, we recommend creating an environment dedicated to Sequana. This environment can later be removed without affecting your system or your conda installation. A conda environment is nothing more than a directory and can be created as follows:

conda create --name sequana_env python=3.5

Then, since you may have several environments, you must activate the sequana environment itself:

source activate sequana_env

1.1.4. Install sequana via conda (bioconda)

Finally, just type:

conda install sequana

This should install most of the required dependencies. However, you may need to install more packages depending on the pipeline used.

Here are some compulsory packages:

conda install numpy matplotlib pandas snakemake graphviz pygraphviz scipy

Then, depending on the pipelines or standalone applications you want to use, you will need to install other packages. Here is a list of dependencies that should be enough to run most of the current pipelines (the commands are split over several lines, but you can also install everything in one go):

conda install pysam snpeff biokit bioservices spades khmer pyVCF
conda install bwa bcftools samtools bedtools picard freebayes fastqc
conda install kraken krona pigz
conda install ipython cutadapt jupyter pbr
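After installing these packages, it can be useful to check that the third-party executables are actually reachable from the active environment. The following Python sketch uses only the standard library. The executable names in `TOOLS` are assumed to match the package names listed above, and the helper `missing_tools` is hypothetical (it is not part of Sequana).

```python
import shutil

# Executable names assumed for some of the packages installed above.
TOOLS = ["bwa", "samtools", "bcftools", "bedtools",
         "fastqc", "pigz", "cutadapt", "snakemake"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```

Run it from within the activated sequana_env environment, otherwise the tools installed by conda will not be on the PATH.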

For atropos, which is not yet on bioconda, use the pip command:

pip install atropos==1.0.23

Note

atropos is an alternative to cutadapt with additional options but the same type of functionalities and arguments. We use version 1.0.23 or above.

Note

The denovo_assembly pipeline uses the Quast tool, which we ported to Python 3.5; the port was pulled into the official Quast GitHub repository. It is not yet in bioconda, but one can get it from the Quast GitHub repository (Sept 2016). Quast is required for the denovo_assembly pipeline. This pipeline also requires GATK, which users must install manually (due to licensing restrictions).

Note

Sequana is not fully compatible with Python 2.7 since a dependency (Snakemake) is only available for Python 3.5. However, many core functionalities would work under Python 2.7.

Note

For GATK (variant caller), please go to https://software.broadinstitute.org/gatk/download/auth?package=GATK and download the file GenomeAnalysisTK-3.7.tar.bz2; then type:

gatk-register GenomeAnalysisTK-3.7.tar.bz2

1.2. Docker containers for Sequana

Warning

Although we provide Docker recipes, this method will not be maintained after release 0.3.0 of Sequana. However, we keep version 0.3 and the recipes below for the record and for those willing to build their own Docker image of Sequana.

Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run the software.

To allow anyone to use Sequana without a complex installation, we provide Docker images, which are synchronized with the master branch of the source code.

We assume that:

  1. You have installed Docker on your system (see the Docker documentation otherwise).
  2. You have an account on Docker Hub.

1.2.1. Quick start

With your Docker Hub account, first log in:

docker login

Then download (pull) a Sequana image, which contains the library, the pipelines and the standalones (about 2 GB in total), as follows:

docker pull sequana/sequana

Now, you should be ready to try it. To start an interactive session, type:

cd <Directory_with_data>
docker run -v $PWD:/home/sequana/data -it sequana/sequana

1.2.2. Standalone

The primary goal of the Docker image is to make it possible to quickly test the standalones. For now, we expose only one image. Please see the documentation specific to each standalone for details.

1.2.3. More advanced Usage

Below, we provide a quick tutorial that guides you through using Sequana via the Docker image. To do so, we focus on one standalone application called sequana_coverage. In brief, the standalone takes as input a BED file that contains the genome coverage of a set of DNA reads mapped onto a reference genome. The standalone then creates a report with relevant information about the coverage (see the Sequana documentation for more information).
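To make the input more concrete, here is a small Python sketch that reads a coverage file and computes the mean depth per chromosome. It assumes a tab-separated layout with three columns (chromosome name, position, depth); this layout is an assumption on our side, so please check the Sequana documentation for the exact format expected by sequana_coverage. The three sample lines are made up for illustration, and `mean_coverage` is a hypothetical helper, not a Sequana function.

```python
import csv
import io
from collections import defaultdict

# Made-up sample: chromosome name, position, depth (tab-separated).
SAMPLE = "virus\t1\t10\nvirus\t2\t12\nvirus\t3\t14\n"


def mean_coverage(handle):
    """Return the mean depth per chromosome from a 3-column coverage file."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, depth = row[0], int(row[2])
        totals[chrom] += depth
        counts[chrom] += 1
    return {chrom: totals[chrom] / counts[chrom] for chrom in totals}


print(mean_coverage(io.StringIO(SAMPLE)))
# → {'virus': 12.0}
```

To try it on a real file, replace `io.StringIO(SAMPLE)` with an open file handle, e.g. `open("yourfile.bed")`.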

1.2.4. Use the sequana Docker image

Once you have downloaded the sequana image, you can enter it as follows:

docker run -it sequana/sequana

This opens an interactive shell with the latest Sequana library pre-installed. For instance, you can start an IPython shell:

ipython

and import the library:

import sequana
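Beyond a bare import, the following sketch checks whether a package can be found before importing it, and reports its version when one is exposed. It relies only on the standard library. The `__version__` attribute is an assumption (not every package defines it, hence the fallback value), and `describe_package` is a hypothetical helper.

```python
import importlib
import importlib.util


def describe_package(name):
    """Return a version string for a top-level package, or None if absent."""
    if importlib.util.find_spec(name) is None:
        return None  # the package is not installed in this environment
    module = importlib.import_module(name)
    # Assumption: the version, if any, is exposed as `__version__`.
    return getattr(module, "__version__", "unknown")


version = describe_package("sequana")
if version is None:
    print("sequana is not installed in this environment")
else:
    print("sequana is installed (version: {})".format(version))
```

The same helper can be used for any of the Python dependencies listed in the installation section, for example `describe_package("pandas")`.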

Or, within the Unix shell, you can use the standalones. For instance, there is a test BED file that can be analysed as follows to get a coverage report:

sequana_coverage --input virus.bed

This should print information and create a report/ directory. This is not very practical if you have your own files or want to open the HTML page stored in ./report. So, let us quit the container:

exit

and do it the proper way. Go to a working directory on your computer and start the Docker image again as follows:

docker run -v $PWD:/home/sequana/data -it sequana/sequana

This should start the Docker image again, but you should now have a ./data directory. Be aware that if you modify data here (in the container), you also modify the data in your local directory.
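The -v option maps a directory of your computer (left of the colon) onto a directory inside the container (right of the colon). As an illustration, the Python sketch below assembles the same command for an arbitrary local directory. It only builds and prints the command and does not call Docker. The function `docker_run_command` is hypothetical; the image name and the container path are those used above.

```python
import os
import shlex


def docker_run_command(host_dir, image="sequana/sequana",
                       container_dir="/home/sequana/data"):
    """Build the argument list for an interactive run with a mounted directory."""
    host_dir = os.path.abspath(host_dir)  # Docker expects an absolute host path
    volume = "{}:{}".format(host_dir, container_dir)
    return ["docker", "run", "-v", volume, "-it", image]


# Print the command for the current directory, quoted for the shell.
print(" ".join(shlex.quote(arg) for arg in docker_run_command(".")))
```

Passing "." as the directory reproduces the $PWD form of the command shown above.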

Now, you can run sequana_coverage in this directory:

cd data
sequana_coverage --input yourfile.bed

This analyses the data and creates a report/ directory. The container has no display, but you can now go back to the working directory on your computer (e.g., /home/user/mydatapath) and browse the HTML page that was created.

So far, we entered the image each time, but you can also use the images as executables (see the standalone section above).

1.2.5. For developers:

Build the image:

git clone https://github.com/sequana/sequana
cd sequana/docker/sequana_core
sudo docker build -t sequana/sequana_core .

Run the image:

sudo docker run -it sequana/sequana_core

1.2.5.1. Layers

Here are the layers made available in the sequana organization on Docker Hub (hub.docker.com/u/sequana). Each layer is built on top of the previous one.

1.2.5.2. Sudo

To avoid using sudo, check out the various forums on the topic. See for example: http://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo