1.1. Installation using Conda
If you have not installed Sequana yet, be aware that it relies on many dependencies that need to be compiled (i.e., installation is time-consuming and requires a proper C compiler). For example, we use Matplotlib and Pandas, which require compilation. Besides, many pipelines rely on third-party software such as BWA or samtools that are not Python libraries. Using conda simplifies this process.
1.1.1. Install conda executable
In practice, we use Anaconda. We recommend installing the conda executable via the manual installer (download it from https://continuum.io/downloads). You may have the choice between Python 2 and 3. We recommend choosing Python 3.
1.1.2. Add conda channels
When you want to install a new package, you have to use this type of syntax:
conda install ipython
where ipython is the package you wish to install. Note that by default, conda looks on the official Anaconda website (channel). However, there are many channels available. We will use the bioconda channel. To use it, type these commands (once and for all):
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
It is important to add them in this order, as mentioned on the bioconda webpage (https://bioconda.github.io/).
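The channel setup can also be scripted. The following is a minimal sketch, not part of Sequana or conda: the function name add_sequana_channels and the CONDA override variable are hypothetical, introduced only so that the loop can be dry-run without touching your configuration. Each `conda config --add` call places the channel at the top of the list, so the last channel added (bioconda) ends up with the highest priority.

```shell
# Hypothetical helper (not shipped with Sequana or conda).
# CONDA may be overridden, e.g. CONDA="echo conda", to print the commands
# instead of running them.
add_sequana_channels() {
    # Same order as in the text: conda-forge, defaults, r, bioconda.
    for channel in conda-forge defaults r bioconda; do
        ${CONDA:-conda} config --add channels "$channel" || return 1
    done
}
```

Calling `add_sequana_channels` runs the four commands; `conda config --get channels` can then be used to inspect the resulting order.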
1.1.3. Create an environment
Once conda is installed, open a new shell. Although this is not strictly required, we recommend creating an environment dedicated to Sequana. This environment can later be removed without affecting your system or conda installation. A conda environment is nothing more than a directory and can be created as follows:
conda create --name sequana_env python=3.5
Then, since you may have several environments, you must activate the sequana environment itself:
source activate sequana_env
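Before installing anything, it can help to confirm that the right environment is active. Here is a minimal sketch, assuming conda exports the CONDA_DEFAULT_ENV variable on activation; the helper name in_conda_env is hypothetical.

```shell
# Hypothetical helper: succeeds only if the named conda environment is active.
# Relies on the CONDA_DEFAULT_ENV variable that conda sets on activation.
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "${1:-sequana_env}" ]
}

if in_conda_env sequana_env; then
    echo "sequana_env is active"
else
    echo "sequana_env is not active; run: source activate sequana_env"
fi
```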
1.1.4. Install sequana via conda (bioconda)
Finally, just type:
conda install sequana
This should install most of the required dependencies. However, you may need to install more packages depending on the pipeline used.
Here are some compulsory packages:
conda install numpy matplotlib pandas snakemake graphviz pygraphviz scipy
Then, depending on the pipelines or standalone applications you want to use, you will need to install other packages. Here is a list of dependencies that should be enough to run most of the current pipelines (the commands are split across several lines, but you can also install everything in one go):
conda install pysam snpeff biokit bioservices spades khmer pyVCF
conda install bwa bcftools samtools bedtools picard freebayes fastqc
conda install kraken krona pigz
conda install ipython cutadapt jupyter pbr
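Since several of these dependencies are command-line tools rather than Python libraries, a quick way to see what is still missing is to look for their executables on the PATH. The sketch below is only illustrative: missing_tools is a hypothetical helper, and it assumes that each executable carries the same name as its conda package, which may not hold for every package in the list.

```shell
# Hypothetical helper: print every given tool name that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names, taken from the package names above.
missing_tools bwa bcftools samtools bedtools freebayes fastqc pigz cutadapt
```

An empty output means that all the listed tools were found.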
For atropos, which is not yet on bioconda, use the pip command:
pip install atropos
Atropos is an alternative to cutadapt with additional options but the same type of functionalities and arguments. We use version 1.0.23 and above.
The denovo_assembly pipeline uses the Quast tool, which we ported to Python 3.5; the port was pulled into the official Quast GitHub repository. This version is not yet in bioconda, but one can get it from the Quast GitHub (Sept. 2016). This is required for the de-novo pipeline. The de-novo pipeline also requires GATK, to be installed manually by users (due to licensing restrictions).
Sequana is not fully compatible with Python 2.7 since a dependency (Snakemake) is only available for Python 3.5. However, many core functionalities would work under Python 2.7.
For GATK (variant caller), please go to https://software.broadinstitute.org/gatk/download/auth?package=GATK and download the file GenomeAnalysisTK-3.7.tar.bz2; then type:
1.2. Docker containers for Sequana
Although we provide Docker recipes, this method will not be maintained after release 0.3.0 of Sequana. However, we keep version 0.3 and the recipes below for record-keeping and for those willing to build their own Docker image of Sequana.
Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run the software.
In order to allow anyone to use Sequana without the need for a complex installation, we provide Docker images, which are synchronized with the master branch of the source code.
We assume that:
- You have installed Docker on your system (see the Docker documentation otherwise).
- You have an account on Docker Hub.
1.2.1. Quick start
With your Docker Hub account, first log in:
docker login
Then download (pull) a Sequana image (the library, pipelines and standalones) as follows (2 GB image in total):
docker pull sequana/sequana
Now, you should be ready to try it. To start an interactive session, type:
cd <Directory_with_data>
docker run -v $PWD:/home/sequana/data -it sequana/sequana
The primary goal of the Docker image is to make it possible to quickly test the standalones. For now, we expose only one Docker image. Please see the specific documentation at the links below:
- sequana_coverage: (https://github.com/sequana/sequana/tree/master/docker/sequana_coverage)
- sequana_taxonomy: (https://github.com/sequana/sequana/tree/master/docker/sequana_taxonomy)
1.2.3. More advanced Usage
Below, we provide a quick tutorial that will guide you through using Sequana with Docker. To do so, we will focus on one standalone application called sequana_coverage. In brief, the standalone takes as input a BED file that contains the genome coverage of a set of DNA reads mapped onto a reference genome. Then, the standalone creates a report with relevant information about the coverage (see the Sequana documentation for more information).
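A malformed input file is a common source of confusing errors, so a quick sanity check before running the standalone can save time. The sketch below assumes that the coverage BED file has three tab-separated columns (chromosome name, position, coverage); this layout is an assumption made for illustration and should be checked against the Sequana documentation. The helper name looks_like_coverage_bed is hypothetical.

```shell
# Hypothetical helper: check that every line has three tab-separated fields
# and that fields 2 (position) and 3 (coverage) are non-negative integers.
# The three-column layout is an assumption, not taken from the text above.
looks_like_coverage_bed() {
    awk -F '\t' '
        NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { bad = 1; exit }
        END { exit bad }
    ' "$1"
}
```

For example, `looks_like_coverage_bed yourfile.bed && echo "looks fine"` prints the message only when every line passes the check.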
1.2.4. Use the sequana Docker image
Once you have downloaded the sequana image, you can enter the image as follows:
docker run -it sequana/sequana
This opens an interactive shell with the latest Sequana library pre-installed. For instance, you can start an IPython shell:
ipython
and import the library:
import sequana
Or within the unix shell, you can use standalones. For instance there is a test BED file that can be analysed as follows to get a coverage report:
sequana_coverage --input virus.bed
This should print information and create a report/ directory. This is not very practical if you have your own files or want to open the HTML page stored in ./report. So, let us quit the docker:
exit
and do it the proper way. Go to a working directory (on your computer) and start the docker image again as follows:
docker run -v $PWD:/home/sequana/data -it sequana/sequana
This should start the docker image again but you should now have a ./data directory. Be aware that if you modify data here (in the container), you will also modify the data in your local directory.
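The -v option maps a host directory onto /home/sequana/data inside the container. Docker expects an absolute host path for such a mount, which is why the command uses $PWD. The sketch below builds the same command for an arbitrary directory and only prints it, so it can be inspected before being run; the helper name sequana_docker_cmd is hypothetical, and paths containing spaces would need extra quoting.

```shell
# Hypothetical helper: print the docker command that mounts a host directory
# (default: the current one) on /home/sequana/data inside the container.
sequana_docker_cmd() {
    data_dir="${1:-$PWD}"
    if [ ! -d "$data_dir" ]; then
        echo "not a directory: $data_dir" >&2
        return 1
    fi
    # Resolve to an absolute path, since -v treats other names as volume names.
    abs_dir=$(cd "$data_dir" && pwd)
    echo "docker run -v $abs_dir:/home/sequana/data -it sequana/sequana"
}
```

Running `sequana_docker_cmd .` from your data directory prints the command shown above with $PWD already expanded.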
Now, you can run sequana_coverage in this directory:
cd data
sequana_coverage --input yourfile.bed
This analyses the data and creates a report/ directory. The container has no display but you can now go back to your computer in /home/user/mydatapath and browse the HTML page that was created.
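Because the report is written into the mounted directory, it is visible from the host once the analysis is done. Here is a small sketch to list the HTML pages it contains; the helper name list_html_reports is hypothetical, and the actual file names inside report/ are not specified here.

```shell
# Hypothetical helper: list the HTML files found under a report directory
# (default: ./report), sorted by path.
list_html_reports() {
    find "${1:-report}" -type f -name '*.html' | sort
}
```

On the host, `list_html_reports report` can be run from the directory that was mounted, and any listed file can be opened in a browser.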
So far, we entered the image each time, but you can also use the images as executables (see the standalone section above).
1.2.5. For developers:
Build the image:
git clone https://github.com/sequana/sequana
cd sequana/docker/sequana_core
sudo docker build -t="sequana/sequana_core" .
Run the image:
sudo docker run -it sequana/sequana_core
Here are the layers made available in the hub.docker.com/u/sequana organization. Each layer is built on top of the previous one:
- sequana_core (only ubuntu + some packages)
- sequana_conda_core (sequana_core + conda + common scientific packages)
- sequana_conda_ngs (sequana_conda_core + NGS conda packages)
- sequana (sequana_conda_ngs + sequana specific version)
- Standalone layers:
  - sequana_coverage (sequana + sequana_coverage standalone)
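Since each layer depends on the previous one, a local rebuild has to follow the same order. The sketch below only prints one build command per layer, base layer first. It assumes that every layer has its recipe in sequana/docker/<layer_name>, as is the case for sequana_core and sequana_coverage; the other directory names are assumptions, and the helper name print_layer_builds is hypothetical. Prefix the docker calls with sudo if your setup requires it.

```shell
# Hypothetical helper: print one build command per layer, base layer first.
# Assumes every layer lives in sequana/docker/<layer_name>.
print_layer_builds() {
    for layer in sequana_core sequana_conda_core sequana_conda_ngs sequana sequana_coverage; do
        echo "(cd sequana/docker/$layer && docker build -t=\"sequana/$layer\" .)"
    done
}
```

Printing the commands first makes it easy to review the order before running any of them.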
To avoid using sudo, check out various forums. See for example: http://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo