4.8. pacbio denovo

Overview: De novo assembly of PacBio raw data sets with Canu, followed by quality assessment with BUSCO
Input: *.fasta files found in a directory. BAM files are not accepted directly; to convert BAM to FASTA, you may use the converter from biokit.
Output: assembly
Status: in progress
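
As an illustration of the expected input layout, here is a minimal sketch that collects the *.fasta files of a directory and derives one sample name per file. The function name find_fasta_inputs and the rule "sample name = file name without extension" are assumptions made for this example, not the pipeline's own code.

```python
from pathlib import Path


def find_fasta_inputs(directory):
    """Map sample name -> FASTA path for every *.fasta file in `directory`."""
    samples = {}
    for path in sorted(Path(directory).glob("*.fasta")):
        # Assumption: the sample name is the file name without its extension.
        samples[path.stem] = path
    return samples
```

Files with another extension (for instance a BAM file left in the same directory) are simply ignored by this sketch.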

4.8.1. Usage

For now, please use the Sequanix interface (see the Sequanix Tutorial).

4.8.2. Requirements

  • canu
  • busco

Pipeline DAG (image): https://raw.githubusercontent.com/sequana/sequana/master/sequana/pipelines/pacbio_denovo/dag.png

4.8.3. Details

  • Canu step:
  • Busco step:

You need to download the datasets required by BUSCO. The data is stored in the .config/sequana/busco directory of your HOME. To download and uncompress the data automatically, you can use:

from sequana import busco
busco.BuscoDownload().download()
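
To check what is available after the download, a small sketch such as the following may help. It only inspects the .config/sequana/busco directory mentioned above. The helper name list_busco_datasets is hypothetical, and the idea that each dataset is uncompressed into its own sub-directory is an assumption.

```python
from pathlib import Path


def list_busco_datasets(root=None):
    """Return the sorted names of the sub-directories of the BUSCO data directory."""
    if root is None:
        # Default location stated in this documentation.
        root = Path.home() / ".config" / "sequana" / "busco"
    root = Path(root)
    if not root.is_dir():
        # Nothing downloaded yet (or a different location is in use).
        return []
    # Assumption: one sub-directory per uncompressed dataset.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

An empty list suggests that the download step has not been run yet.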

4.8.4. Rules and configuration details

A documented configuration file, ../sequana/pipelines/pacbio_denovo/config.yaml, is provided for use with the pipeline. Each rule used in the pipeline may have its own section in the configuration file. The pacbio_denovo pipeline uses the canu and busco rules described below.

4.8.4.1. Canu

CANU assembly

Required input:
  • __canu__input: input fasta files (1 per {sample})
Required output:
  • __canu__output: Canu creates many output files; the rule only checks for the presence of {sample}.contigs.fasta
Required parameters:
  • __canu__workdir: where to save the results of the analysis
  • __canu_prefix: prefix used for the output file names, typically the {sample}
Required log:
  • __canu__log: where to save the stderr
Required configuration:
# Note that genomeSize can be given in giga (g), mega (m) or kilo (k) bases
canu:
    genomeSize: 4.1m
    threads: 4
    techno: -pacbio-raw
    options: ""
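
To make the role of each configuration field more concrete, here is a minimal sketch of how such a section could be turned into a Canu command line. It is not necessarily the command the rule actually runs. The function names build_canu_command and expected_contigs are hypothetical, and the mapping of threads to Canu's maxThreads option is an assumption.

```python
import shlex
from pathlib import Path


def build_canu_command(config, fasta, workdir, prefix):
    """Build an argument list for Canu from the `canu` configuration section."""
    canu = config["canu"]
    cmd = [
        "canu",
        "-p", prefix,                          # prefix of the output files
        "-d", str(workdir),                    # directory where results are written
        f"genomeSize={canu['genomeSize']}",    # e.g. 4.1m
        f"maxThreads={canu['threads']}",       # assumption: threads maps to maxThreads
    ]
    # Extra user options are split like a shell would split them.
    cmd += shlex.split(canu.get("options") or "")
    # The technology flag (e.g. -pacbio-raw) is followed by the input file.
    cmd += [canu["techno"], str(fasta)]
    return cmd


def expected_contigs(workdir, prefix):
    """Path of the file whose presence is checked after the assembly."""
    return Path(workdir) / f"{prefix}.contigs.fasta"
```

With the configuration shown above, a sample named S1 and a working directory named out, the sketch yields a command starting with canu -p S1 -d out genomeSize=4.1m maxThreads=4, and the file to look for is S1.contigs.fasta inside out.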

4.8.4.2. Busco

Busco wrapper

Required input:
  • __busco__input: the output of an assembly analysis (e.g. from canu)
Required output:
  • __busco__output: BUSCO generates many output files; the rule looks for the full_table_{sample}.tsv file
Required parameters:
  • __busco__workdir: where to save the results of the analysis
Required log:
  • __busco__log: where to save the stderr
Required configuration:
#
busco:
    mode_choice: genome, transcriptomics, proteins
    species: name of a BUSCO dataset. (use Sequanix to get the list)
    options: any options understood by busco
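
Once the full_table_{sample}.tsv file exists, it can be summarised with a few lines of code. The sketch below assumes the usual layout of a BUSCO full table: comment lines starting with #, then tab-separated rows whose second column is the status (for instance Complete, Duplicated, Fragmented or Missing). The function name count_busco_status is hypothetical.

```python
from collections import Counter


def count_busco_status(lines):
    """Count the rows of a BUSCO full table per status (second column)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the commented header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle can be passed directly. Note that it counts rows: a BUSCO reported on several rows is counted once per row.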

Internally, we set the environment variable and create a config.ini file populated with values taken from the conda environment variables.
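
The following sketch illustrates what such a step could look like; it is not the rule's actual code. Everything specific in it is an assumption: the variable names BUSCO_CONFIG_FILE and CONDA_PREFIX, the section names, and the single path key per section. The function name write_busco_config is hypothetical.

```python
import configparser
import os


def write_busco_config(path, conda_prefix=None):
    """Write a minimal config.ini pointing at the tools of a conda environment."""
    if conda_prefix is None:
        # Assumption: the active conda environment is given by CONDA_PREFIX.
        conda_prefix = os.environ.get("CONDA_PREFIX", "")
    bin_dir = conda_prefix.rstrip("/") + "/bin/"

    config = configparser.ConfigParser()
    # Hypothetical section names: one section per external tool.
    for tool in ("tblastn", "makeblastdb", "hmmsearch"):
        config[tool] = {"path": bin_dir}

    with open(path, "w") as handle:
        config.write(handle)

    # Assumption: BUSCO locates its configuration through this variable.
    os.environ["BUSCO_CONFIG_FILE"] = str(path)
    return path
```

Generating the file at run time means that the tool paths follow whichever conda environment is active, rather than being hard-coded.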

The temporary directory created by BUSCO is deleted explicitly by the rule.
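
A cleanup of this kind can be sketched as follows. The directory name tmp and the function name remove_tmp_dir are assumptions chosen for the example; the rule may use a different location.

```python
import shutil
from pathlib import Path


def remove_tmp_dir(workdir, name="tmp"):
    """Delete the temporary directory `name` inside `workdir` if it exists."""
    tmp = Path(workdir) / name
    if tmp.is_dir():
        shutil.rmtree(tmp)
        return True
    # Nothing to clean up.
    return False
```

Returning a boolean makes it easy to log whether anything was actually removed.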