4.7. PacBio quality control

Overview: Quality control of raw PacBio BAM data
Input: A set of BAM files
Output: report_{sample}.html, where sample is the name of an input file

4.7.1. Usage

First, copy the BAM files into a directory. The input files must have the .bam extension.

Then, start Sequanix from that directory as follows:

sequanix -w analysis -i . -p pacbio_qc

In the sequana/input file panel, set the input directory where the BAM files are stored. Note that all BAM files in that directory are currently taken into account, irrespective of the pattern provided in Sequanix, because the pipeline uses a hard-coded pattern that selects every BAM file.

In the configuration tab, add as many databases as you wish to the kraken section.

Save the project and press Run. Once done, open the HTML report for the BAM of interest.
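Because the pattern is hard-coded, file selection behaves like a plain *.bam glob on the input directory. The Python sketch below illustrates that behaviour under two assumptions: the directory is scanned non-recursively, and the sample name used in report_{sample}.html is the file name without its .bam extension. The helpers find_bam_files and sample_name are hypothetical and are not part of Sequana.

```python
from pathlib import Path


def find_bam_files(directory="."):
    """Return every *.bam file in `directory` (non-recursive), sorted by name.

    Any file ending in .bam is selected, whatever pattern was typed in
    Sequanix (assumed behaviour, mirroring the note above).
    """
    return sorted(p for p in Path(directory).glob("*.bam") if p.is_file())


def sample_name(bam_path):
    """Derive a sample name from a BAM path (assumption: the file stem)."""
    return Path(bam_path).stem


if __name__ == "__main__":
    # List the BAM files that would be picked up and their expected reports.
    for bam in find_bam_files("."):
        print(bam.name, "->", f"report_{sample_name(bam)}.html")
```

Running it from the analysis directory gives a quick check of which files, and therefore which reports, to expect before pressing Run.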

4.7.2. Requirements

Pipeline DAG (figure): https://raw.githubusercontent.com/sequana/sequana/master/sequana/pipelines/pacbio_qc/dag.png

4.7.3. Details

This pipeline takes as input a set of BAM files from PacBio sequencers. It computes basic statistics on the read lengths and plots histograms of the GC content, the signal-to-noise ratio (SNR) of the diodes and the so-called ZMW values. Finally, a quick taxonomic classification can be performed with Kraken. An HTML report is created for each sample.
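To make "basic statistics related to the read lengths" and "GC content" concrete, here is a minimal pure-Python sketch that works on read sequences already extracted from a BAM file. The set of statistics (count, min, max, mean, median, N50) is a common choice assumed for illustration; the exact statistics reported by Sequana may differ, and the function names are hypothetical.

```python
from statistics import mean, median


def gc_content(seq):
    """Return the fraction of G and C bases in a read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def n50(lengths):
    """Return the largest length L such that reads at least L long hold half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def read_length_summary(seqs):
    """Basic read-length statistics for a collection of read sequences."""
    lengths = [len(s) for s in seqs]
    if not lengths:
        return {"count": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "N50": n50(lengths),
    }


if __name__ == "__main__":
    reads = ["ACGTACGT", "GGGCCC", "ATATATATAT", "GC"]  # toy reads, not real data
    print(read_length_summary(reads))
    print([round(gc_content(r), 2) for r in reads])
```

For the four toy reads (lengths 8, 6, 10 and 2), the mean is 6.5, the median is 7 and the N50 is 8. Per-read GC values like these are what a GC histogram bins.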

4.7.4. Rules and configuration details

The documented configuration file to be used with the pipeline is ../sequana/pipelines/pacbio_qc/config.yaml.

4.7.4.1. bam_to_fasta

Converts a PacBio BAM file into FASTA format.

Required input:
  • __bam_to_fasta__input_bam
Required output:
  • __bam_to_fasta__output_fasta
References:
sequana.pacbio
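Decoding the BAM file itself requires a BAM reader (for instance pysam, or the classes in sequana.pacbio). The sketch below only illustrates the FASTA side of the bam_to_fasta conversion, assuming the (read name, sequence) pairs have already been extracted; the functions to_fasta and write_fasta and the line width of 80 are hypothetical choices, not the rule's actual implementation.

```python
def to_fasta(records, width=80):
    """Format (name, sequence) pairs as FASTA text.

    Sequences are wrapped every `width` characters; a record with an
    empty sequence produces only its header line.
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)


def write_fasta(records, path, width=80):
    """Write (name, sequence) pairs to a FASTA file at `path`."""
    with open(path, "w") as handle:
        handle.write(to_fasta(records, width))


if __name__ == "__main__":
    # Hypothetical records, as they might come out of a BAM reader.
    records = [("read1", "ACGTACGTAC"), ("read2", "GGCC")]
    print(to_fasta(records, width=4), end="")
```

Keeping formatting (to_fasta) separate from file I/O (write_fasta) makes the conversion easy to test without touching the file system.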

4.7.4.2. pacbio_quality

PacBio quality control

Required input:
  • __pacbio_quality__input : the input BAM file
Required output:
  • __pacbio_quality__output_summary : summary_{sample}.json

In addition to a summary file with basic statistics, this rule creates five images: histograms of the read lengths, the GC content, the ZMW information and the SNR of the A, C, G and T nucleotides, plus a 2D histogram of GC content versus read length.

References:
sequana.pacbio
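The pacbio_quality rule combines two outputs that can be sketched in a few lines of Python: the counts behind a 2D histogram of GC content versus read length, and a summary_{sample}.json file. In this sketch the bin sizes (1000 bases, ten GC bins), the function names and the keys of the summary dictionary are all hypothetical; Sequana's own plots and JSON layout may differ.

```python
import json
from pathlib import Path


def gc_length_histogram(lengths, gcs, length_bin=1000, n_gc_bins=10):
    """Count reads in each (read-length bin, GC bin) cell of a 2D histogram.

    Keys are (lower edge of the length bin, index of the GC bin). GC values
    are fractions in [0, 1]; a GC of exactly 1.0 falls into the last bin.
    """
    counts = {}
    for length, gc in zip(lengths, gcs):
        length_key = (length // length_bin) * length_bin
        gc_key = min(int(gc * n_gc_bins), n_gc_bins - 1)
        counts[(length_key, gc_key)] = counts.get((length_key, gc_key), 0) + 1
    return counts


def write_summary(sample, stats, outdir="."):
    """Write `stats` to summary_{sample}.json in `outdir` and return the path."""
    path = Path(outdir) / f"summary_{sample}.json"
    path.write_text(json.dumps(stats, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    # Toy values: three reads with their lengths and GC fractions.
    print(gc_length_histogram([500, 1500, 1700], [0.30, 0.45, 1.0]))
```

Clamping the GC index keeps reads with 100% GC inside the histogram instead of creating an extra, out-of-range bin.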