4.4. Small RNA-seq

Overview: small RNA-seq (miRNA)
Input: FastQ raw data from an Illumina sequencer (single-end only)
Output: BAM, count and HTML files

4.4.1. Usage

Example:

sequana --pipeline smallrnaseq --input-dir . --output-directory analysis --adapters TruSeq
cd analysis
srun snakemake -s smallrnaseq.rules --stats stats.txt -p -j 12 --nolock \
    --cluster-config cluster_config.json \
    --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads}" \
    --restart-times 2
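If you prefer to script the two steps of the example above, they can be wrapped in Python. This is a minimal sketch, assuming the same `sequana` and `snakemake` options as in the example; the helper names `build_commands` and `launch` are hypothetical. Passing each command as an argument list avoids shell quoting problems with the braces in the `--cluster` template.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(input_dir=".", output_dir="analysis", adapters="TruSeq", jobs=12):
    """Return the two commands of the usage example as argument lists (hypothetical helper)."""
    init = [
        "sequana", "--pipeline", "smallrnaseq",
        "--input-dir", input_dir,
        "--output-directory", output_dir,
        "--adapters", adapters,
    ]
    run = [
        "srun", "snakemake", "-s", "smallrnaseq.rules",
        "--stats", "stats.txt", "-p", "-j", str(jobs), "--nolock",
        "--cluster-config", "cluster_config.json",
        # kept as ONE argument: snakemake fills in {cluster.ram} and {threads}
        "--cluster", "sbatch --mem={cluster.ram} --cpus-per-task={threads}",
        "--restart-times", "2",
    ]
    return init, run


def launch(input_dir=".", output_dir="analysis", adapters="TruSeq", jobs=12):
    """Create the analysis directory, then run snakemake from inside it (the `cd analysis` step)."""
    init, run = build_commands(input_dir, output_dir, adapters, jobs)
    subprocess.run(init, check=True)
    subprocess.run(run, check=True, cwd=Path(output_dir))


if __name__ == "__main__":
    # Print the commands instead of running them
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Calling `launch()` actually runs both commands; the `__main__` block only prints them so you can check the quoting first.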

4.4.2. Requirements

  • cutadapt
  • picard-tools
  • bowtie
  • multiqc
  • fastq_screen
Figure: workflow (DAG) of the small RNA-seq pipeline, https://raw.githubusercontent.com/sequana/sequana/master/sequana/pipelines/smallrnaseq/dag.png

4.4.3. Details

This pipeline maps reads onto mature and hairpin miRNA sequences (to be downloaded from miRBase), counts them, and performs quality control on the data. All results are summarized in a MultiQC report.
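The miRBase mature and hairpin FASTA files (for example `mature.fa` and `hairpin.fa`) contain sequences from many species, written as RNA (with U). A minimal sketch of one way to prepare them, assuming you want a single species selected by its ID prefix (such as `hsa-` for human) and DNA letters for indexing; the function `filter_mirbase_fasta` is hypothetical, and whether the pipeline expects this preprocessing is an assumption.

```python
def filter_mirbase_fasta(text, species_prefix="hsa-", rna_to_dna=True):
    """Keep FASTA records whose ID starts with species_prefix; optionally convert U to T.

    Hypothetical helper: miRBase IDs carry a species prefix, e.g. "hsa-let-7a-5p".
    """
    out = []
    keep = False
    for line in text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep = name.startswith(species_prefix)
            if keep:
                out.append(line)
        elif keep:
            seq = line.strip()
            if rna_to_dna:
                # RNA alphabet -> DNA alphabet so the sequences can be indexed like a genome
                seq = seq.replace("U", "T").replace("u", "t")
            out.append(seq)
    return "\n".join(out) + ("\n" if out else "")
```

Usage: `filter_mirbase_fasta(open("mature.fa").read(), "mmu-")` would keep only the mouse records.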

4.4.4. Rules and configuration details

A documented configuration file, ../sequana/pipelines/smallrnaseq/config.yaml, is provided for use with the pipeline. Each rule used in the pipeline may have a section in the configuration file. The rules are listed below together with their developer and user documentation.

4.4.4.1. FastQC

Calls FastQC on the input data sets (paired-end or single-end).

This rule is a dynamic rule, meaning that it can be included in a pipeline under different names. For instance, in the quality_control pipeline it is used as fastqc_samples and fastqc_phix. In the names below, the string %(name)s must be replaced by the appropriate dynamic name.
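The placeholder %(name)s is Python's printf-style mapping syntax, so the renaming can be illustrated with the `%` operator. This sketch only shows the naming scheme for the fastqc_samples and fastqc_phix instances mentioned above; `expand_dynamic` is a hypothetical helper, not how Sequana generates its rules internally.

```python
TEMPLATES = [
    "__fastqc_%(name)s__input_fastq",
    "__fastqc_%(name)s__output_done",
    "__fastqc_%(name)s__wkdir",
]


def expand_dynamic(templates, name):
    """Replace the %(name)s placeholder with a concrete dynamic rule name."""
    return [t % {"name": name} for t in templates]


for name in ("samples", "phix"):
    print(expand_dynamic(TEMPLATES, name))
```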

Required input:
  • __fastqc_%(name)s__input_fastq
Required output:
  • __fastqc_%(name)s__output_done
Required parameters:
  • __fastqc_%(name)s__wkdir: the working directory
Required configuration:
fastqc:
    options: "-nogroup"   # a string with fastqc options
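To see how the `options` string ends up on the command line, here is a minimal sketch that builds a FastQC call from the rule's inputs, working directory and configured options. The helper `fastqc_command` and the default of one thread are assumptions; `--outdir` and `--threads` are standard FastQC flags, and FastQC expects the output directory to exist already.

```python
import shlex


def fastqc_command(input_fastq, wkdir, options="-nogroup", threads=1):
    """Build a FastQC command from the `fastqc: options` value (hypothetical helper)."""
    if isinstance(input_fastq, str):
        input_fastq = [input_fastq]  # accept a single path as well as a list
    cmd = ["fastqc", "--outdir", wkdir, "--threads", str(threads)]
    cmd += shlex.split(options)  # "-nogroup" -> ["-nogroup"]
    cmd += list(input_fastq)     # one file (single-end) or two files (paired-end)
    return cmd
```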

4.4.4.2. Fastq_screen

FastQ Screen screens a library of sequences in FastQ format against a set of sequence databases, so you can check whether the composition of the library matches what you expect.

Required input:
  • __fastq_screen__input: an output fastq_screen directory
Required output:
  • __fastq_screen__output: the fastq_screen results directory
Config:
fastq_screen:
    conf:  # a valid path to a fastq_screen config file
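The `conf` entry points to a FastQ Screen configuration file. A minimal sketch that renders such a file, assuming the `BOWTIE2`, `THREADS` and `DATABASE` keywords used in the example configuration distributed with FastQ Screen (check them against your installed version); the helper name and all index paths are hypothetical.

```python
def fastq_screen_conf(databases, aligner="BOWTIE2", aligner_path=None, threads=4):
    """Render a minimal fastq_screen configuration file as a string (hypothetical helper).

    databases maps a display name to the prefix of a pre-built aligner index.
    """
    lines = []
    if aligner_path:
        lines.append(f"{aligner}\t{aligner_path}")  # optional: aligner not on PATH
    lines.append(f"THREADS\t{threads}")
    for name, index_prefix in databases.items():
        lines.append(f"DATABASE\t{name}\t{index_prefix}")
    return "\n".join(lines) + "\n"
```

Usage: write the returned string to, say, `fastq_screen.conf` and set `fastq_screen: conf:` to that path.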

4.4.4.3. Cutadapt

Cutadapt (adapter removal)

Required input:
  • __cutadapt__input_fastq
Required output:
  • __cutadapt__output
Required parameters:
  • __cutadapt__fwd: forward adapters, as a file or a string
  • __cutadapt__rev: reverse adapters, as a file or a string
  • __cutadapt__options
  • __cutadapt__mode: g for a 5' adapter, a for a 3' adapter and b for both 5'/3' (see the cutadapt documentation for details)
  • __cutadapt__wkdir
  • __cutadapt__design
  • __cutadapt__design_adapter
  • __cutadapt__sample
Other requirements:
  • __cutadapt__log
Required configuration:
cutadapt:
    do: yes
    tool_choice: cutadapt
    design: "%(adapter_design)s"
    adapter_choice: "%(adapter_type)s"
    fwd: "%(adapter_fwd)s"
    rev: "%(adapter_rev)s"
    m: 20   # minimum read length after trimming (cutadapt -m)
    mode: "g"   # g for 5' adapter, a for 3' and b for both 5'/3'
    quality: "30"
    options: "-O 6 --trim-n"
References:
http://cutadapt.readthedocs.io/en/stable/index.html
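A minimal sketch of how the configuration values could map onto a single-end cutadapt call. It assumes `mode` selects cutadapt's `-g`/`-a`/`-b` adapter flag, `m` becomes `-m` (minimum length) and `quality` becomes `-q`; the helper `cutadapt_command` is hypothetical. For a FASTA file of adapters, cutadapt accepts the `file:` prefix (for example `-a file:adapters.fa`). Only the forward adapter is used because the pipeline is single-end.

```python
import shlex

# cutadapt flags: -g 5' adapter, -a 3' adapter, -b adapter anywhere (5' or 3')
MODE_FLAGS = {"g": "-g", "a": "-a", "b": "-b"}


def cutadapt_command(input_fastq, output_fastq, fwd, mode="g", m=20,
                     quality="30", options="-O 6 --trim-n"):
    """Build a single-end cutadapt command from the config values (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"mode must be one of {sorted(MODE_FLAGS)}, got {mode!r}")
    cmd = ["cutadapt", MODE_FLAGS[mode], fwd, "-m", str(m), "-q", str(quality)]
    cmd += shlex.split(options)  # "-O 6 --trim-n": min overlap 6, trim flanking Ns
    cmd += ["-o", output_fastq, input_fastq]
    return cmd
```

Usage: `cutadapt_command("s1.fastq.gz", "s1.trim.fastq.gz", "file:fwd.fa", mode="a")` trims 3' adapters listed in a hypothetical `fwd.fa`.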

4.4.4.4. Bowtie1

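No documentation is available for the bowtie1_mapping_dynamic rule. It maps the reads with Bowtie (version 1) onto the mature and hairpin sequences.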
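Since this rule is undocumented, the following is only a sketch of a typical Bowtie 1 workflow, not the rule's actual options: one index per reference set (built with `bowtie-build <reference_in> <ebwt_base>`), then one SAM-producing alignment per sample (`-p` threads, `-S` SAM output). The helper `bowtie1_commands` and the file naming are hypothetical.

```python
def bowtie1_commands(sample, fastq, references, threads=4):
    """Return bowtie-build and bowtie commands for each reference set (hypothetical helper).

    references maps a label (e.g. "mature", "hairpin") to a FASTA file.
    """
    cmds = []
    for label, fasta in references.items():
        index_prefix = f"{label}_index"
        cmds.append(["bowtie-build", fasta, index_prefix])
        cmds.append(["bowtie", "-p", str(threads), "-S",
                     index_prefix, fastq, f"{sample}.{label}.sam"])
    return cmds
```

In practice the indexes would be built once and reused across samples; they are regenerated here only to keep the sketch self-contained.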

4.4.4.5. Counting

No documentation is available for the miRNA_count_dynamic rule. It counts the reads mapped onto each mature and hairpin sequence.
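The counting step can be pictured as tallying primary alignments per reference sequence. This is a minimal pure-Python sketch on SAM text, assuming each read should be counted once (unmapped, secondary and supplementary records are skipped); the real miRNA_count_dynamic rule may use other tools or handle multi-mapping reads differently, and `count_reads_per_reference` is a hypothetical name.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary mapped reads per reference name (RNAME) in SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # unmapped read
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        counts[rname] += 1
    return counts
```

Usage: `count_reads_per_reference(open("s1.mature.sam"))` returns a `Counter` keyed by miRNA name.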

4.4.4.6. Reporting

MultiQC aggregates results from bioinformatics analyses across many samples into a single report.

It searches a given directory for analysis logs and compiles an HTML report. It is a general-purpose tool, well suited to summarising the output of numerous bioinformatics tools.

Reference: http://multiqc.info/
Required input:
  • __multiqc__input_dir: an input directory in which to find data and logs
Required output:
  • __multiqc__output: multiqc_report.html in the input directory

Config:

multiqc:
    excluded: "-x *.zip "   # ignore analysis files matching this glob expression
    output-directory: " "   # name of the output directory where results are written
Note: if the directory exists, it is overwritten.
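A minimal sketch of how the two settings could be turned into a MultiQC call, assuming the overwrite behaviour comes from MultiQC's `--force` flag and that a blank `output-directory` means "use the default location"; `multiqc_command` is hypothetical. Building an argument list (no shell) keeps `*.zip` literal, so MultiQC applies the glob itself instead of the shell expanding it.

```python
import shlex


def multiqc_command(input_dir, excluded="-x *.zip ", output_directory=" "):
    """Build a MultiQC command from the `multiqc` config section (hypothetical helper)."""
    cmd = ["multiqc", input_dir, "--force"]  # --force: overwrite an existing report
    cmd += shlex.split(excluded)             # "-x *.zip " -> ["-x", "*.zip"]
    if output_directory.strip():
        cmd += ["--outdir", output_directory.strip()]
    return cmd
```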