4.4. Small RNA-seq

Overview: small RNA-seq (miRNA)
Input: raw FastQ data from an Illumina sequencer (single-end only)
Output:
Config file requirements:
 

4.4.1. Usage

Example:

sequana --pipeline smallrnaseq --input-dir .  --output-directory analysis --adapters TruSeq
cd analysis
srun snakemake -s smallrnaseq.rules --stats stats.txt -p -j 12 --nolock --cluster-config cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads}" --restart-times 2
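
The --cluster string above takes {cluster.ram} from cluster_config.json and {threads} from each rule. As a minimal sketch, assuming Snakemake's cluster-config layout (a __default__ entry whose keys are exposed as {cluster.<key>}), such a file could be generated as shown below. The helper name make_cluster_config and the "4G" memory value are illustrative assumptions, not the settings shipped with the pipeline.

```python
import json


def make_cluster_config(default_ram="4G"):
    """Return a cluster-config mapping.

    The 'ram' key matches the {cluster.ram} placeholder used in the
    --cluster string; the default value here is only an assumption.
    """
    return {"__default__": {"ram": default_ram}}


# Print the JSON that would be saved as cluster_config.json.
print(json.dumps(make_cluster_config(), indent=4))
```

Per-rule entries can be added next to __default__ when a particular rule needs more memory than the others.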

4.4.2. Requirements

  • cutadapt
  • picard-tools
  • bowtie
  • bowtie2
  • multiqc
  • STAR
  • fastq_screen
  • featureCounts [subread]

Pipeline workflow diagram (DAG): https://raw.githubusercontent.com/sequana/sequana/master/sequana/pipelines/smallrnaseq/dag.png

4.4.3. Details

4.4.4. Rules and configuration details

A documented configuration file, ../sequana/pipelines/smallrnaseq/config.yaml, is provided for use with the pipeline. Each rule used in the pipeline may have its own section in that file. The rules are listed below with their developer and user documentation.

4.4.4.1. FastQC

Calls FastQC on input data sets (paired or not)

This is a dynamic rule, meaning that it can be included in a pipeline under different names. For instance, in the quality_control pipeline it is used as fastqc_samples and fastqc_phix. In the names below, the string %(name)s must be replaced by the appropriate dynamic name.
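
The %(name)s placeholder follows Python's %-style string formatting, so the substitution can be illustrated with a short sketch. The variable-name templates are the ones listed for this rule; the helper expand and the choice of "samples" as the dynamic name are only for illustration.

```python
# Variable-name templates documented for the dynamic FastQC rule.
TEMPLATES = [
    "__fastqc_%(name)s__input_fastq",
    "__fastqc_%(name)s__output_done",
    "__fastqc_%(name)s__wkdir",
]


def expand(templates, name):
    """Replace %(name)s in each template with the given dynamic name."""
    return [template % {"name": name} for template in templates]


for variable in expand(TEMPLATES, "samples"):
    print(variable)
# prints:
# __fastqc_samples__input_fastq
# __fastqc_samples__output_done
# __fastqc_samples__wkdir
```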

Required input:
  • __fastqc_%(name)s__input_fastq
Required output:
  • __fastqc_%(name)s__output_done
Required parameters:
  • __fastqc_%(name)s__wkdir: the working directory
Required configuration:
fastqc:
    options: "-nogroup"   # a string with fastqc options
References:

4.4.4.2. Fastq_screen

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Required input:
__fastq_screen__input: an output fastq_screen directory
Required output:
__fastq_screen__output: directory containing the fastq_screen results
Config:
fastq_screen:
    conf:  # a valid path to a fastq_screen config file
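
Because conf must point to an existing fastq_screen configuration file, a check before launching the pipeline can catch an empty or mistyped path early. The following is a minimal sketch, assuming the configuration has already been loaded into a Python dictionary; the function check_fastq_screen_conf is hypothetical and not part of sequana.

```python
import os


def check_fastq_screen_conf(config):
    """Return the fastq_screen conf path, or raise if it is unset or missing."""
    section = config.get("fastq_screen") or {}
    conf = section.get("conf")
    if not conf:
        raise ValueError("fastq_screen: 'conf' is not set")
    if not os.path.isfile(conf):
        raise FileNotFoundError("fastq_screen conf file not found: %s" % conf)
    return conf


# An empty 'conf' entry (as in the template above) is reported as an error.
try:
    check_fastq_screen_conf({"fastq_screen": {"conf": None}})
except ValueError as err:
    print(err)
```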

4.4.4.3. Cutadapt

Cutadapt (adapter removal)

Required input:
  • __cutadapt__input_fastq
Required output:
  • __cutadapt__output
Required parameters:
  • __cutadapt__fwd: forward adapters as a file, or string
  • __cutadapt__rev: reverse adapters as a file, or string
  • __cutadapt__options
  • __cutadapt__mode: g for 5' adapter, a for 3' and b for both 5'/3' (see cutadapt doc for details)
  • __cutadapt__wkdir
  • __cutadapt__design
  • __cutadapt__design_adapter
  • __cutadapt__sample
Other requirements:
  • __cutadapt__log
Required configuration:
adapter_removal:
    do: yes
    tool_choice: cutadapt
    design: "%(adapter_design)s"
    adapter_choice: "%(adapter_type)s"
    fwd: "%(adapter_fwd)s"
    rev: "%(adapter_rev)s"
    m: 20   # cutoff
    mode: "g"   # g for 5' adapter, a for 3' and b for both 5'/3'
    quality: "30"
    options: "-O 6 --trim-n"
References:
http://cutadapt.readthedocs.io/en/stable/index.html
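
To make the role of each configuration key more concrete, here is a hedged sketch of how an adapter_removal section could translate into a single-end cutadapt command line. The mapping is an assumption for illustration (m to cutadapt's -m minimum length, quality to -q, mode to -g/-a/-b); the rule itself may assemble the command differently. The adapter sequence and file names are placeholders, and adapters given as a file are not handled.

```python
import shlex


def cutadapt_command(cfg, fastq_in, fastq_out):
    """Build a single-end cutadapt argument list from an adapter_removal section."""
    if cfg["mode"] not in ("g", "a", "b"):
        raise ValueError("mode must be 'g', 'a' or 'b'")
    cmd = ["cutadapt"]
    cmd += ["-" + cfg["mode"], cfg["fwd"]]   # adapter type and sequence
    cmd += ["-m", str(cfg["m"])]             # assumed: minimum read length
    cmd += ["-q", str(cfg["quality"])]       # assumed: quality trimming cutoff
    cmd += shlex.split(cfg["options"])       # free-form extra options
    cmd += ["-o", fastq_out, fastq_in]
    return cmd


example = {
    "fwd": "ACGTACGTACGT",  # placeholder sequence, not a real adapter
    "m": 20,
    "mode": "g",
    "quality": "30",
    "options": "-O 6 --trim-n",
}
command = cutadapt_command(example, "sample.fastq.gz", "trimmed.fastq.gz")
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the command as a list, rather than as one string, keeps each argument separate when it is later handed to a subprocess call.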

4.4.4.4. Bowtie1

No documentation is available for the rule bowtie1_mapping_dynamic.

4.4.4.5. Counting

No documentation is available for the rule miRNA_count_dynamic.

4.4.4.6. Reporting

MultiQC aggregates results from bioinformatics analyses across many samples into a single report.

It searches a given directory for analysis logs and compiles an HTML report. It is a general-purpose tool, well suited to summarising the output from numerous bioinformatics tools.

Reference: http://multiqc.info/
Required input:
__multiqc__input_dir: an input directory where to find data and logs
Required output:
__multiqc__output: multiqc_report.html in the input directory

Config:

multiqc:
    excluded: "-x *.zip "   # ignore analysis files (glob expression)
    output-directory: " "   # name of the output directory where results are written
Note: if the output directory already exists, it is overwritten.
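
As an illustration of how these two keys might end up on the MultiQC command line, the sketch below builds an argument list from a multiqc section. It assumes that excluded is passed through verbatim, that a non-empty output-directory maps to MultiQC's -o option, and that overwriting corresponds to -f; the function multiqc_command is hypothetical and the actual rule may differ.

```python
import shlex


def multiqc_command(cfg, input_dir):
    """Build a multiqc argument list from a 'multiqc' configuration section."""
    cmd = ["multiqc", input_dir]
    cmd += shlex.split(cfg.get("excluded", ""))       # e.g. -x *.zip
    outdir = cfg.get("output-directory", "").strip()
    if outdir:                                        # blank means default location
        cmd += ["-o", outdir]
    cmd.append("-f")                                  # assumed: overwrite existing report
    return cmd


section = {"excluded": "-x *.zip ", "output-directory": " "}
print(multiqc_command(section, "."))
# prints: ['multiqc', '.', '-x', '*.zip', '-f']
```

Keeping the glob as its own list element means the pattern reaches MultiQC unexpanded when the list is run without a shell.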