10. Wrappers

As of August 2021, the Sequana team created the sequana-wrappers repository, which is intended to replace the rules. The advantage is that wrappers can be tested with continuous integration.

Wrappers are used within a Snakemake rule. When you call your Snakemake pipeline, you will need to add:

--wrapper-prefix git+file:https://github.com/sequana/sequana-wrappers/

We provide documentation for each wrapper. It can be included in this documentation using a Sphinx extension. For example:

.. sequana_wrapper:: fastqc

Here is a non-exhaustive list of documented wrappers.

10.1. bowtie2/align

The bowtie2/align wrapper maps sequencing reads onto a reference using Bowtie2.

This wrapper takes single-end or paired-end data and maps them onto a reference. The resulting file is a sorted and indexed BAM file.

Required input:

fastq: the input FastQ files (single or paired)

Required output:

bam: the output BAM file.
sorted: the sorted output BAM file (optional, but computed anyway).

Optional output (recommended):

sorted: if set, the BAM file is also sorted and indexed.

Required parameters:

options: a list of valid bowtie2 options
index: the expected index prefix of the genome reference. For instance, if the reference is in ref/hg38.fa, this parameter should most probably be set to ref/hg38.
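The relation between the reference path and the index parameter (drop the FASTA extension) can be written as a small helper. This is a minimal sketch, not part of the wrapper: the function name bowtie2_index_prefix and the list of recognised FASTA extensions are assumptions.

```python
# Hypothetical helper (not part of sequana-wrappers): derive the value of the
# "index" parameter from the path of the reference FASTA file.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".fa.gz", ".fasta.gz", ".fna.gz")


def bowtie2_index_prefix(reference: str) -> str:
    """Return the reference path without its FASTA extension."""
    for suffix in FASTA_SUFFIXES:
        if reference.endswith(suffix):
            return reference[: -len(suffix)]
    # unknown extension: assume the path is already an index prefix
    return reference


print(bowtie2_index_prefix("ref/hg38.fa"))  # ref/hg38
```

In a rule, the result would simply be passed as params.index.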

Log:

a log file generated by bowtie2 is stored

Example

rule bowtie2:
    input:
        fastq=input_data  # input_data: your FastQ file(s), single or paired
    output:
        bam="{sample}/bowtie2/{sample}.bam",
        sorted="{sample}/bowtie2/{sample}.sorted.bam"
    params:
        options=config['bowtie2']['options'],
        index="reference/mygenome"
    threads:
        config['bowtie2']['threads']
    log:
        "{sample}/bowtie2/{sample}.log"
    wrapper:
        "main/wrappers/bowtie2/align"

Configuration

#############################################################################
#  Bowtie2 read mapping
#
# :Parameters:
#
# - options: any options recognised by the bowtie2 command
# - threads: number of threads to be used
bowtie2:
    options: ''
    threads: 4

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.2. bowtie2/build

The bowtie2/build wrapper builds a genome index with Bowtie2.

This wrapper takes a reference and builds its index. The output files are stored in a directory extracted from the genome reference name.

Required input:

reference: the reference genome to index (FASTA format)

Required output:

A list of expected output files (at least one is needed). The prefix is extracted from the first file and used by bowtie2-build as the indexbase.
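One way to recover the index base from the first output file is to strip the standard Bowtie2 index suffix. This is a sketch under the assumption that the first output is one of the usual Bowtie2 index files; index_base is a hypothetical name and the wrapper may implement this step differently.

```python
import re

# Bowtie2 index files: <base>.1.bt2 ... <base>.4.bt2, <base>.rev.1.bt2 and
# <base>.rev.2.bt2 (the .bt2l extension is used for large indexes).
_BT2_SUFFIX = re.compile(r"(\.rev)?\.\d\.bt2l?$")


def index_base(first_output: str) -> str:
    """Strip the Bowtie2 index suffix from an output filename."""
    return _BT2_SUFFIX.sub("", first_output)


print(index_base("genome/bowtie2/genome.1.bt2"))  # genome/bowtie2/genome
```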

Required parameters:

options: a list of valid bowtie2-build options

Log:

a log file generated by bowtie2 is stored

Example

rule bowtie2_build:
    input:
        reference="genome.fa"
    output:
        "genome/bowtie2/genome.1.bt2"
    params:
        options=config['bowtie2_build']['options'],
    threads:
        config['bowtie2_build']['threads']
    log:
        "logs/bowtie2_build/bowtie2_build.log"
    wrapper:
        "main/wrappers/bowtie2/build"

Configuration

#############################################################################
#   bowtie2 build index
#
# :Parameters:
#
# - options: any options recognised by 'bowtie2-build' command
# - threads: number of threads to be used
bowtie2_build:
    options: ''
    threads: 4

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.3. add_read_group

The add_read_group wrapper adds a read group to an input BAM file. Several standard read-group tags are handled by this wrapper, but any other tag can be added using the options parameter. See the notes below for details.

Required input:

BAM file

Required output:

The first output must be the expected output BAM file with the read group added.
(optional) Although not required, you may specify the BAM index file as a second output; the index is generated by the wrapper with samtools.

Log:

a log file with stdout/stderr

Notes:

If not provided, PL is set to 'Illumina', PU to 'unknown' and LB to 'unknown'. If not provided, SM is set to the input BAM filename (without the .bam extension). Wildcards can also be used.

Finally, if any of these tags are also provided in the options field, the values given in options replace the other fields.
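The default values described in these notes can be summarised as follows. This is a hypothetical sketch (read_group and rg_header_line are not the wrapper's functions); for SM it follows the configuration comment, where file.sorted.bam gives 'file', and it does not model the options override.

```python
import os
import uuid


def read_group(bam_path: str, **tags: str) -> dict:
    """Return read-group tags, filling the missing ones with defaults."""
    # SM default: keep the basename up to the first dot
    # (file.sorted.bam -> file), as in the configuration comment.
    sample = os.path.basename(bam_path).split(".")[0]
    rg = {
        "ID": str(uuid.uuid4()),  # unique identifier
        "PL": "Illumina",
        "PU": "unknown",
        "LB": "unknown",
        "SM": sample,
    }
    # user-provided, non-empty values override the defaults
    rg.update({key: value for key, value in tags.items() if value})
    return rg


def rg_header_line(rg: dict) -> str:
    """Format the tags as a SAM @RG header line (tab-separated TAG:VALUE)."""
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in rg.items())


print(rg_header_line(read_group("data/test.sorted.bam", ID="1", LB="lib")))
```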

Example

rule add_read_group:
    input:
        "test.bam"
    output:
        bam="test.rg.bam",
        bai="test.rg.bam.bai"
    log:
        "log.out"
    params:
        ID="1",
        LB="lib",
        PL="ILLUMINA",
        PU="unit",
        SM="test",
        options=""
    wrapper:
        "main/wrappers/add_read_group"

A simpler example:

rule add_read_group:
    input:
        "{sample}.bam"
    output:
        bam="{sample}.rg.bam",
    log:
        "{sample}.log"
    params:
        SM="{sample}"
    wrapper:
        "main/wrappers/add_read_group"

Configuration

add_read_group:
    options:    # additional options (see the notes above)
    # PL:   (automatically filled with Illumina)
    # PU:   (automatically filled with unknown)
    # LB:   (automatically filled with unknown)
    # SM:   (automatically filled with the sample name from the
    #        input file, e.g. file.sorted.bam returns 'file')
    # ID:   (automatically set to a unique uuid)

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.4. bam_coverage

The docstring for the bam_coverage wrapper is not yet available (no README.md found).

10.5. bcl2fastq

The bcl2fastq wrapper calls the bcl2fastq software to convert raw BCL data into FastQ files.

Required input:

the SampleSheet file, as provided by the sequencing provider

Required output:

Stats/Stats.json

Required parameters:

options: a list of valid bcl2fastq options
indir: the directory where the BCL data are found
ignore_missing_bcls: interpret missing BCL files as no call (N)
no_bgzf_compression: turn off BGZF compression for FASTQ files
barcode_mismatch: number of allowed mismatches per index
merge_all_lanes: if true, the lanes are merged using the --no-lane-splitting option of bcl2fastq
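These parameters map onto bcl2fastq command-line flags. The sketch below is a hypothetical illustration (bcl2fastq_command is not the wrapper's code); it assumes that indir is passed as --runfolder-dir and that merge_all_lanes enables --no-lane-splitting, the bcl2fastq option that writes a single FASTQ file per sample across all lanes.

```python
import shlex


def bcl2fastq_command(indir, samplesheet, threads=4, barcode_mismatch=0,
                      ignore_missing_bcls=True, no_bgzf_compression=True,
                      merge_all_lanes=True, options=""):
    """Assemble a bcl2fastq argument list from the wrapper parameters."""
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", indir,          # run folder containing the BCL data
        "--sample-sheet", samplesheet,
        "--barcode-mismatches", str(barcode_mismatch),
        "--processing-threads", str(threads),
    ]
    if ignore_missing_bcls:
        cmd.append("--ignore-missing-bcls")
    if no_bgzf_compression:
        cmd.append("--no-bgzf-compression")
    if merge_all_lanes:
        # one FASTQ per sample for all lanes
        cmd.append("--no-lane-splitting")
    cmd.extend(shlex.split(options))  # extra user options, appended verbatim
    return cmd


print(" ".join(bcl2fastq_command("bcl", "SampleSheet.csv")))
```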

Log:

a log file generated by bcl2fastq is stored

Example

rule bcl2fastq:
    input:
        samplesheet=config["bcl2fastq"]["samplesheet_file"]
    output:
        "Stats/Stats.json",
    params:
        indir="bcl",
        barcode_mismatch=config['bcl2fastq']['barcode_mismatch'],
        ignore_missing_bcls=config['bcl2fastq']['ignore_missing_bcls'],
        no_bgzf_compression=config['bcl2fastq']['no_bgzf_compression'],
        merge_all_lanes=config['bcl2fastq']['merge_all_lanes'],
        options=config['bcl2fastq']['options']
    threads: config['bcl2fastq']["threads"]
    wrapper:
        "main/wrappers/bcl2fastq"

Configuration

##############################################################################
# bcl2fastq section
#
# :Parameters:
#
# - options: string with any valid bcl2fastq options
bcl2fastq:
    threads: 4
    barcode_mismatch: 0
    samplesheet_file: "SampleSheet.csv"
    ignore_missing_bcls: true
    no_bgzf_compression: true
    options: ''
    merge_all_lanes: true

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.6. blast

The docstring for the blast wrapper is not yet available (no README.md found).

10.7. busco

The busco wrapper runs BUSCO, which assesses the completeness of genome/transcriptome assemblies and annotations.

Required input:

A FASTA file with the assembly.

Required output:

an output directory

Parameters:

mode: either genome, transcriptome or proteins
lineage: path to a lineage for BUSCO assessment
downloads_path: directory where downloads are stored
options: a list of valid BUSCO options.

Log:

a log file generated by BUSCO.

Example

rule busco:
        input:
                "trinity/trinity.fas"
        output:
                directory("busco")
        params:
                mode = config["busco"]["mode"],
                lineage = config["busco"]["lineage"],
                downloads_path = config["busco"]["downloads_path"],
                options = config["busco"]["options"]
        threads:
                config["busco"]["threads"]
        log:
                "busco/busco.log"
        wrapper:
                "main/wrappers/busco"

Configuration

######################################################################
# BUSCO section
#
# :Parameters:
#
# - mode: Either genome, transcriptome or proteins
# - lineage: Path to a lineage for busco assessment
# - downloads_path: Directory where downloads are stored
# - options: string with any valid BUSCO options
busco:
        mode: 'genome'
        lineage: 'stramenopiles_odb10'
        downloads_path: 'busco/busco_downloads'
        options: ''
        threads: 4

References

https://busco.ezlab.org/busco_userguide.html

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.8. bz2_to_gz

The bz2_to_gz wrapper converts fastq.bz2 files to fastq.gz files.

Here are the steps followed by the rule. Any failure stops the process and the original file is untouched. If all succeed, the input is deleted.

The input BZ2 file is checked for integrity.
The input BZ2 file is decompressed with pbunzip2 and piped to the pigz executable to produce a GZ output.
The output is checked for integrity with pigz.
The input BZ2 file is deleted.
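The same four steps can be mirrored in pure Python with the standard bz2 and gzip modules. This only illustrates the fail-safe ordering (the input is deleted last, after both integrity checks); the wrapper itself relies on pbunzip2 and pigz, and the names bz2_to_gz and _read_all here are hypothetical.

```python
import bz2
import gzip
import os
import shutil


def _read_all(opener, path, chunk=1 << 20):
    """Decompress a whole file; raises OSError if the stream is corrupted."""
    with opener(path, "rb") as handle:
        while handle.read(chunk):
            pass


def bz2_to_gz(src: str, dst: str) -> None:
    """Convert src (.bz2) to dst (.gz); src is deleted only if all steps succeed."""
    _read_all(bz2.open, src)                      # 1. check input integrity
    with bz2.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)             # 2. BZ2 -> GZ, streamed
    _read_all(gzip.open, dst)                     # 3. check output integrity
    os.remove(src)                                # 4. delete the input
```

Any exception raised by steps 1 to 3 stops the function before os.remove, so the original file is left untouched.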

Required input:

bzipped files (wildcards possible)

Required output:

output gzipped files (wildcards possible)

Example

rule bz2_to_gz:
        input:
                "{sample}.fq.bz2"
        output:
                "{sample}/bz2_to_gz/{sample}.fq.gz"
        threads:
                config["bz2_to_gz"]["threads"]
        wrapper:
                "main/wrappers/bz2_to_gz"

Configuration

######################################################################
# bz2_to_gz section
#
# :Parameters:
#
bz2_to_gz:
        threads: 4

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.9. canu

The canu wrapper performs long-read correction and/or assembly using Canu.

This wrapper takes a FASTQ or FASTA file. With the -correct or -trim step option, the output is the corrected reads in a fasta.gz file. Without a step option, the output is the assembly in a FASTA file.

Required input:

Long reads file (FASTQ/FASTA format)

Required output:

Assembly result in FASTA format, or corrected reads in fasta.gz.
The canu{step}.done file is a trigger file that lets Snakemake know that the Canu computation is over.

Required parameters:

preset: any preset in this list:
    - "pacbio": raw PacBio data
    - "pacbio-hifi": PacBio HiFi data
    - "nanopore": Nanopore data
genome_size: the expected genome size.
step: the step to run, in this list:
    - "-correct": Canu read correction.
    - "-trim": Canu read trimming.
    - "": default Canu assembly.
use_grid: whether Canu may run its steps on your cluster.
options: a list of valid Canu options.
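These parameters translate into a Canu command line of the form canu [step] -p prefix -d directory genomeSize=... useGrid=... -preset reads. The sketch below is hypothetical: canu_command is not the wrapper's code, and deriving the output directory and assembly prefix from the first output file is an assumption.

```python
import os
import shlex


def canu_command(reads, first_output, preset="pacbio", genome_size="3.3m",
                 step="", use_grid=True, options=""):
    """Build a canu argument list from the wrapper parameters."""
    # Assumed convention: the first output gives the directory and prefix,
    # e.g. canu/nice_assembly.contigs.fasta -> -d canu -p nice_assembly
    outdir = os.path.dirname(first_output) or "."
    prefix = os.path.basename(first_output).split(".")[0]
    cmd = ["canu"]
    if step:  # "-correct" or "-trim"; "" runs the default assembly
        cmd.append(step)
    cmd += ["-p", prefix, "-d", outdir, f"genomeSize={genome_size}",
            "useGrid=" + ("true" if use_grid else "false")]
    cmd += shlex.split(options)
    cmd += [f"-{preset}", reads]  # -pacbio, -pacbio-hifi or -nanopore
    return cmd


print(" ".join(canu_command("nice_pb_long_reads.fastq",
                            "canu/nice_assembly.contigs.fasta", genome_size="3G")))
# canu -p nice_assembly -d canu genomeSize=3G useGrid=true -pacbio nice_pb_long_reads.fastq
```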

Example

rule canu:
    input:
        "nice_pb_long_reads.fastq"
    output:
        "canu/nice_assembly.contigs.fasta",
        "canu/canu.done"
    params:
        preset = "pacbio",
        genome_size = "3G",
        step = "",
        use_grid = True,
        options = ""
    threads: 1
    wrapper:
        "main/wrappers/canu"

Configuration

##############################################################################
# Canu long read assembly
#
# :Parameters:
#
# - preset: Any preset in this list. (pacbio, pacbio-hifi, nanopore)
# - genome_size: An estimate of the size of the genome. Common suffixes are allowed, for example, 3.7m or 2.8g.
# - use_grid: let canu run steps on cluster.
# - step: Any step in this list. (-correct, -trim, "")
# - options: any options recognised by canu
# - threads: Number of threads to use
canu:
    preset: 'pacbio'
    genome_size: '3.3m'
    step: ''
    use_grid: true
    options: ''
    threads: 1

References

https://canu.readthedocs.io/en/latest/quick-start.html

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.10. consensus

The docstring for the consensus wrapper is not yet available (no README.md found).

10.11. digital_normalisation

The digital_normalisation wrapper applies digital normalisation, a method that normalises the coverage of a sample in fixed, low memory and without any reference. Assembling normalised data gives results as good as, or even better than, assembling the unnormalised data. Furthermore, SPAdes is notably faster and uses less memory on normalised data than without digital normalisation.
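The idea behind digital normalisation can be sketched in a few lines: a read is kept only if the median count of its k-mers, among the reads kept so far, is below the cutoff. This toy version uses an exact Counter, whereas Khmer relies on a fixed-size probabilistic counting structure (hence the memory parameter); normalize_by_median here is illustrative and is not Khmer's API.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int) -> list:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep reads whose median k-mer count (over kept reads) is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: ignored in this sketch
        if median([counts[km] for km in read_kmers]) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute counts
    return kept


reads = ["ACGTTGCA"] * 10 + ["TTTTGGGG"]
print(len(normalize_by_median(reads, k=4, cutoff=3)))  # 4
```

With a cutoff of 3, only three copies of the highly redundant read survive, while the unique read is kept.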

Required input:

Fastq files (gzip compressed or not). Please provide a list for paired data, a string otherwise

Required output:

Uncompressed FastQ files. Please provide a list for paired data, a string otherwise.

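Parameters:

ksize: k-mer size used to normalise the coverage
cutoff: when the median k-mer coverage level of a read is above this number, the read is not kept
m: maximum amount of memory to use for the data structure
options: any options recognised by normalize-by-median.py

Log:

a log file with the stdout and stderr of Khmer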

Example

Note that the input and output must be named:

rule digital_normalisation_PE:
    input:
        fastq=["data_R1_.fastq.gz", "data_R2_.fastq.gz"]
    output:
        fastq=["data_R1_.dnpe.fastq", "data_R2_.dnpe.fastq"]
    log:
        "dn_PE.log"
    params:
        ksize=20,
        cutoff=20,
        m=1000000000,
        options=''
    threads: 1
    wrapper:
        "main/wrappers/digital_normalisation"

Configuration

##############################################################################
# Khmer - Digital Normalisation
#
# :Parameters:
#
# - do: if unchecked, this rule is ignored.
# - ksize: k-mer size used to normalise the coverage.
# - cutoff: when the median k-mer coverage level is above this number the read
#       is not kept.
# - max_memory_usage: maximum amount of memory to use for data structure.
# - threads: number of threads to be used.
# - options: any options recognised by normalize-by-median.py.
#
digital_normalisation:
    do: yes
    ksize: 20
    cutoff: 20
    max_memory_usage: 4e9
    threads: 4
    options: ''


Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues


10.12. dsrc_to_gz

The dsrc_to_gz wrapper converts fastq.dsrc files to fastq.gz files.

Here are the steps followed by the rule. Any failure stops the process and the original file is untouched. If all succeed, the input is deleted.

The input DSRC file is decompressed with dsrc and piped to the pigz executable to produce a GZ output.
The output is checked for integrity with pigz.
The input DSRC file is deleted.

Required input:

a dsrc compressed FASTQ file

Required output:

a gz compressed FASTQ file

Example

rule dsrc_to_gz:
        input:
                "{sample}.fq.dsrc"
        output:
                "{sample}/dsrc_to_gz/{sample}.fq.gz"
        params:
                options = config["dsrc_to_gz"]["options"]
        threads:
                config["dsrc_to_gz"]["threads"]
        wrapper:
                "main/wrappers/dsrc_to_gz"

Configuration

######################################################################
# dsrc_to_gz section
#
# :Parameters:
#
dsrc_to_gz:
        threads: 4
        options: "-m2"

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.13. falco

The falco wrapper runs Falco, an emulation of the popular FastQC software, to check large sets of sequencing reads for common problems.

Required input:

one or two FASTQ files (if paired)

Required output:

one output file from falco such as summary.txt

Parameters:

options: a list of valid Falco options
working_directory: place where results will be stored

Log:

a log file generated by Falco.

Example

rule falco:
        input:
                "{sample}_R1.fastq", "{sample}_R2.fastq"
        output:
                done="{sample}/falco/summary.txt"
        params:
                options=config["falco"]["options"],
                working_directory="{sample}/falco"
        threads:
                config["falco"]["threads"]
        log:
                "{sample}/falco/falco.log"
        wrapper:
                "main/wrappers/falco"

Configuration

######################################################################
# Falco section
#
# :Parameters:
#
# - options: string with any valid Falco options
falco:
        options: ''
        threads: 4

References

https://github.com/smithlabcode/falco

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.14. fastp

The docstring for the fastp wrapper is not yet available (no README.md found).

10.15. fastq_stats

The fastq_stats wrapper creates images and statistics from FastQ data.

This wrapper takes a single FastQ file as input.

Required input:

one FastQ file.

Required output:

gc: PNG image of the GC content
json: some stats such as GC content, mean read length, etc
boxplot: a fastqc-like quality image

Required parameters:

max_reads: the maximum number of reads used to compute the statistics (500,000 in the example below)
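As an illustration of the kind of statistics stored in the JSON output, the sketch below computes the GC content and mean read length over at most max_reads records. It is a hypothetical, simplified stand-in (plain 4-line FASTQ records), not sequana's implementation.

```python
import io
from itertools import islice


def fastq_stats(handle, max_reads=500_000):
    """GC content (%) and mean read length of at most max_reads FASTQ records."""
    n_reads = total_len = gc = 0
    # a FASTQ record spans 4 lines: header, sequence, '+', qualities
    records = zip(*[iter(handle)] * 4)
    for _header, seq, _plus, _qual in islice(records, max_reads):
        seq = seq.strip().upper()
        n_reads += 1
        total_len += len(seq)
        gc += seq.count("G") + seq.count("C")
    return {
        "reads": n_reads,
        "gc_percent": 100.0 * gc / total_len if total_len else 0.0,
        "mean_read_length": total_len / n_reads if n_reads else 0.0,
    }


example = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nAATT\n+\nIIII\n")
print(fastq_stats(example))  # {'reads': 2, 'gc_percent': 50.0, 'mean_read_length': 4.0}
```

For a gzipped file such as {sample}.fastq.gz, the handle could be opened with gzip.open(path, "rt").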

Example

rule fastq_stats:
    input:
        "{sample}.fastq.gz"
    output:
        gc="{sample}/fastq_stats/{sample}_gc.png",
        boxplot="{sample}/fastq_stats/{sample}_boxplot.png",
        json="{sample}/fastq_stats/{sample}.json"
    params:
        max_reads=500000
    wrapper:
        "main/wrappers/fastq_stats"

Configuration


Not required

References

https://sequana.readthedocs.io

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.16. fastqc

The fastqc wrapper calls FastQC on input data sets (paired or not).

This wrapper takes 1 or 2 FastQ files as input. It creates FastQC reports (HTML and zip file). A fastqc.done file is generated once done.

Required input:

one FastQ file or two FastQ files (if paired data). Could also be a BAM file.

Required output:

done: a filename used as a trigger once the job is done. Be aware that the directory of this file is used to store the results.

Required parameters:

options: a list of valid FastQC options
working_directory: place where results of FastQC are saved

Log:

a log file generated by FastQC is stored

Example

rule fastqc:
    input:
        "{sample}_R1_.fastq", "{sample}_R2_.fastq"
    output:
        done = "{sample}/fastqc/fastqc.done"
    params:
        options=config['fastqc']['options'],
        working_directory = "{sample}/fastqc"
    threads:
        config['fastqc']["threads"]
    log:
        "{sample}/fastqc/fastqc.log"
    wrapper:
        "main/wrappers/fastqc"

Configuration

##############################################################################
# FastQC section
#
# :Parameters:
#
# - options: string with any valid FastQC options
fastqc:
    options: ''
    threads: 4

References

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.17. feature_counts

The feature_counts wrapper counts reads mapped to genomic regions using featureCounts (from the Subread package).

Required input:

bam: the input BAM file
gff: the GFF annotation file

Required output:

counts: the counts file (tabulated file)
summary: the summary file (tabulated file)

Required parameters:

feature: a valid feature to be found in the GFF file
attribute: a valid attribute to be found in the GFF file
strandness: a single integer value, 0 (unstranded), 1 (stranded) or 2 (reversely stranded).
options: any other parameters accepted by featureCounts
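The required parameters correspond to standard featureCounts flags (-a annotation, -t feature type, -g attribute, -s strandness, -o output). A minimal sketch, assuming the wrapper assembles the command in this way; feature_counts_command is a hypothetical name. Note that featureCounts itself writes the summary next to the counts file, as <counts>.summary, which matches the output names of the example below.

```python
import shlex


def feature_counts_command(bam, gff, counts, feature="gene", attribute="ID",
                           strandness=0, threads=4, options=""):
    """Build a featureCounts argument list from the wrapper parameters."""
    if strandness not in (0, 1, 2):
        raise ValueError("strandness must be 0, 1 or 2")
    return [
        "featureCounts",
        "-T", str(threads),
        "-a", gff,              # annotation file
        "-t", feature,          # feature type, e.g. gene or exon
        "-g", attribute,        # attribute used to group features, e.g. ID
        "-s", str(strandness),  # 0 unstranded, 1 stranded, 2 reversely stranded
        "-o", counts,           # featureCounts also writes <counts>.summary
        *shlex.split(options),
        bam,
    ]
```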

Log:

a log file generated by FeatureCounts

Example

rule feature_counts:
    input:
        bam="{sample}/bamfile/{sample}.sorted.bam",
        gff="genome.gff"
    output:
        counts="{sample}/feature_counts/{sample}_feature.out",
        summary="{sample}/feature_counts/{sample}_feature.out.summary"
    params:
        options=config["feature_counts"]["options"],
        feature=config["feature_counts"]["feature"],
        attribute=config["feature_counts"]["attribute"],
        strandness=config["feature_counts"]["strandness"]
    threads:
        config["feature_counts"]['threads']
    log:
        "{sample}/feature_counts/feature_counts.log"
    wrapper:
        "main/wrappers/feature_counts"

Configuration

######################################################################
# FeatureCounts section
#
# :Parameters:
#
# - options: string with any valid FeatureCounts options
feature_counts:
    options: ''
    feature: gene    # could be exon, mRNA, etc
    attribute: ID    # could be ID, gene_id, etc
    strandness: 0    # set to 0, 1 or 2
    threads: 4

References

http://bioinf.wehi.edu.au/featureCounts/

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.18. freebayes

The freebayes wrapper runs Freebayes, a variant caller designed to find SNPs and short indels from a BAM file. It produces a well-annotated VCF output and provides a quality score calculated by a Bayesian model.

Required input:

bam: Sorted BAM file.
ref: FASTA file of the reference genome.

Required output:

vcf: VCF file of detected variants.

Required parameters:

options: a list of valid freebayes options
ploidy: sets the default ploidy for the analysis
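Freebayes writes its VCF on standard output, so the vcf output is obtained by redirection. A hypothetical sketch (freebayes_command is not the wrapper's code), assuming ploidy maps to --ploidy, the reference to -f, and that options are appended as-is:

```python
import shlex


def freebayes_command(bam, ref, vcf, ploidy=1, options=""):
    """Build a freebayes shell command that writes the variants to vcf."""
    args = ["freebayes", "-f", ref, "--ploidy", str(ploidy),
            *shlex.split(options), bam]
    # freebayes writes the VCF on stdout, hence the redirection
    return shlex.join(args) + " > " + shlex.quote(vcf)


print(freebayes_command("s.sorted.bam", "measles.fa", "s.vcf", options="--legacy-gls"))
# freebayes -f measles.fa --ploidy 1 --legacy-gls s.sorted.bam > s.vcf
```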

Log:

a log file generated by freebayes

Example

rule freebayes:
    input:
        bam = "{sample}/bamfile/{sample}.sorted.bam",
        ref = "measles.fa"
    output:
        vcf = "{sample}/freebayes/{sample}.vcf"
    log:
        "{sample}/freebayes/freebayes.log"
    params:
        ploidy = config["freebayes"]["ploidy"],
        options = config["freebayes"]["options"]
    wrapper:
        "main/wrappers/freebayes"

Configuration

######################################################################
# Freebayes section
#
# :Parameters:
#
# - ploidy: sets the default ploidy for the analysis
# - options: string with any valid Freebayes options
freebayes:
    ploidy: 1
    options: "--legacy-gls"

References

https://github.com/freebayes/freebayes

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.19. freebayes_vcf_filter

The freebayes_vcf_filter wrapper filters VCF files generated by freebayes, based on the freebayes quality score, coverage depth, frequency and strand ratio.

Required input:

vcf: VCF file from freebayes.

Required output:

vcf: Filtered VCF file.
csv: CSV file of filtered variants.
html: HTML report. Note that the output is always called "variant_calling.html". If you want to save the file in a specific directory, use the report_dir parameter.

Required parameters:

report_dir: the report directory where the JS/CSS files are copied.
filter_dict: a dictionary of filtering parameters (freebayes score, frequency, minimum depth, forward and reverse depth, and strand ratio).
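The filter_dict thresholds can be read as a conjunction of minimum values. The sketch below is hypothetical: the Variant record and passes function are illustrative names, all thresholds are assumed to be lower bounds, and the strand ratio is assumed to be the share of the less represented strand (which, for a threshold of 0.2, keeps variants whose forward fraction lies between 0.2 and 0.8). The wrapper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal variant record (a VCF parser would fill it)."""
    qual: float       # freebayes quality score
    freq: float       # alternative allele frequency
    depth: int        # total coverage depth
    fwd_depth: int    # reads supporting the variant on the forward strand
    rev_depth: int    # reads supporting the variant on the reverse strand


def passes(v: Variant, filter_dict: dict) -> bool:
    """True if the variant satisfies every threshold of filter_dict."""
    total = v.fwd_depth + v.rev_depth
    # assumed definition: share of the less represented strand
    strand_ratio = min(v.fwd_depth, v.rev_depth) / total if total else 0.0
    return (
        v.qual >= filter_dict["freebayes_score"]
        and v.freq >= filter_dict["frequency"]
        and v.depth >= filter_dict["min_depth"]
        and v.fwd_depth >= filter_dict["forward_depth"]
        and v.rev_depth >= filter_dict["reverse_depth"]
        and strand_ratio >= filter_dict["strand_ratio"]
    )


thresholds = {"freebayes_score": 20, "frequency": 0.7, "min_depth": 10,
              "forward_depth": 3, "reverse_depth": 3, "strand_ratio": 0.2}
print(passes(Variant(50, 0.9, 20, 10, 10), thresholds))  # True
print(passes(Variant(50, 0.9, 40, 35, 4), thresholds))   # False (strand bias)
```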

Example

rule freebayes_vcf_filter:
    input:
        vcf = "{sample}/freebayes/{sample}.vcf",
    output:
        vcf = "{sample}/freebayes_vcf_filter/{sample}.filter.vcf",
        csv = "{sample}/freebayes_vcf_filter/{sample}.filter.csv",
        html = "{sample}/freebayes_vcf_filter/report/{sample}_variant_calling.html"
    params:
        filter_dict={
            "freebayes_score": config["freebayes_vcf_filter"]["freebayes_score"],
            "frequency": config["freebayes_vcf_filter"]["frequency"],
            "min_depth": config["freebayes_vcf_filter"]["min_depth"],
            "forward_depth": config["freebayes_vcf_filter"]["forward_depth"],
            "reverse_depth": config["freebayes_vcf_filter"]["reverse_depth"],
            "strand_ratio": config["freebayes_vcf_filter"]["strand_ratio"],
        },
        report_dir="report"
    wrapper:
        "main/wrappers/freebayes_vcf_filter"

Configuration

######################################################################
# Freebayes vcf filter section
#
# :Parameters:
#
freebayes_vcf_filter:
    freebayes_score: 20
    frequency: 0.7
    min_depth: 10
    forward_depth: 3
    reverse_depth: 3
    strand_ratio: 0.2

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.20. gz_to_bz2

The gz_to_bz2 wrapper converts fastq.gz files to fastq.bz2 files.

Here are the steps followed by the rule. Any failure stops the process and the original file is untouched. If all succeed, the input is deleted.

The input GZ file is checked for integrity.
The input GZ file is decompressed with pigz and piped to the pbzip2 executable to produce a BZ2 output.
The output is checked for integrity with pbzip2.
The input GZ file is deleted.

Required input:

A FASTQ gzip compressed file

Required output:

A FASTQ bz2 compressed file

Example

rule gz_to_bz2:
        input:
                "{sample}.fq.gz"
        output:
                "{sample}/gz_to_bz2/{sample}.fq.bz2"
        threads:
                config["gz_to_bz2"]["threads"]
        wrapper:
                "main/wrappers/gz_to_bz2"

Configuration

######################################################################
# gz_to_bz2
#
gz_to_bz2:
        threads: 4

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.21. hmmbuild

The docstring for the hmmbuild wrapper is not yet available (no README.md found).

10.22. hmmscan

The docstring for the hmmscan wrapper is not yet available (no README.md found).

10.23. index

The docstring for the index wrapper is not yet available (no README.md found).

10.24. longorfs

The docstring for the longorfs wrapper is not yet available (no README.md found).

10.25. macs3

The macs3 wrapper calls peaks in ChIP-seq and ATAC-seq-like sequencing runs.

Required input:

1 or several sorted BAM files (inputs)
1 or several sorted BAM files (controls)

You must name the IP and control files in the input section, as inputs and controls (see the example below).

Required output:

A set of files, created by macs3

The output files are processed to extract the prefix to be used by macs3. Their prefixes must be the same. You may provide only one output directory or list the different output files created by macs3.

Log:

a log file with stdout/stderr

Optional parameters:

qvalue is optional and defaults to 0.05
options is optional and defaults to '--keep-dup all'

Required parameters:

bandwidth: the --bw option of macs3, e.g. 300 (its default value)
broad_cutoff: e.g. 0.05; used and required only if the mode is set to 'broad'
genome_size: the mappable size of your genome; this is a very important parameter
mode: either 'broad' or 'narrow' (compulsory)
paired: whether the data is paired or not (e.g. True). Using the sequana manager, you can simply set it to manager.paired
prefix: tag name. All files are saved in an output directory according to your output files; however, you also need to provide a prefix for all the files that are generated. Set it to e.g. 'macs3' or the name of a comparison.

Example

For a set of inputs, use a list if required. The outputs must contain the word "narrow" or "broad". Based on this word, macs3 is called in narrow and/or broad mode.

rule macs3:
    input:
        inputs=["1.bam", "2.bam"],
        controls="3.bam"
    output:
        "macs3/narrow/tag_peaks.xls"
    params:
        bandwidth=300,
        broad_cutoff=0.05,
        genome_size=4000000,
        mode="narrow",
        options=" --keep-dup all ",
        paired=True,
        prefix="tag",
        qvalue=0.05
    log:
        "out.log"
    wrapper:
        "main/wrappers/macs3"

Configuration

############################################################################################
# macs3 peak caller
#
# bandwidth: 300           # default bandwidth option --bw from macs3
# broad_cutoff: 0.05
# genome_size: 4000000  # the mappable size of your genome.
# mode: 'broad'    # broad or narrow. compulsory
# options: --keep-dup all      ## we keep all duplicates and let macs3 do the job
# paired: True
# prefix: tag name
# qvalue: 0.05
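The mode selection described above (the words "narrow" and "broad" in the output paths) and the parameters can be sketched as a macs3 callpeak command builder. This is a hypothetical illustration: macs3_modes and macs3_command are not the wrapper's functions, and using -f BAMPE for paired data and --broad/--broad-cutoff for the broad mode are assumptions about how the parameters are passed.

```python
import shlex


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def macs3_modes(outputs):
    """Modes requested through the words 'narrow' / 'broad' in output paths."""
    modes = sorted({m for m in ("broad", "narrow")
                    for out in _as_list(outputs) if m in out})
    if not modes:
        raise ValueError("output names must contain 'narrow' or 'broad'")
    return modes


def macs3_command(inputs, controls, outdir, mode, prefix, genome_size,
                  bandwidth=300, qvalue=0.05, broad_cutoff=0.05,
                  paired=True, options="--keep-dup all"):
    """Build a 'macs3 callpeak' argument list for one mode."""
    cmd = ["macs3", "callpeak",
           "-t", *_as_list(inputs),        # IP samples
           "-c", *_as_list(controls),      # control samples
           "-n", prefix, "--outdir", outdir,
           "-g", str(genome_size), "--bw", str(bandwidth), "-q", str(qvalue),
           "-f", "BAMPE" if paired else "BAM"]
    if mode == "broad":
        cmd += ["--broad", "--broad-cutoff", str(broad_cutoff)]
    return cmd + shlex.split(options)


print(macs3_modes(["macs3/narrow/tag_peaks.xls"]))  # ['narrow']
```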

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.26. makeblastdb

The docstring for the makeblastdb wrapper is not yet available (no README.md found).

10.27. mark_duplicates

The mark_duplicates wrapper marks duplicates using Picard tools so that downstream tools such as variant callers are aware of duplicated reads.

Required input:

BAM file

Required output:

The first output must be the expected output BAM file with the marked duplicated reads.
The second output contains the duplication metrics.

Log:

a log file with stdout/stderr

Required parameters:

remove_dup: if set to true, the duplicated reads are removed (default is false)
tmpdir: directory used to store temporary files
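These parameters correspond to Picard MarkDuplicates arguments. A minimal sketch using Picard's KEY=VALUE syntax (depending on the Picard version, the newer -I/-O style may be preferred); mark_duplicates_command is a hypothetical name, not the wrapper's code. It accepts either a boolean or the "false"/"true" strings used in the example below.

```python
def mark_duplicates_command(bam_in, bam_out, metrics, remove_dup=False,
                            tmpdir="/tmp"):
    """Build a Picard MarkDuplicates argument list (KEY=VALUE syntax)."""
    return [
        "picard", "MarkDuplicates",
        f"I={bam_in}",            # input BAM
        f"O={bam_out}",           # output BAM with duplicates marked
        f"M={metrics}",           # duplication metrics file
        # str(True) -> "true", "false" -> "false"
        f"REMOVE_DUPLICATES={str(remove_dup).lower()}",
        f"TMP_DIR={tmpdir}",
    ]
```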

Example

rule mark_duplicates:
    input:
        "test.bam"
    output:
        bam = "test/md.bam",
        metrics = "test/md.metrics"
    log:
        out = "test/log.out",
        err = "test/log.err"
    params:
        remove_dup = "false",
        tmpdir = "test/tmp"
    wrapper:
        "main/wrappers/mark_duplicates"

Configuration

#############################################################################
# mark_duplicates (picard-tools) allows to mark PCR duplicate in BAM files
#
# :Parameters:
#
# - do: if unchecked, this rule is ignored. Mandatory for RNA-SeQC tool.
# - remove: If true do not write duplicates to the output file instead of writing them with
#            appropriate flags set.  Default value: false. This option can be set to 'null' to clear
#            the default value. Possible values: {true, false}
# - tmpdir: write temporary files in this directory (default TMP_DIR=/tmp/, but could be "TMP_DIR=/local/scratch/")
#
mark_duplicates:
    do: false
    remove: false ## may be True
    tmpdir: ./tmp/

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.28. minimap2

The minimap2 wrapper maps sequencing reads onto a reference using Minimap2.

This wrapper takes single-end or paired-end data and maps them onto a reference. The resulting file is a sorted BAM file (not indexed).

Required input:

fastq: the input FastQ files (single or paired)
reference: the input genome used as a reference. No indexing is needed.

Required output:

sorted: the sorted output BAM file

Required parameters:

options: a list of valid minimap2 options
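A sorted BAM file is typically obtained by piping the SAM output of minimap2 (-a) into samtools sort. The sketch below assumes that the wrapper follows this pattern; minimap2_pipeline is a hypothetical helper that only builds the shell command string.

```python
import shlex


def minimap2_pipeline(reference, fastq, sorted_bam, threads=4, options=""):
    """Build a shell pipeline: minimap2 (SAM on stdout) | samtools sort."""
    if isinstance(fastq, str):
        fastq = [fastq]  # single-end data given as a string
    mapper = ["minimap2", "-t", str(threads), "-a", *shlex.split(options),
              reference, *fastq]
    sorter = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"]
    return shlex.join(mapper) + " | " + shlex.join(sorter)


print(minimap2_pipeline("genome.fasta", ["r1.fq", "r2.fq"], "s.sorted.bam", threads=2))
# minimap2 -t 2 -a genome.fasta r1.fq r2.fq | samtools sort -@ 2 -o s.sorted.bam -
```

Presets such as -x sr (short reads) can be passed through options.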

Log:

a log file generated by minimap2 is stored

Example

rule minimap2:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        reference="genome.fasta"
    output:
        sorted="{sample}/{sample}.sorted.bam"
    log:
        "{sample}/{sample}.log"
    params:
        options=config['minimap2']['options']
    threads:
        config['minimap2']['threads']
    wrapper:
        "main/wrappers/minimap2"

Configuration

#############################################################################
#  Minimap2 read mapping
#
# :Parameters:
#
# - options: any options recognised by minimap2
# - threads: number of threads to be used
minimap2:
    options: ''
    threads: 4

Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues

10.29. multiqc

The multiqc wrapper aggregates results from various bioinformatics analyses across many samples.

Required input:

list of files expected by MultiQC

Required output:

done: a filename used as a trigger once the job is done. Be aware that the directory of this file is used to store the results.

Required parameters:

options: a list of valid MultiQC options
working_directory: place where results of MultiQC are saved

Log:

a log file generated by MultiQC is stored

Example

rule multiqc:
    input:
        ["file1", "file2"]
    output:
        "multiqc/multiqc_report.html"
    params:
        options=config['multiqc']['options'],
        input_directory=config['multiqc']['input_directory'],
        config_file=config['multiqc']['config_file'],
        modules=config['multiqc']['modules']
    log:
        "multiqc/multiqc.log"
    wrapper:
        "main/wrappers/multiqc"

Configuration

##############################################################################
# MultiQC section
#
# :Parameters:
#
# - options: string with any valid MultiQC options
multiqc:
    options: ''
    threads: 4

References

https://multiqc.info/

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.30. polypolish

The polypolish wrapper calls Polypolish to polish a draft assembly.

This wrapper takes a draft assembly as input as well as a SAM file. The SAM file should be the alignment of Illumina data onto the assembly, as recommended by the polypolish author.

Required input:

alignment: A SAM file.
assembly: A FastA file, the draft assembly.

Required output:

The corrected assembly in FastA format.

Optional parameters:

options: a list of valid Polypolish options (empty by default)

Log:

a log file generated by Polypolish is stored

Example

The params section is optional. The inputs can be named or unnamed; if unnamed, the alignment must come first, followed by the assembly. The output can also be named or unnamed.

rule polypolish:
    input:
        alignment="alignments.sam",
        assembly="assembly.fasta"
    output:
        fasta="polypolish/polypolish.fasta"
    params:
        options=" "
    log:
        "polypolish/test.log"
    wrapper:
        "main/wrappers/polypolish"

Configuration

##############################################################################
# Polypolish section
#
# :Parameters:
#
# - options: string with any valid Polypolish options
polypolish:
    options: ''
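The example notes that inputs may be named or unnamed, in which case the alignment must come first and the assembly second. The sketch below illustrates that rule in plain Python. `resolve_inputs` is a hypothetical helper, not part of the wrapper, and it only mimics Snakemake's named/positional input handling with a list and a dict.

```python
def resolve_inputs(positional, named):
    """Hypothetical helper: return (alignment, assembly) from named or positional inputs."""
    if "alignment" in named and "assembly" in named:
        return named["alignment"], named["assembly"]
    if len(positional) != 2:
        raise ValueError("expected two inputs: the SAM alignment first, then the FASTA assembly")
    alignment, assembly = positional  # order matters when inputs are not named
    if not alignment.endswith(".sam"):
        raise ValueError(f"first unnamed input should be a SAM file, got {alignment!r}")
    return alignment, assembly


print(resolve_inputs(["alignments.sam", "assembly.fasta"], {}))
```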

References

https://github.com/rrwick/Polypolish/

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.31. predict

predict

The docstring for the predict wrapper is not yet available (no README.md found).

10.32. prokka

The prokka wrapper annotates an assembly with Prokka.

Required input:

assembly: the input assembly fasta file.

Required output:

output: any Prokka output file you need (.gbk, .gff or .fna).

Required parameters:

options: a list of valid prokka options.

Log:

a log file generated by prokka is created.

Example

rule prokka:
    input: "assembly/contigs.fasta"
    output: "prokka/contigs.gff"
    params:
        options=config["prokka"]["options"]
    log:
        "logs/prokka.log"
    threads:
        config["prokka"]["threads"]
    wrapper:
        "main/wrappers/prokka"

Configuration

##############################################################################
# Prokka
#
# :Parameters:
#
# - options: any options recognised by prokka cli.
# - threads: number of threads to be used.
#
prokka:
    options: ''
    threads: 4
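Prokka writes all its files into one output directory using a common prefix (`--outdir`, `--prefix`), so a single requested file such as `prokka/contigs.gff` is enough to locate the others. The sketch below derives both values from the output path. The helper `prokka_command` is hypothetical and the wrapper may build its command differently. `--force` is included on the assumption that Snakemake creates the output directory before the job runs, since Prokka otherwise refuses to write into an existing directory.

```python
import os


def prokka_command(assembly, output, options="", threads=4):
    """Hypothetical helper: derive Prokka's --outdir and --prefix from one expected output file."""
    outdir = os.path.dirname(output) or "."
    prefix = os.path.splitext(os.path.basename(output))[0]  # prokka/contigs.gff -> contigs
    parts = ["prokka", "--force", "--outdir", outdir, "--prefix", prefix, "--cpus", str(threads)]
    if options:  # an empty YAML value may load as None or ''
        parts.append(options)
    parts.append(assembly)
    return " ".join(parts)


print(prokka_command("assembly/contigs.fasta", "prokka/contigs.gff"))
```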

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.33. quast

The quast wrapper assesses the quality of an assembly using Quast.

Required input:

fastq: the input fastq files.
assembly: the input assembly fasta file.

Required output:

output: a filename for an empty marker file (e.g. quast/quast.done). Its directory defines the Quast output directory.

Required parameters:

preset: preset for application (single, pacbio, nanopore). Ignored if fastq are paired-end.
reference: reference genome fasta file.
annotation: file with genomic feature coordinates in the reference (GFF, BED, NCBI or TXT).
options: a list of valid Quast options.

Log:

a log file generated by quast is created.

Example

rule quast:
    input:
        fastq="data/raw.fastq.gz",
        assembly="assembly/contigs.fasta"
    output: "quast/quast.done"
    params:
        preset=config["quast"]["preset"],
        reference=config["quast"]["reference"],
        annotation=config["quast"]["annotation"],
        options=config["quast"]["options"],
    log:
        "logs/quast.log"
    threads:
        config["quast"]["threads"]
    wrapper:
        "main/wrappers/quast"

Configuration

##############################################################################
# Quast
#
# :Parameters:
#
# preset: preset for application (single, pacbio, nanopore). Ignored if fastq are paired-end.
# reference: reference genome fasta file.
# annotation: file with genomic feature coordinates in the reference (GFF, BED, NCBI or TXT).
# options: a list of valid quast.py options.
#
quast:
    preset: pacbio
    reference: path_to/known_ref.fa
    annotation: path_to/known_ref.gff
    options: "--fungus"
    threads: 4
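The `preset` parameter only matters for unpaired reads. The following sketch shows one way the preset could be mapped onto QUAST's read options (`--single`, `--pacbio`, `--nanopore`, and `--pe1`/`--pe2` for paired-end data), and how the output directory follows from the marker file. The helper names are hypothetical and the mapping is an assumption about the wrapper's behaviour.

```python
import os

READ_FLAGS = {"single": "--single", "pacbio": "--pacbio", "nanopore": "--nanopore"}


def quast_read_args(fastq, preset):
    """Hypothetical helper: choose QUAST read options; the preset is ignored for paired-end data."""
    if isinstance(fastq, str):
        fastq = [fastq]
    if len(fastq) == 2:
        return ["--pe1", fastq[0], "--pe2", fastq[1]]
    if preset not in READ_FLAGS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(READ_FLAGS)}")
    return [READ_FLAGS[preset], fastq[0]]


def quast_outdir(output):
    """The output file only marks completion; its directory receives the QUAST results."""
    return os.path.dirname(output) or "."


print(quast_read_args("data/raw.fastq.gz", "pacbio"), quast_outdir("quast/quast.done"))
```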

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.34. rulegraph

The rulegraph wrapper creates a rule graph showing the dependencies of your pipeline.

Required input:

the snakefile filename

Required output:

the output dot filename

Required parameters:

mapper: a dictionary mapping each rule to a URL (HTML file or directory). Rules listed in this dictionary are shown in blue and are clickable in the output SVG file.
configname: a config file required by the input Snakefile
required_local_files: a list of files and directories, located next to the Snakefile, that are required to run the pipeline.

Example

rulegraph_params_mapper = {"a_given_rule": "its_html.html"}

rule rulegraph:
    input:
        manager.snakefile
    output:
        "rulegraph/rulegraph.dot"
    params:
        configname = "config.yaml",
        mapper = rulegraph_params_mapper,
        required_local_files = ['rules/',]
    container:
        "https://zenodo.org/record/7928262/files/graphviz_7.0.5.img"
    wrapper:
        "main/wrappers/rulegraph"
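The `mapper` parameter turns selected rules into links. A possible post-processing step on the DOT file is sketched below: nodes whose label matches a key of the mapper get a Graphviz `URL` attribute (which becomes a hyperlink in SVG output) and a blue font. The function `add_links` is hypothetical, and the node-line format in the usage example is an assumption modelled on `snakemake --rulegraph` output; the wrapper may proceed differently.

```python
import re

# Matches the label of a node line such as: 0[label = "fastqc", color = "...", style="rounded"];
NODE_LABEL = re.compile(r'label\s*=\s*"(?P<rule>[^"]+)"')


def add_links(dot_text, mapper):
    """Hypothetical post-processing: make rules listed in `mapper` blue and clickable."""
    lines = []
    for line in dot_text.splitlines():
        match = NODE_LABEL.search(line)
        stripped = line.rstrip()
        if match and match.group("rule") in mapper and stripped.endswith("];"):
            url = mapper[match.group("rule")]
            # Graphviz turns the URL attribute into a hyperlink in SVG output
            line = stripped[:-2] + f', URL="{url}", fontcolor="blue"];'
        lines.append(line)
    return "\n".join(lines)


dot = '0[label = "a_given_rule", color = "0.00 0.6 0.85", style="rounded"];\n1[label = "other", style="rounded"];'
print(add_links(dot, {"a_given_rule": "its_html.html"}))
```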

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.35. sambamba_filter

The sambamba_filter wrapper uses sambamba view to filter out reads with a mapping quality lower than a threshold. It also removes reads with multiple occurrences.

Required input:

BAM file

Required output:

the filtered output BAM file.

Parameters:

threshold: Quality threshold used for filtering
options: a list of valid sambamba filter options

Log:

two log files, one for standard output (.out) and one for standard error (.err)

Example

rule sambamba_filter:
    input:
        "{sample}/bamfile/{sample}.sorted.bam"
    output:
        "{sample}/sambamba_filter/{sample}.sorted.bam"
    log:
        out = "{sample}/sambamba_filter/log.out",
        err = "{sample}/sambamba_filter/log.err"
    params:
        threshold = config["sambamba_filter"]["threshold"],
        options = config["sambamba_filter"]["options"]
    wrapper:
        "main/wrappers/sambamba_filter"

Configuration

######################################################################
# Sambamba filter section
#
# :Parameters:
#
# - options: string with any valid Sambamba filter options
# - threshold: quality threshold used for filtering
sambamba_filter:
    threshold: 30 # Mapping quality score threshold
    options: ''
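The wrapper delegates the filtering to `sambamba view`. To make the two criteria explicit, here is a pure-Python illustration on simplified alignment records. It is not the wrapper's filter expression, and interpreting "multiple occurrences" as the same read name and mate appearing more than once is an assumption. Reads whose mapping quality equals the threshold are kept, since only lower values are filtered out.

```python
from collections import Counter
from typing import NamedTuple


class Alignment(NamedTuple):
    read_name: str
    mapq: int
    mate: int = 1  # 1 or 2 for paired-end data


def filter_alignments(alignments, threshold=30):
    """Keep alignments with MAPQ >= threshold that occur once per read name and mate."""
    occurrences = Counter((a.read_name, a.mate) for a in alignments)
    return [
        a for a in alignments
        if a.mapq >= threshold and occurrences[(a.read_name, a.mate)] == 1
    ]


reads = [Alignment("r1", 40), Alignment("r2", 10), Alignment("r3", 60), Alignment("r3", 60)]
print(filter_alignments(reads))
```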

References

https://github.com/biod/sambamba

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.36. sambamba_markdup

The sambamba_markdup wrapper marks or removes PCR duplicate reads. It uses the same criteria as Picard to determine whether a read is a duplicate.

Required input:

BAM file

Required output:

bam: the first output must be the output BAM file, with duplicates marked.
bai: (optional) you may specify the BAM index file, which will be generated by the wrapper.

Parameters:

remove_duplicates: Remove or just mark duplicate reads
tmp_directory: Temporary directory
options: a list of valid sambamba markdup options

Log:

a log file generated by sambamba is created.

Example

rule sambamba_markdup:
    input:
        "{sample}/sambamba_filter/{sample}.sorted.bam"
    output:
        bam="{sample}/sambamba_markdup/{sample}.sorted.bam",
        bai="{sample}/sambamba_markdup/{sample}.sorted.bam.bai"
    log: "{sample}/sambamba_markdup/log.out",
    params:
        remove_duplicates = config["sambamba_markdup"]["remove_duplicates"],
        tmp = config["sambamba_markdup"]["tmp_directory"],
        options = config["sambamba_markdup"]["options"]
    wrapper:
        "main/wrappers/sambamba_markdup"

Configuration

######################################################################
# Sambamba markdup section
#
# :Parameters:
#
# - remove_duplicates: Remove or just mark duplicate reads
# - tmp_directory: Temporary directory
# - options: string with any valid Sambamba markdup options
sambamba_markdup:
    remove_duplicates: False
    tmp_directory: /tmp
    options: ''
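For reference, the sketch below shows how the parameters could map onto `sambamba markdup` flags (`-r` to remove duplicates, `--tmpdir`, `-t` for threads). The function is hypothetical and the wrapper's exact command may differ.

```python
import shlex


def sambamba_markdup_command(bam_in, bam_out, remove_duplicates=False, tmp="/tmp", options="", threads=1):
    """Hypothetical helper: translate the wrapper parameters into a sambamba markdup call."""
    parts = ["sambamba", "markdup", "-t", str(threads), f"--tmpdir={shlex.quote(tmp)}"]
    if remove_duplicates:
        parts.append("-r")  # remove duplicates instead of only flagging them
    if options:
        parts.append(options)
    parts += [shlex.quote(bam_in), shlex.quote(bam_out)]
    return " ".join(parts)


print(sambamba_markdup_command("in.sorted.bam", "out.sorted.bam", remove_duplicates=True, threads=4))
```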

References

https://github.com/biod/sambamba

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.37. samtools_depth

The samtools_depth wrapper runs samtools depth to create a BED file with the coverage depth at each base position. It can also process multiple BAM files and concatenate the results into a single BED file.

Required input:

A sorted BAM file or a list of BAM files.

Required output:

A BED file with coverage for each base.

Required parameters:

options: a list of valid Samtools Depth options

Log:

The redirected standard error of Samtools Depth

Example

rule samtools_depth:
    input:
        "{sample}/bamfile/{sample}.sorted.bam"
    output:
        "{sample}/samtools_depth/{sample}.bed"
    log:
        "{sample}/samtools_depth/samtools_depth.log"
    params:
        options = config['samtools_depth']['options']
    wrapper:
        "main/wrappers/samtools_depth"

Configuration

######################################################################
# Samtools Depth section
#
# :Parameters:
#
# - options: string with any valid Samtools Depth options
samtools_depth:
    options: ''
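`samtools depth` writes one tab-separated line per position with the reference name, the 1-based position and the depth. The sketch below computes the mean depth per contig from such lines, reading only the first depth column; it is an illustration, not part of the wrapper. Positions with zero coverage are only reported when `-a` (or `-aa`) is passed through `options`; otherwise the mean covers the covered positions only.

```python
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Average the first depth column of `samtools depth` output, per reference sequence."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, _position, depth = line.rstrip("\n").split("\t")[:3]
        totals[chrom] += int(depth)
        counts[chrom] += 1
    return {chrom: totals[chrom] / counts[chrom] for chrom in totals}


print(mean_depth_per_contig(["chr1\t1\t10", "chr1\t2\t20", "chr2\t1\t5"]))
```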

References

https://www.htslib.org/doc/samtools-depth.html

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.38. sequana_coverage

The sequana_coverage wrapper runs Sequana coverage, which automatically detects and characterises regions of low and high genome coverage. It produces an HTML report with dynamic plots and tables. In addition, a CSV file containing all computed metrics is created; this CSV file can be used to regenerate the sequana_coverage report.

Required input:

bed: a BED file (built e.g. with samtools depth -aa input.bam)
fasta: a FASTA file of the reference.
gbk: a GENBANK file. (Optional)

Required output:

html: the HTML report.

Log:

a log file

Example

rule sequana_coverage:
    input:
        bed = "test.bed",
        fasta = "measles.fa",
        gbk = "measles.gbk"
    output:
        html = "test/sequana_coverage.html"
    params:
        mixture_models = 2,
        window_size = 3001,
        double_threshold = 0.5,
        circular = True,
        chunksize = 5000000,
        gc_window_size = 201
    wrapper:
        "main/wrappers/sequana_coverage"

Configuration

##############################################################################
#
# :Parameters:
#
# :param circular: is your genome circular or not ?
# :param chunksize: for large genomes, split the data into chunks
# :param double_threshold: double threshold for clustering. Keep 0.5 if you do
#     not know. Otherwise, checkout the online documentation on
#     sequana.readthedocs.io
# :param high_threshold: keep 4 or check the online documentation
# :param low_threshold: keep -4 or check the online documentation
# :param mixture_models: keep to 2.
# :param window_size: the W parameter of the running median. Keep it at least twice
#     as long as the deleted/depleted/duplicated regions you want to identify or to
#     avoid. For short genomes, it is automatically set to the genome length divided by 5.
sequana_coverage:
    do: true
    circular: true
    chunksize: 6000000
    double_threshold: 0.5
    gc_window_size: 201
    high_threshold: 4.0
    low_threshold: -4.0
    mixture_models: 2
    window_size: 3001
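To give an intuition of the `window_size`, `low_threshold` and `high_threshold` parameters, the sketch below normalises the coverage by a running median and flags positions whose z-score falls outside the thresholds. It is deliberately simplified: Sequana estimates the centre and spread with a Gaussian mixture model (`mixture_models`), whereas this sketch uses the plain mean and standard deviation, ignores `double_threshold` and `circular`, and the way the window is capped for short genomes is an assumption based on the comment above.

```python
from statistics import mean, median, pstdev


def running_median(values, window):
    """Centred running median; the window is truncated at both ends of the sequence."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def flag_positions(coverage, window_size=3001, low_threshold=-4.0, high_threshold=4.0):
    """Simplified sketch: flag positions whose normalised coverage z-score is out of range."""
    # Assumption: for short genomes, the window is capped at the genome length divided by 5.
    window = min(window_size, max(1, len(coverage) // 5))
    trend = running_median(coverage, window)
    normalised = [c / t if t else 0.0 for c, t in zip(coverage, trend)]
    # Sequana fits a Gaussian mixture model here; the plain mean/std is a stand-in.
    centre, spread = mean(normalised), pstdev(normalised) or 1.0
    zscores = [(x - centre) / spread for x in normalised]
    return [i for i, z in enumerate(zscores) if z < low_threshold or z > high_threshold]


coverage = [100] * 200
coverage[100] = 1000  # an isolated high-coverage spike
print(flag_positions(coverage))
```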

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.39. sequana_taxonomy

The sequana_taxonomy wrapper runs Sequana taxonomy, which performs taxonomic analysis using Kraken2. It is essentially a wrapper around Kraken2 that hides technical details and provides a single standalone tool to cope with paired/unpaired data, multiple Kraken databases, conversion of taxa, etc. It also produces an HTML report.

Required input:

fastq: the input FastQ file(s) (single or paired)

Required output:

The output of sequana_taxonomy consists of several files. In the output directory, you will find a summary.html file and a sub-directory called kraken containing a CSV file (kraken.csv), a summary file (kraken.out.summary), a JSON summary file (summary.json) and, finally, the unclassified FastQ files. The store_unclassified option is currently required; if it is set to False, empty files are created.

Example

rule sequana_taxonomy:
    input:
        "test_R1_.fastq"  # second file for paired data is possible
    output:
        html         = '{sample}/summary.html',
        csv          = '{sample}/kraken/kraken.csv',
        summary      = '{sample}/kraken/kraken.out.summary',
        summary_json = '{sample}/kraken/summary.json',
        unclassified = '{sample}/kraken/unclassified.fastq'
    threads:
        4
    params:
        # required
        paired=True,
        databases=['toydb'],  # a list of valid kraken databases
        store_unclassified=True,
        # optional
        confidence=0,
        level="INFO",
        options=""
    container:
        "https://zenodo.org/record/7963917/files/sequana_tools_0.15.1.img"
    wrapper:
        "main/wrappers/sequana_taxonomy"

Configuration

##############################################################################
#
# :Parameters:
#
#
# * paired indicates whether input fastq is paired or not.
# * databases: a list of valid databases
# * store_unclassified
#
# optional ones:  confidence (default to 0), level (default to INFO) and
#                 options (default to "")
sequana_taxonomy:
    databases:
        - /home/user/.config/sequana/kraken2_dbs/viruses_masking
    level: INFO
    confidence: 0.05
    store_unclassified: false
    options: ''
    threads: 4

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.40. snpeff

The snpeff wrapper uses SnpEff to annotate variants detected in a VCF file. It writes annotations to the older 'EFF' field instead of the 'ANN' field, because the latter does not provide codon change information.

Required input:

vcf: VCF file of detected variants.
ann: Annotation genbank file

Required output:

vcf: Annotated VCF file.
html: HTML report
csv: CSV file with Variants

Example

rule snpeff:
    input:
        vcf = "{sample}/freebayes/{sample}.vcf",
        ann = "measles.gbk"
    output:
        vcf = "{sample}/snpeff/{sample}.vcf",
        csv = "{sample}/snpeff/{sample}.csv",
        html = "{sample}/snpeff/{sample}.html"
    log:
        "{sample}/snpeff/{sample}.log"
    params:
        options = config["snpeff"]["options"]
    wrapper:
        "main/wrappers/snpeff"

Configuration

######################################################################
# SNPEff section
#
# :Parameters:
#
# - options: string with any valid SNPEff options
snpeff:
        options: '-no-downstream'
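Because the annotation is written in the legacy `EFF` field, downstream scripts need to parse it. In that format each effect looks like `Effect(Effect_Impact|Functional_Class|Codon_Change|Amino_Acid_Change|...)`, with several effects separated by commas. The sketch below extracts the impact, codon change and amino acid change; the record in the usage example is made up for illustration.

```python
import re

# One effect entry, e.g. NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gga/Cga|G12R|...)
EFF_ENTRY = re.compile(r"(?P<effect>[A-Za-z0-9_+&]+)\((?P<fields>[^)]*)\)")


def parse_eff(info):
    """Return impact, codon and amino acid changes from a VCF INFO string with an EFF field."""
    eff = next((item[4:] for item in info.split(";") if item.startswith("EFF=")), None)
    if eff is None:
        return []
    results = []
    for match in EFF_ENTRY.finditer(eff):
        fields = match.group("fields").split("|")
        fields += [""] * (4 - len(fields))  # pad short entries
        results.append({
            "effect": match.group("effect"),
            "impact": fields[0],
            "codon_change": fields[2],
            "aa_change": fields[3],
        })
    return results


info = "DP=10;EFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gga/Cga|G12R|450|N|protein_coding|CODING|MeV_N|1|1)"
print(parse_eff(info))
```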

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.41. snpeff_add_locus_in_fasta

The snpeff_add_locus_in_fasta wrapper addresses a SnpEff requirement: the locus names in the annotation file and in the FASTA file (contig names) must be identical. To ensure this, the wrapper adds the locus names from the GenBank file into the FASTA file before mapping.

Required input:

fasta: FASTA file of the reference.
ann: GenBank or GFF file for annotation.

Required output:

FASTA file with locus names.

Log:

a log file

Example

rule snpeff_add_locus_in_fasta:
    input:
        fasta="{sample}.fas",
        ann="{sample}.gbk"
    output:
        "{sample}/snpeff_add_locus_in_fasta/{sample}.fas"
    log:
        "{sample}/snpeff_add_locus_in_fasta/{sample}.log"
    container:
        "https://zenodo.org/record/7963917/files/sequana_tools_0.15.1.img"
    wrapper:
        "main/wrappers/snpeff_add_locus_in_fasta"

Configuration

There is no configuration required for this wrapper.
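The idea can be sketched in a few lines of Python: read the names from the GenBank `LOCUS` lines and use them, in order, as FASTA headers. This is an illustration under the assumption that the FASTA records and the GenBank loci appear in the same order; the wrapper itself may match records differently, and it also accepts GFF files, which this sketch does not handle.

```python
def locus_names(genbank_lines):
    """Collect the names from the LOCUS lines of a GenBank file, in file order."""
    return [line.split()[1] for line in genbank_lines if line.startswith("LOCUS")]


def rename_fasta(fasta_lines, names):
    """Replace each FASTA header with the next locus name (assumes the same record order)."""
    names = iter(names)
    renamed = []
    for line in fasta_lines:
        if line.startswith(">"):
            try:
                renamed.append(">" + next(names))
            except StopIteration:
                raise ValueError("more FASTA records than GenBank loci") from None
        else:
            renamed.append(line)
    return renamed


gbk = ["LOCUS       NC_001498   15894 bp    RNA     linear   VRL 01-JAN-2000", "DEFINITION  example."]
print(rename_fasta([">contig_1 some description", "ACGT"], locus_names(gbk)))
```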

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.42. sort

sort

The docstring for the sort wrapper is not yet available (no README.md found).

10.43. spades

The spades wrapper generates a de novo assembly using SPAdes.

Required input:

fastq: the input fastq files.

Required output:

contigs: the contigs fasta file.
scaffolds: the scaffolds fasta file.

Required parameters:

k: list of k-mer sizes (must be odd and less than 128). The default is 'auto' mode.
preset: any preset in this list ["meta", "sc", "isolate", "metaplasmid", "metaviral", "rna", "rnaviral"]. Empty by default.
options: a list of valid SPAdes options. Some options are incompatible with some preset.
memory: RAM limit for SPAdes in Gb. Make sure this value is correct for the given machine. SPAdes uses the limit value to automatically determine the sizes of various buffers, etc. The wrapper default is 32 Gb.

Notes: This wrapper cannot be used to perform correction only.

Log:

a log file generated by SPAdes is created.

Example

rule spades:
    input:
        fastq="data/raw.fastq.gz",
    output:
        contigs="assembled/contigs.fasta",
        scaffolds="assembled/scaffolds.fasta"
    params:
        k=config["spades"]["k"],
        preset=config["spades"]["preset"],
        options=config["spades"]["options"],
        memory=config["spades"]["memory"]
    log:
        "logs/spades.log"
    threads:
        config["spades"]["threads"]
    wrapper:
        "main/wrappers/spades"

Configuration

##############################################################################
# SPAdes assembly
#
# :Parameters:
#
# - k: A kmer or list of kmer used to assemble the genome.
# - preset: any preset in this list ["meta", "sc", "isolate", "metaplasmid", "metaviral", "rna", "rnaviral"]
# - options: any options recognised by spades.py cli. (do not use --only-error-correction)
# - threads: number of threads to be used.
# - memory: memory limit in Gb.
#
spades:
    k: ""
    preset: ""
    options: "--careful"
    threads: 4
    memory: 64
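The constraint on `k` (odd values below 128) can be checked before SPAdes is launched. The sketch below validates the value and formats SPAdes' `-k` option, which takes a comma-separated list; an empty value keeps SPAdes' automatic mode. The helper is hypothetical and is not taken from the wrapper.

```python
def spades_kmer_option(k):
    """Hypothetical helper: validate k-mer sizes and format SPAdes' -k option ('' means auto)."""
    if k in ("", None, "auto"):
        return ""  # let SPAdes pick k-mer sizes automatically
    if isinstance(k, int):
        k = [k]
    elif isinstance(k, str):
        k = k.split(",")
    sizes = [int(x) for x in k]
    bad = [x for x in sizes if x % 2 == 0 or x >= 128]
    if bad:
        raise ValueError(f"k-mer sizes must be odd and less than 128, got {bad}")
    return "-k " + ",".join(str(x) for x in sizes)


print(spades_kmer_option([21, 33, 55]))
```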

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.44. trinity

The trinity wrapper performs de novo transcriptome assembly from Illumina RNA-seq data using Trinity.

Required input:

Path to fastq or fasta files (paired or single)

Required output:

Path to output assembly fasta file

Parameters:

options: a list of valid Trinity options.

Resources:

mem_gb or mem_mb: set the maximum memory usage.

Log:

a log file generated by Trinity.

Example

Paired reads:

rule trinity:
    input:
        left=expand("{sample}_R1.fq.gz", sample=samples),
        right=expand("{sample}_R2.fq.gz", sample=samples)
    output:
        "trinity/trinity.fas"
    params:
        options=config["trinity"]["options"]
    threads:
        config["trinity"]["threads"]
    resources:
        mem_gb=config["trinity"]["mem_gb"]
    log:
        "trinity/trinity.log"
    wrapper:
        "main/wrappers/trinity"

Single reads:

rule trinity:
    input:
        left=expand("{sample}.fq.gz", sample=samples)
    output:
        "trinity/trinity.fas"
    [...]

Configuration

######################################################################
# Trinity section
#
# :Parameters:
#
# - options: string with any valid Trinity options
trinity:
        options: ''
        threads: 4
        mem_gb: 10
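The `mem_gb` (or `mem_mb`) resource and the paired/single inputs have direct counterparts in Trinity's command line: `--max_memory` (given in gigabytes, e.g. `10G`), `--left`/`--right` for paired reads and `--single` for unpaired reads, each accepting comma-separated file lists. The helpers below sketch that translation. They are hypothetical, and converting megabytes with a factor of 1024 (rounded down, minimum 1G) is an assumption.

```python
def trinity_memory_flag(resources):
    """Hypothetical helper: turn mem_gb / mem_mb resources into Trinity's --max_memory flag."""
    if "mem_gb" in resources:
        return f"--max_memory {int(resources['mem_gb'])}G"
    if "mem_mb" in resources:
        gigabytes = max(1, int(resources["mem_mb"]) // 1024)  # round down, at least 1G
        return f"--max_memory {gigabytes}G"
    raise ValueError("set mem_gb or mem_mb in the rule's resources")


def trinity_reads_args(left, right=None):
    """Hypothetical helper: paired data uses --left/--right, single-end data uses --single."""
    if right:
        if len(left) != len(right):
            raise ValueError("left and right must list the same number of files")
        return ["--left", ",".join(left), "--right", ",".join(right)]
    return ["--single", ",".join(left)]


print(trinity_memory_flag({"mem_gb": 10}))
print(trinity_reads_args(["A_R1.fq.gz", "B_R1.fq.gz"], ["A_R2.fq.gz", "B_R2.fq.gz"]))
```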

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.45. trinity_quantify

The trinity_quantify wrapper estimates transcript abundances from Illumina RNA-seq data against a de novo transcriptome assembly, using Trinity.

Required input:

fasta: Path to the transcriptome assembly from Trinity.
left: List of paths to fastq files for R1.
right: (optional) List of paths to fastq files for R2.

Required output:

outdir: output directory.
transcripts: Path to abundance file for transcripts.
genes: Path to abundance file for genes.

Parameters:

lib_type: Type of strand-specific library (RF, FR, F or R).
est_method: The method to use for quantification: kallisto (default), salmon or RSEM.

Log:

a log file generated by Trinity.

Example

rule quantify:
        input:
                fasta="trinity_out_dir/Trinity.fasta",
                left="{sample}/fastp/{sample}_R1_trimmed.fastq",
                right="{sample}/fastp/{sample}_R2_trimmed.fastq",
        output:
                outdir=directory("{sample}/kallisto"),
                transcripts="{sample}/kallisto/abundance.tsv",
                genes="{sample}/kallisto/abundance.tsv.genes",
        params:
                lib_type="RF",
                est_method="salmon"
        log:
                "logs/trinity/{sample}_quantify.log"
        threads: 2
        wrapper:
                "main/wrappers/trinity_quantify"

Configuration

######################################################################
# Trinity quantification section
#
# :Parameters:
#
# - lib_type: string strand-specificity library type
# - est_method: string quantification method
trinity_quantify:
        lib_type: 'RF'
        est_method: 'salmon'
        threads: 4
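Trinity's `align_and_estimate_abundance.pl` accepts `RF` or `FR` as strand-specific library types for paired-end data and `F` or `R` for single-end data, and `kallisto`, `salmon` or `RSEM` as estimation methods. The sketch below checks these combinations and returns the corresponding flags (`--est_method`, `--SS_lib_type`). It is a hypothetical helper; how the wrapper validates its parameters is not documented here.

```python
PAIRED_LIB_TYPES = {"RF", "FR"}
SINGLE_LIB_TYPES = {"F", "R"}
METHODS = {"kallisto", "salmon", "RSEM"}


def quantify_args(lib_type, est_method, paired):
    """Hypothetical helper: validate parameters and return align_and_estimate_abundance.pl flags."""
    if est_method not in METHODS:
        raise ValueError(f"est_method must be one of {sorted(METHODS)}, got {est_method!r}")
    allowed = PAIRED_LIB_TYPES if paired else SINGLE_LIB_TYPES
    args = ["--est_method", est_method]
    if lib_type:  # an empty value means a non strand-specific library
        if lib_type not in allowed:
            kind = "paired-end" if paired else "single-end"
            raise ValueError(f"lib_type {lib_type!r} is not valid for {kind} data; use one of {sorted(allowed)}")
        args += ["--SS_lib_type", lib_type]
    return args


print(quantify_args("RF", "salmon", paired=True))
```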

References

https://github.com/trinityrnaseq/trinityrnaseq/wiki

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues

10.46. unicycler

The unicycler wrapper generates a de novo assembly using Unicycler, an assembly pipeline for bacterial genomes.

Required input:

fastq: the input FastQ files. These can be short reads (paired or not) or long reads.

Required output:

contigs: the contigs fasta file.

Required parameters:

mode: Bridging mode. (conservative|normal|bold) (default: "normal")
options: a list of valid Unicycler options.

Optional parameters:

long_reads: if provided and set to True, Unicycler is run on long reads. The expected input is therefore single-end data (one file).

Notes: This wrapper cannot be used to perform correction only.

Log:

a log file generated by Unicycler is created.

Example

rule unicycler:
    input:
        fastq="data/raw.fastq.gz",
    output:
        contigs="assembled/contigs.fasta",
    params:
        mode=config["unicycler"]["mode"],
        options=config["unicycler"]["options"],
    log:
        "logs/unicycler.log"
    threads:
        config["unicycler"]["threads"]
    wrapper:
        "main/wrappers/unicycler"

Configuration

##############################################################################
# Unicycler assembly
#
# :Parameters:
#
# - mode: any bridging mode in this list ["conservative", "normal", "bold"]
# - options: any options recognised by unicycler cli.
# - threads: number of threads to be used.
# - long_reads: set to True to switch to long read analysis
#
unicycler:
    mode: "normal"
    long_reads: True # optional
    options: ""
    threads: 4
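Unicycler takes paired short reads with `-1`/`-2`, unpaired short reads with `-s`, long reads with `-l`, and the bridging mode with `--mode`. The sketch below shows how the `fastq`, `long_reads` and `mode` settings could be mapped onto these options. It is a hypothetical helper, not the wrapper's code.

```python
MODES = {"conservative", "normal", "bold"}


def unicycler_args(fastq, mode="normal", long_reads=False):
    """Hypothetical helper: map FastQ inputs and mode onto Unicycler's command-line options."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    if isinstance(fastq, str):
        fastq = [fastq]
    if long_reads:
        if len(fastq) != 1:
            raise ValueError("long-read mode expects a single FastQ file")
        reads = ["-l", fastq[0]]
    elif len(fastq) == 2:
        reads = ["-1", fastq[0], "-2", fastq[1]]  # paired short reads
    elif len(fastq) == 1:
        reads = ["-s", fastq[0]]  # unpaired short reads
    else:
        raise ValueError("expected one or two FastQ files")
    return reads + ["--mode", mode]


print(unicycler_args("data/raw.fastq.gz", long_reads=True))
```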

Found a bug or have an issue ? Please report here https://github.com/sequana/sequana-wrappers/issues