10. Wrappers
In August 2021, the Sequana team created the sequana-wrappers repository, which is intended to replace the rules. The advantage is that wrappers can be tested with continuous integration.
Wrappers are used within a Snakemake rule. When you call your Snakemake pipeline, you will need to add:
--wrapper-prefix git+file:https://github.com/sequana/sequana-wrappers/
We provide documentation for each wrapper. It can be included in this documentation thanks to a Sphinx extension. For example:
.. sequana_wrapper:: fastqc
Here is a non-exhaustive list of documented wrappers.
10.1. bowtie2/align
The bowtie2/align wrapper maps sequencing reads onto a reference using Bowtie2.
This wrapper takes single-end or paired-end data and maps the reads onto a reference. The result is a sorted and indexed BAM file.
Required input:
- fastq: the input FastQ files (single or paired)
Required output:
- bam: the output BAM file.
- sorted: the sorted output BAM file (optional, but always computed).
Optional output (recommended):
- sorted: if set, the BAM file is also sorted and indexed.
Required parameters:
- options: a list of valid Bowtie2 options
- index: the expected index prefix of the genome reference. For instance, if the reference is ref/hg38.fa, this parameter should most probably be set to ref/hg38.
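As an illustration of the index convention above, here is a minimal Python sketch that derives the prefix from a FASTA path by dropping its extension. The helper name `index_prefix` and the list of recognised extensions are assumptions made for illustration. The wrapper itself simply uses the `index` value you provide.

```python
from pathlib import PurePosixPath

# Hypothetical helper, not part of the wrapper: derive a Bowtie2 index
# prefix from a FASTA reference path by dropping its extension(s).
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def index_prefix(reference: str) -> str:
    path = PurePosixPath(reference)
    name = path.name
    if name.endswith(".gz"):  # e.g. ref/hg38.fa.gz
        name = name[: -len(".gz")]
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return str(path.with_name(name))


print(index_prefix("ref/hg38.fa"))  # ref/hg38
```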
Log:
- a log file generated by bowtie2 is stored
Example
rule bowtie2:
    input:
        fastq=input_data
    output:
        bam="{sample}/bowtie2/{sample}.bam",
        sorted="{sample}/bowtie2/{sample}.sorted.bam"
    params:
        options=config["bowtie2"]["options"],
        index="reference/mygenome"
    threads:
        config["bowtie2"]["threads"]
    log:
        "{sample}/bowtie2/{sample}.log"
    wrapper:
        "main/wrappers/bowtie2/align"
Configuration
#############################################################################
# Bowtie2 read mapping
#
# :Parameters:
#
# - options: any options recognised by the 'bowtie2' command
# - threads: number of threads to be used
bowtie2:
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.2. bowtie2/build
The bowtie2/build wrapper builds a genome index with Bowtie2.
This wrapper takes a reference and builds its index. The output files are stored in a directory derived from the genome reference name.
Required input:
- reference: the reference genome to index (FASTA format)
Required output:
- a list of expected output files (at least one). The prefix is extracted from the first file and passed to bowtie2-build as the index base name.
Required parameters:
- options: a list of valid bowtie2-build options
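The prefix extraction can be pictured with the sketch below. It assumes Bowtie2's usual index file naming: `<base>.1.bt2` to `<base>.4.bt2`, plus `<base>.rev.1.bt2` and `<base>.rev.2.bt2`, with `.bt2l` for large indexes. The `index_base` helper is hypothetical and is not taken from the wrapper's source.

```python
import re

# Hypothetical helper, not the wrapper's code. Bowtie2 writes
# <base>.1.bt2 ... <base>.4.bt2, <base>.rev.1.bt2 and <base>.rev.2.bt2
# (".bt2l" instead of ".bt2" for large indexes).
BT2_SUFFIX = re.compile(r"\.(?:rev\.[12]|[1-4])\.bt2l?$")


def index_base(output_file: str) -> str:
    """Return the index base name that bowtie2-build expects."""
    base, n_subs = BT2_SUFFIX.subn("", output_file)
    if n_subs == 0:
        raise ValueError(f"not a Bowtie2 index file: {output_file}")
    return base


print(index_base("genome/bowtie2/genome.1.bt2"))  # genome/bowtie2/genome
```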
Log:
- a log file generated by bowtie2 is stored
Example
rule bowtie2_build:
    input:
        reference="genome.fa"
    output:
        "genome/bowtie2/genome.1.bt2"
    params:
        options=config["bowtie2_build"]["options"]
    threads:
        config["bowtie2_build"]["threads"]
    log:
        "logs/bowtie2_build/bowtie2_build.log"
    wrapper:
        "main/wrappers/bowtie2/build"
Configuration
#############################################################################
# bowtie2 build index
#
# :Parameters:
#
# - options: any options recognised by the 'bowtie2-build' command
# - threads: number of threads to be used
bowtie2_build:
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.3. add_read_group
The add_read_group wrapper adds a read group to an input BAM file. Several standard read-group tags are handled by this wrapper, and any other tag can be added using the options parameter. See the notes below for details.
Required input:
- BAM file
Required output:
- The first output must be the expected output BAM file with the read group added.
- (optional) the BAM index file, as a second output. Although not required, it is generated by the wrapper with samtools.
Log:
- a log file with stdout/stderr
Notes:
If not provided, PL is set to 'Illumina', PU to 'unknown' and LB to 'unknown'. If not provided, SM is set to the input BAM filename (without the .bam extension). Wildcards can also be used.
Finally, if any of these tags are also provided in the options field, the values given in options take precedence.
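The default-filling rules above can be sketched in Python as follows. This is a hypothetical illustration, not the wrapper's code. `read_group` is an invented helper that returns a SAM `@RG` header line. It handles only the explicit tag parameters and does not parse the options field.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical sketch of the default values described in the notes above;
# not the wrapper's implementation (the options field is not parsed here).
DEFAULTS = {"PL": "Illumina", "PU": "unknown", "LB": "unknown"}
ORDER = ["ID", "SM", "PL", "PU", "LB"]


def read_group(bam: str, **tags: str) -> str:
    """Return a SAM @RG header line for the given BAM file."""
    name = PurePosixPath(bam).name
    rg = dict(DEFAULTS)
    rg["SM"] = name[: -len(".bam")] if name.endswith(".bam") else name
    rg["ID"] = uuid.uuid4().hex  # unique identifier by default
    # Explicit, non-empty tags override the defaults
    rg.update({key: value for key, value in tags.items() if value})
    keys = ORDER + sorted(set(rg) - set(ORDER))
    return "@RG\t" + "\t".join(f"{key}:{rg[key]}" for key in keys)


print(read_group("data/test.bam", ID="1", PL="ILLUMINA"))
```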
Example
rule add_read_group:
    input:
        "test.bam"
    output:
        bam="test.rg.bam",
        bai="test.rg.bam.bai"
    log:
        "log.out"
    params:
        ID="1",
        LB="lib",
        PL="ILLUMINA",
        PU="unit",
        SM="test",
        options=""
    wrapper:
        "main/wrappers/add_read_group"
A simpler example:
rule add_read_group:
    input:
        "{sample}.bam"
    output:
        bam="{sample}.rg.bam"
    log:
        "{sample}.log.out"
    params:
        SM="{sample}"
    wrapper:
        "main/wrappers/add_read_group"
Configuration
add_read_group:
    options: # result filters options
    # PL: (automatically filled with Illumina)
    # PU: (automatically filled with unknown)
    # LB: (automatically filled with unknown)
    # SM: (automatically filled with sample name from input file. e.g. file.sorted.bam returns 'file')
    # ID: (automatically set to a unique uuid)
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.4. bam_coverage
Documentation for the bam_coverage wrapper is not yet available (no README.md found).
10.5. bcl2fastq
The bcl2fastq wrapper calls the bcl2fastq software to convert raw BCL data into FastQ files.
Required input:
- the SampleSheet, as provided by the sequencing provider
Required output:
- Stats/Stats.json
Required parameters:
- options: a list of valid bcl2fastq options
- indir: location of the BCL directory
- ignore_missing_bcls: interpret missing BCL files as no call (N)
- no_bgzf_compression: turn off BGZF compression for FASTQ files
- barcode_mismatch: number of allowed mismatches per index
- merge_all_lanes: if true, the --no-lane-splitting option is used so that lanes are merged
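For orientation, the sketch below shows one plausible mapping of these parameters onto bcl2fastq (v2) command-line flags. It assumes that `indir` is the run folder passed to `--runfolder-dir`, and it introduces a hypothetical `outdir` argument. The wrapper's actual command line may differ. `merge_all_lanes` is deliberately left out.

```python
import shlex


# Hypothetical sketch, not the wrapper's code: map the wrapper parameters
# onto bcl2fastq (v2) flags. "outdir" is an invented argument.
def bcl2fastq_command(indir, samplesheet, outdir, barcode_mismatch=0,
                      ignore_missing_bcls=True, no_bgzf_compression=True,
                      options=""):
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", indir,
        "--sample-sheet", samplesheet,
        "--output-dir", outdir,
        "--barcode-mismatches", str(barcode_mismatch),
    ]
    if ignore_missing_bcls:
        cmd.append("--ignore-missing-bcls")  # missing BCLs become N calls
    if no_bgzf_compression:
        cmd.append("--no-bgzf-compression")  # disable BGZF compression
    return cmd + shlex.split(options)  # extra user-supplied options


print(" ".join(bcl2fastq_command("bcl", "SampleSheet.csv", ".")))
```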
Log:
- a log file generated by bcl2fastq is stored
Example
rule bcl2fastq:
    input:
        samplesheet=config["bcl2fastq"]["samplesheet_file"]
    output:
        "Stats/Stats.json"
    params:
        indir="bcl",
        barcode_mismatch=config["bcl2fastq"]["barcode_mismatch"],
        ignore_missing_bcls=config["bcl2fastq"]["ignore_missing_bcls"],
        no_bgzf_compression=config["bcl2fastq"]["no_bgzf_compression"],
        merge_all_lanes=config["bcl2fastq"]["merge_all_lanes"],
        options=config["bcl2fastq"]["options"]
    threads:
        config["bcl2fastq"]["threads"]
    wrapper:
        "main/wrappers/bcl2fastq"
Configuration
##############################################################################
# bcl2fastq section
#
# :Parameters:
#
# - options: string with any valid bcl2fastq options
bcl2fastq:
    threads: 4
    barcode_mismatch: 0
    samplesheet_file: "SampleSheet.csv"
    ignore_missing_bcls: true
    no_bgzf_compression: true
    options: ''
    merge_all_lanes: true
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.6. blast
Documentation for the blast wrapper is not yet available (no README.md found).
10.7. busco
The busco wrapper runs BUSCO, which assesses the completeness of genome/transcriptome assemblies and annotations.
Required input:
- a FASTA file with the assembly.
Required output:
- an output directory
Parameters:
- mode: either genome, transcriptome or proteins
- lineage: path to a lineage for BUSCO assessment
- downloads_path: directory where downloads are stored
- options: a list of valid BUSCO options.
Log:
- a log file generated by BUSCO.
Example
rule busco:
    input:
        "trinity/trinity.fas"
    output:
        directory("busco")
    params:
        mode=config["busco"]["mode"],
        lineage=config["busco"]["lineage"],
        downloads_path=config["busco"]["downloads_path"],
        options=config["busco"]["options"]
    threads:
        config["busco"]["threads"]
    log:
        "busco/busco.log"
    wrapper:
        "main/wrappers/busco"
Configuration
######################################################################
# BUSCO section
#
# :Parameters:
#
# - mode: Either genome, transcriptome or proteins
# - lineage: Path to a lineage for busco assessment
# - downloads_path: Directory where downloads are stored
# - options: string with any valid BUSCO options
busco:
    mode: 'genome'
    lineage: 'stramenopiles_odb10'
    downloads_path: 'busco/busco_downloads'
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.8. bz2_to_gz
The bz2_to_gz wrapper converts fastq.bz2 files to fastq.gz files.
Here are the steps followed by the rule. Any failure stops the process and the original file is untouched. If all succeed, the input is deleted.
- the input BZ2 file is checked for integrity.
- the input BZ2 file is decompressed with pbunzip2 and piped to pigz, which writes the GZ output.
- the output is checked for integrity with pigz.
- the input BZ2 file is deleted.
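The four steps can be mimicked with Python's standard `bz2` and `gzip` modules. This is only a single-threaded sketch of the same check, convert, check and delete logic. The wrapper itself uses pbunzip2 and pigz. The function name `bz2_to_gz` is reused here for readability only. If any step raises an error, the input is kept and any partial output is removed.

```python
import bz2
import gzip
import os
import shutil


def _check(open_func, path):
    # Read the whole stream; a corrupted archive raises an exception
    with open_func(path, "rb") as fh:
        while fh.read(1 << 20):
            pass


def bz2_to_gz(src: str, dst: str) -> None:
    """Sketch: convert src (.bz2) to dst (.gz), deleting src on success."""
    _check(bz2.open, src)                      # 1. input integrity
    try:
        with bz2.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)      # 2. recompress
        _check(gzip.open, dst)                 # 3. output integrity
    except Exception:
        if os.path.exists(dst):
            os.remove(dst)                     # input stays untouched
        raise
    os.remove(src)                             # 4. delete the input
```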
Required input:
- bzipped files (wildcards possible)
Required output:
- output gzipped files (wildcards possible)
Example
rule bz2_to_gz:
    input:
        "{sample}.fq.bz2"
    output:
        "{sample}/bz2_to_gz/{sample}.fq.gz"
    threads:
        config["bz2_to_gz"]["threads"]
    wrapper:
        "main/wrappers/bz2_to_gz"
Configuration
######################################################################
# bz2_to_gz section
#
# :Parameters:
#
bz2_to_gz:
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.9. canu
The canu wrapper performs long-read correction and/or assembly using Canu.
This wrapper takes a FASTQ or FASTA file. With the -correct or -trim step, the output is corrected (or trimmed) reads in a fasta.gz file. Without a step, the output is the assembly in a FASTA file.
Required input:
- Long reads file (FASTQ/FASTA format)
Required output:
- Assembly result in fasta or corrected reads in fasta.gz.
- The canu{step}.done file is a trigger file that lets Snakemake know that the Canu computation is over.
Required parameters:
- preset: any preset in this list:
    - "pacbio": raw PacBio data
    - "pacbio-hifi": PacBio HiFi data
    - "nanopore": Nanopore data
- genome_size: the expected genome size.
- step: the step to run, in this list:
    - "-correct": Canu read correction.
    - "-trim": Canu read trimming.
    - "": default Canu assembly.
- use_grid: whether Canu should manage job submission on your cluster.
- options: a list of valid Canu options.
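The sketch below shows how these parameters could translate into a Canu command line, and which file Canu is expected to produce for each step. It relies on Canu's `-p`, `-d`, `genomeSize=`, `useGrid=` and `-pacbio`/`-pacbio-hifi`/`-nanopore` options. It also relies on Canu's output naming: `<prefix>.contigs.fasta`, `<prefix>.correctedReads.fasta.gz` and `<prefix>.trimmedReads.fasta.gz`. The `canu_command` and `expected_output` helpers and the `prefix`/`outdir` arguments are hypothetical.

```python
import shlex

PRESETS = {"pacbio", "pacbio-hifi", "nanopore"}
OUTPUT_SUFFIX = {
    "-correct": "correctedReads.fasta.gz",
    "-trim": "trimmedReads.fasta.gz",
    "": "contigs.fasta",  # full assembly
}


def canu_command(reads, outdir, prefix, preset, genome_size, step="",
                 use_grid=True, options=""):
    """Hypothetical sketch, not the wrapper's code."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    if step not in OUTPUT_SUFFIX:
        raise ValueError(f"unknown step: {step}")
    cmd = ["canu"]
    if step:  # an empty step means a full assembly
        cmd.append(step)
    cmd += ["-p", prefix, "-d", outdir, f"genomeSize={genome_size}",
            f"useGrid={'true' if use_grid else 'false'}"]
    cmd += shlex.split(options)
    return cmd + [f"-{preset}", reads]


def expected_output(outdir, prefix, step=""):
    return f"{outdir}/{prefix}.{OUTPUT_SUFFIX[step]}"
```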
Example
rule canu:
    input:
        "nice_pb_long_reads.fastq"
    output:
        "canu/nice_assembly.contigs.fasta",
        "canu/canu.done"
    params:
        preset="pacbio",
        genome_size="3G",
        step="",
        use_grid=True,
        options=""
    threads:
        1
    wrapper:
        "main/wrappers/canu"
Configuration
##############################################################################
# Canu long read assembly
#
# :Parameters:
#
# - preset: any preset in this list (pacbio, pacbio-hifi, nanopore)
# - genome_size: an estimate of the size of the genome. Common suffixes are allowed, for example, 3.7m or 2.8g.
# - use_grid: let canu run steps on the cluster.
# - step: any step in this list (-correct, -trim, "")
# - options: any options recognised by canu
# - threads: number of threads to use
canu:
    preset: 'pacbio'
    genome_size: '3.3m'
    step: ''
    use_grid: true
    options: ''
    threads: 1
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.10. consensus
Documentation for the consensus wrapper is not yet available (no README.md found).
10.11. digital_normalisation
The digital_normalisation wrapper applies digital normalisation, a method that normalises the coverage of a sample using a fixed, low amount of memory and without any reference. Assembling normalised data gives results as good as, or even better than, assembling the unnormalised data. Furthermore, SPAdes runs notably faster and uses less memory on normalised data.
Required input:
- FastQ files (gzip-compressed or not). Provide a list for paired data, a string otherwise.
Required output:
- uncompressed FastQ files. Provide a list for paired data, a string otherwise.
Parameters: see the Configuration section below.
Log:
- a log with the stdout and stderr of khmer
Example
Note that the input and output must be named.
Configuration
##############################################################################
# Khmer - Digital Normalisation
#
# :Parameters:
#
# - do: if unchecked, this rule is ignored.
# - ksize: kmer size used to normalise the coverage.
# - cutoff: when the median k-mer coverage level is above this number the read
#   is not kept.
# - max_memory_usage: maximum amount of memory to use for data structure.
# - threads: number of threads to be used.
# - options: any options recognised by normalize-by-median.py.
#
digital_normalisation:
    do: yes
    ksize: 20
    cutoff: 20
    max_memory_usage: 4e9
    threads: 4
    options: ''
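To make the `ksize` and `cutoff` parameters concrete, here is a simplified Python sketch of the digital normalisation idea. A read is kept only if the median count of its k-mers, among the reads kept so far, is below `cutoff`. The sketch counts canonical k-mers exactly in a dictionary, whereas khmer uses a probabilistic counting table bounded by `max_memory_usage`. Skipping reads shorter than k is an assumption of this sketch.

```python
from statistics import median

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    # A k-mer and its reverse complement are counted together
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def normalize_by_median(reads, ksize=20, cutoff=20):
    """Yield the reads kept by a simplified digital normalisation."""
    counts = {}
    for read in reads:
        kmers = [canonical(read[i:i + ksize])
                 for i in range(len(read) - ksize + 1)]
        if not kmers:
            continue  # read shorter than k: skipped in this sketch
        if median(counts.get(k, 0) for k in kmers) < cutoff:
            for k in kmers:  # only kept reads update the counts
                counts[k] = counts.get(k, 0) + 1
            yield read
```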
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.12. dsrc_to_gz
The dsrc_to_gz wrapper converts fastq.dsrc files to fastq.gz files.
Here are the steps followed by the rule. Any failure stops the process and the original file is untouched. If all succeed, the input is deleted.
- the input DSRC file is decompressed with dsrc and piped to pigz, which writes the GZ output.
- the output is checked for integrity with pigz.
- the input DSRC file is deleted.
Required input:
- a dsrc compressed FASTQ file
Required output:
- a gz compressed FASTQ file
Example
rule dsrc_to_gz:
    input:
        "{sample}.fq.dsrc"
    output:
        "{sample}/dsrc_to_gz/{sample}.fq.gz"
    params:
        options=config["dsrc_to_gz"]["options"]
    threads:
        config["dsrc_to_gz"]["threads"]
    wrapper:
        "main/wrappers/dsrc_to_gz"
Configuration
######################################################################
# dsrc_to_gz section
#
# :Parameters:
#
dsrc_to_gz:
    threads: 4
    options: "-m2"
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.13. falco
The falco wrapper runs Falco, an emulation of the popular FastQC software that checks sequencing reads for common problems.
Required input:
- one or two FASTQ files (if paired)
Required output:
- one output file from falco such as summary.txt
Parameters:
- options: a list of valid Falco options
- working_directory: place where results will be stored
Log:
- a log file generated by Falco.
Example
rule falco:
    input:
        "{sample}_R1.fastq",
        "{sample}_R2.fastq"
    output:
        done="{sample}/falco/summary.txt"
    params:
        options=config["falco"]["options"],
        working_directory="samples/{sample}"
    threads:
        config["falco"]["threads"]
    log:
        "{sample}/falco/falco.log"
    wrapper:
        "main/wrappers/falco"
Configuration
######################################################################
# Falco section
#
# :Parameters:
#
# - options: string with any valid Falco options
falco:
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.14. fastp
Documentation for the fastp wrapper is not yet available (no README.md found).
10.15. fastq_stats
The fastq_stats wrapper creates images and statistics related to FastQ data.
This wrapper takes 1 FastQ file as input.
Required input:
- one FastQ file.
Required output:
- gc: PNG image of the GC content
- json: some stats such as GC content, mean read length, etc
- boxplot: a fastqc-like quality image
Required parameters:
- max_reads: maximum number of reads used to compute the statistics (e.g. 500,000)
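Part of these statistics can be approximated with a few lines of standard-library Python. This sketch computes only the GC content and mean read length from the first `max_reads` records of a FastQ file, which may be gzipped. It is not the wrapper's implementation, which also produces the PNG images. The `fastq_stats` function name is illustrative.

```python
import gzip
from itertools import islice


def fastq_stats(path, max_reads=500_000):
    """Return GC content (%) and mean read length over the first max_reads reads."""
    opener = gzip.open if path.endswith(".gz") else open
    n_reads = n_bases = n_gc = 0
    with opener(path, "rt") as fh:
        # Each FastQ record spans 4 lines; the sequence is the 2nd one
        for i, line in enumerate(islice(fh, 4 * max_reads)):
            if i % 4 == 1:
                seq = line.strip().upper()
                n_reads += 1
                n_bases += len(seq)
                n_gc += seq.count("G") + seq.count("C")
    return {
        "reads": n_reads,
        "mean_read_length": n_bases / n_reads if n_reads else 0.0,
        "gc_content": 100.0 * n_gc / n_bases if n_bases else 0.0,
    }
```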
Example
rule fastq_stats:
    input:
        "{sample}.fastq.gz"
    output:
        gc="{sample}/fastq_stats/{sample}_gc.png",
        boxplot="{sample}/fastq_stats/{sample}_boxplot.png",
        json="{sample}/fastq_stats/{sample}.json"
    params:
        max_reads=500000
    wrapper:
        "main/wrappers/fastq_stats"
Configuration
Not required
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.16. fastqc
The fastqc wrapper calls FastQC on input data sets (paired or not).
This wrapper takes 1 or 2 FastQ files as input. It creates FastQC reports (HTML and zip file). A fastqc.done file is generated once done.
Required input:
- one FastQ file or two FastQ files (if paired data). Could also be a BAM file.
Required output:
- done: a trigger filename signalling that the job is done. Be aware that the parent directory of this file is used to store the results.
Required parameters:
- options: a list of valid FastQC options
- working_directory: place where results of FastQC are saved
Log:
- a log file generated by FastQC is stored
Example
rule fastqc:
    input:
        "{sample}_R1_.fastq",
        "{sample}_R2_.fastq"
    output:
        done="{sample}/fastqc/fastqc.done"
    params:
        options=config["fastqc"]["options"],
        working_directory="{sample}/fastqc"
    threads:
        config["fastqc"]["threads"]
    log:
        "{sample}/fastqc/fastqc.log"
    wrapper:
        "main/wrappers/fastqc"
Configuration
##############################################################################
# FastQC section
#
# :Parameters:
#
# - options: string with any valid FastQC options
fastqc:
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.17. feature_counts
The feature_counts wrapper counts reads mapped to genomic regions using featureCounts (Subread).
Required input:
- bam: the input BAM file
- gff: the GFF annotation file
Required output:
- counts: the counts file (tabulated file)
- summary: the summary file (tabulated file)
Required Parameters:
- feature: a valid feature to be found in the GFF file
- attribute: a valid attribute to be found in the GFF file
- strandness: a single integer value, 0 (unstranded), 1 (stranded) or 2 (reversely stranded).
- options: any other parameters accepted by featureCounts
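As a rough guide, the parameters correspond to featureCounts flags as in the sketch below: `-t` for the feature, `-g` for the attribute, `-s` for strandness, `-a` for the annotation, `-o` for the counts file and `-T` for threads. The `feature_counts_command` helper is hypothetical. featureCounts writes the summary next to the counts file with a `.summary` suffix, which matches the `..._feature.out` / `..._feature.out.summary` pair of outputs.

```python
import shlex

STRAND_MODES = {0: "unstranded", 1: "stranded", 2: "reversely stranded"}


def feature_counts_command(bam, gff, counts, feature, attribute, strandness,
                           threads=4, options=""):
    """Hypothetical sketch, not the wrapper's code."""
    if strandness not in STRAND_MODES:
        raise ValueError("strandness must be 0, 1 or 2")
    return (
        ["featureCounts", "-T", str(threads), "-s", str(strandness),
         "-t", feature, "-g", attribute, "-a", gff, "-o", counts]
        + shlex.split(options)  # any other featureCounts options
        + [bam]                 # input BAM file(s) come last
    )
```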
Log:
- a log file generated by FeatureCounts
Example
rule feature_counts:
    input:
        bam="{sample}/bamfile/{sample}.sorted.bam",
        gff="genome.gff"
    output:
        counts="{sample}/feature_counts/{sample}_feature.out",
        summary="{sample}/feature_counts/{sample}_feature.out.summary"
    params:
        options=config["feature_counts"]["options"],
        feature=config["feature_counts"]["feature"],
        attribute=config["feature_counts"]["attribute"],
        strandness=config["feature_counts"]["strandness"]
    threads:
        config["feature_counts"]["threads"]
    log:
        "{sample}/feature_counts/feature_counts.log"
    wrapper:
        "main/wrappers/feature_counts"
Configuration
######################################################################
# FeatureCounts section
#
# :Parameters:
#
# - options: string with any valid FeatureCounts options
feature_counts:
    options: ''
    feature: gene      # could be exon, mRNA, etc
    attribute: ID      # could be ID, gene_id, etc
    strandness: 0      # set to 0, 1 or 2
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.18. freebayes
The freebayes wrapper runs Freebayes, a variant caller designed to find SNPs and short INDELs from a BAM file. It produces a well-annotated VCF output and provides a quality score computed with a Bayesian model.
Required input:
- bam: Sorted BAM file.
- ref: FASTA file of the reference genome.
Required output:
- vcf: VCF file of detected variants.
Required parameters:
- options: a list of valid freebayes options
- ploidy: sets the default ploidy for the analysis
Log:
- a log file generated by freebayes
Example
rule freebayes:
    input:
        bam="{sample}/bamfile/{sample}.sorted.bam",
        ref="measles.fa"
    output:
        vcf="{sample}/freebayes/{sample}.vcf"
    log:
        "{sample}/freebayes/freebayes.log"
    params:
        ploidy=config["freebayes"]["ploidy"],
        options=config["freebayes"]["options"]
    wrapper:
        "main/wrappers/freebayes"
Configuration
######################################################################
# Freebayes section
#
# :Parameters:
#
# - ploidy: sets the default ploidy for the analysis
# - options: string with any valid Freebayes options
freebayes:
    ploidy: 1
    options: "--legacy-gls"
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.19. freebayes_vcf_filter
The freebayes_vcf_filter wrapper applies variant filters dedicated to VCF files generated by freebayes. It filters on the freebayes quality score, coverage depth, frequency and strand ratio.
Required input:
- vcf: VCF file from freebayes.
Required output:
- vcf: Filtered VCF file.
- csv: CSV file of filtered variants.
- html: HTML report. Note that the output is always called "variant_calling.html". If you want to save the file in a specific directory, use the report_dir parameter.
Required parameters:
- report_dir: Report directory to copy JS/CSS.
- filter_dict: a dictionary of filtering parameters (freebayes score, frequency, minimum depth, forward and reverse depth, and strand ratio).
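The exact filtering semantics are defined by the wrapper. The sketch below shows one plausible reading of the `filter_dict` thresholds. It treats each value as a minimum and computes the strand ratio as the fraction of reads on the less-represented strand. The variant dictionary keys and this strand-ratio definition are assumptions, not freebayes or wrapper definitions.

```python
def passes_filter(variant: dict, filter_dict: dict) -> bool:
    """Return True if a variant passes every threshold (hypothetical semantics).

    Assumed variant keys: qual, freq, depth, forward_depth, reverse_depth.
    """
    fwd, rev = variant["forward_depth"], variant["reverse_depth"]
    total = fwd + rev
    # Fraction of reads on the less-represented strand (0 to 0.5)
    strand_ratio = min(fwd, rev) / total if total else 0.0
    return (
        variant["qual"] >= filter_dict["freebayes_score"]
        and variant["freq"] >= filter_dict["frequency"]
        and variant["depth"] >= filter_dict["min_depth"]
        and fwd >= filter_dict["forward_depth"]
        and rev >= filter_dict["reverse_depth"]
        and strand_ratio >= filter_dict["strand_ratio"]
    )
```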
Example
rule freebayes_vcf_filter:
    input:
        vcf="{sample}/freebayes/{sample}.vcf"
    output:
        vcf="{sample}/freebayes_vcf_filter/{sample}.filter.vcf",
        csv="{sample}/freebayes_vcf_filter/{sample}.filter.csv",
        html="{sample}/freebayes_vcf_filter/report/{sample}_variant_calling.html"
    params:
        filter_dict={
            "freebayes_score": config["freebayes_vcf_filter"]["freebayes_score"],
            "frequency": config["freebayes_vcf_filter"]["frequency"],
            "min_depth": config["freebayes_vcf_filter"]["min_depth"],
            "forward_depth": config["freebayes_vcf_filter"]["forward_depth"],
            "reverse_depth": config["freebayes_vcf_filter"]["reverse_depth"],
            "strand_ratio": config["freebayes_vcf_filter"]["strand_ratio"],
        },
        report_dir="report"
    wrapper:
        "main/wrappers/freebayes_vcf_filter"
Configuration
######################################################################
# Freebayes vcf filter section
#
# :Parameters:
#
freebayes_vcf_filter:
    freebayes_score: 20
    frequency: 0.7
    min_depth: 10
    forward_depth: 3
    reverse_depth: 3
    strand_ratio: 0.2
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.20. gz_to_bz2
The gz_to_bz2 wrapper converts fastq.gz files to fastq.bz2 files.
Here are the steps followed by the rule. Any failure stops the process and the original file is untouched. If all succeed, the input is deleted.
- the input GZ file is checked for integrity.
- the input GZ file is decompressed with pigz and piped to pbzip2, which writes the BZ2 output.
- the output is checked for integrity with pbzip2.
- the input GZ file is deleted.
Required input:
- A FASTQ gzip compressed file
Required output:
- A FASTQ bz2 compressed file
Example
rule gz_to_bz2:
    input:
        "{sample}.fq.gz"
    output:
        "{sample}/gz_to_bz2/{sample}.fq.bz2"
    threads:
        config["gz_to_bz2"]["threads"]
    wrapper:
        "main/wrappers/gz_to_bz2"
Configuration
######################################################################
# gz_to_bz2
#
gz_to_bz2:
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.21. hmmbuild
Documentation for the hmmbuild wrapper is not yet available (no README.md found).
10.22. hmmscan
Documentation for the hmmscan wrapper is not yet available (no README.md found).
10.23. index
Documentation for the index wrapper is not yet available (no README.md found).
10.24. longorfs
Documentation for the longorfs wrapper is not yet available (no README.md found).
10.25. macs3
The macs3 wrapper calls peaks in ChIP-seq and ATAC-seq-like sequencing runs.
Required input:
- 1 or several sorted BAM files (inputs)
- 1 or several sorted BAM files (controls)
You must name the IP and control lists in the input section (inputs and controls in the example below).
Required output:
- A set of files, created by macs3
The output files are processed to extract the prefix used by macs3, so their prefixes must be the same. You may provide either a single output directory or the list of output files created by macs3.
Log:
- a log file with stdout/stderr
Optional parameters:
- qvalue: optional, defaults to 0.05
- options: optional, defaults to '--keep-dup all'
Required parameters:
- bandwidth: 300 (the --bw option of macs3)
- broad_cutoff: 0.05; used and required only if the mode is set to 'broad'
- genome_size: the mappable size of your genome. This is a very important parameter.
- mode: 'broad' or 'narrow' (compulsory)
- paired: whether the data are paired (True) or not (False). Using the sequana manager, you can simply set it to manager.paired
- prefix: tag name. All files are saved in an output directory derived from your output files, but you also need to provide a prefix for all the generated files, e.g. 'macs3' or the name of a comparison.
Example
For a set of inputs, use a list if required. The outputs must contain the word "narrow" or "broad"; based on this word, macs3 is called in narrow and/or broad mode.
rule macs3:
    input:
        inputs=["1.bam", "2.bam"],
        controls="3.bam"
    output:
        "macs3/narrow/tag_peaks.xls"
    params:
        bandwidth=300,
        broad_cutoff=0.05,
        genome_size=4000000,
        mode="narrow",
        options=" --keep-dup all ",
        paired=True,
        prefix="tag",
        qvalue=0.05
    log:
        "out.log"
    wrapper:
        "main/wrappers/macs3"
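To illustrate how the output names drive the mode, here is a hypothetical sketch that builds one `macs3 callpeak` command per mode found in the outputs. It uses the macs3 options `-t`, `-c`, `-f`, `-g`, `-n`, `--outdir`, `--bw`, `-q`, `--broad` and `--broad-cutoff`. The helper names and the choice of the output's parent directory as `--outdir` are assumptions. The wrapper's real logic may differ.

```python
import shlex
from pathlib import PurePosixPath


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def macs3_commands(inputs, controls, outputs, genome_size, prefix,
                   paired=True, bandwidth=300, qvalue=0.05,
                   broad_cutoff=0.05, options="--keep-dup all"):
    """Build one macs3 callpeak command per mode found in the outputs.

    Hypothetical sketch, not the wrapper's code.
    """
    commands = []
    for mode in ("narrow", "broad"):
        selected = [out for out in _as_list(outputs) if mode in out]
        if not selected:
            continue
        cmd = [
            "macs3", "callpeak",
            "-t", *_as_list(inputs),
            "-c", *_as_list(controls),
            "-f", "BAMPE" if paired else "BAM",
            "-g", str(genome_size),
            "-n", prefix,
            "--outdir", str(PurePosixPath(selected[0]).parent),
            "--bw", str(bandwidth),
            "-q", str(qvalue),
        ]
        if mode == "broad":
            cmd += ["--broad", "--broad-cutoff", str(broad_cutoff)]
        commands.append(cmd + shlex.split(options))
    if not commands:
        raise ValueError('output names must contain "narrow" or "broad"')
    return commands
```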
Configuration
############################################################################################
# macs3 peak caller
#
# bandwidth: 300          # default bandwidth option --bw from macs3
# broad_cutoff: 0.05
# genome_size: 4000000    # the mappable size of your genome.
# mode: 'broad'           # broad or narrow. compulsory
# options: --keep-dup all ## we keep all duplicates and let macs3 do the job
# paired: True
# prefix: tag name
# qvalue: 0.05
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.26. makeblastdb
Documentation for the makeblastdb wrapper is not yet available (no README.md found).
10.27. mark_duplicates
The mark_duplicates wrapper marks duplicates using Picard tools so that downstream tools such as variant callers are aware of duplicated reads.
Required input:
- BAM file
Required output:
- The first output must be the expected output BAM file with the marked duplicated reads.
- The second output contains metrics
Log:
- a log file with stdout/stderr
Required parameters:
- remove_dup: if set to true, duplicated reads are removed (default: false)
- tmpdir: directory used to store temporary files
Example
rule mark_duplicates:
    input:
        "test.bam"
    output:
        bam="test/md.bam",
        metrics="test/md.metrics"
    log:
        out="test/log.out",
        err="test/log.err"
    params:
        remove_dup="false",
        tmpdir="test/tmp"
    wrapper:
        "main/wrappers/mark_duplicates"
Configuration
#############################################################################
# mark_duplicates (picard-tools) marks PCR duplicates in BAM files
#
# :Parameters:
#
# - do: if unchecked, this rule is ignored. Mandatory for RNA-SeQC tool.
# - remove: If true do not write duplicates to the output file instead of writing them with
#           appropriate flags set. Default value: false. This option can be set to 'null' to clear
#           the default value. Possible values: {true, false}
# - tmpdir: write temporary files in this directory (default TMP_DIR=/tmp/, but could be "TMP_DIR=/local/scratch/")
#
mark_duplicates:
    do: false
    remove: false ## may be True
    tmpdir: ./tmp/
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.28. minimap2
The minimap2 wrapper maps sequencing reads onto a reference using Minimap2.
This wrapper takes single-end or paired-end data and maps the reads onto a reference. The result is a sorted BAM file (not indexed).
Required input:
- fastq: the input FastQ files (single or paired)
- reference: the input genome used as a reference. No need for indexing
Required output:
- sorted: the sorted output BAM file
Required parameters:
- options: a list of valid minimap2 options
Log:
- a log file generated by minimap2 is stored
Example
rule minimap2:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        reference="genome.fasta"
    output:
        sorted="{sample}/{sample}.sorted.bam"
    log:
        "{sample}/{sample}.log"
    params:
        options=config["minimap2"]["options"]
    threads:
        config["minimap2"]["threads"]
    wrapper:
        "main/wrappers/minimap2"
Configuration
#############################################################################
# Minimap2 read mapping
#
# :Parameters:
#
# - options: any options recognised by minimap2
# - threads: number of threads to be used
minimap2:
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.29. multiqc
The multiqc wrapper aggregates results from various bioinformatics analyses across many samples.
Required input:
- list of files expected by MultiQC
Required output:
- done: a trigger filename signalling that the job is done. Be aware that the parent directory of this file is used to store the results.
Required parameters:
- options: a list of valid MultiQC options
- working_directory: place where results of MultiQC are saved
Log:
- a log file generated by MultiQC is stored
Example
rule multiqc:
    input:
        ["file1", "file2"]
    output:
        "multiqc/multiqc_report.html"
    params:
        options=config["multiqc"]["options"],
        input_directory=config["multiqc"]["input_directory"],
        config_file=config["multiqc"]["config_file"],
        modules=config["multiqc"]["modules"]
    log:
        "multiqc/multiqc.log"
    wrapper:
        "main/wrappers/multiqc"
Configuration
##############################################################################
# MultiQC section
#
# :Parameters:
#
# - options: string with any valid MultiQC options
multiqc:
    options: ''
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.30. polypolish
The polypolish wrapper calls Polypolish.
This wrapper takes a draft assembly as input as well as a SAM file. The SAM file should be the alignment of Illumina data onto the assembly, as recommended by the polypolish author.
Required input:
- alignment: A SAM file.
- assembly: A FastA file, the draft assembly.
Required output:
- The corrected assembly in FastA format.
Optional parameters:
- options: a list of valid Polypolish options (defaults to an empty string)
Log:
- a log file generated by Polypolish is stored
Example
The params section is optional. The input can be named or not. If not, alignment must be first and assembly must follow. The output can be named or not.
rule polypolish:
    input:
        alignment="alignments.sam",
        assembly="assembly.fasta"
    output:
        fasta="polypolish/polypolish.fasta"
    params:
        options=" "
    log:
        "polypolish/test.log"
    wrapper:
        "main/wrappers/polypolish"
Configuration
##############################################################################
# Polypolish section
#
# :Parameters:
#
# - options: string with any valid Polypolish options
polypolish:
    options: ''
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.31. predict
Documentation for the predict wrapper is not yet available (no README.md found).
10.32. prokka
The prokka wrapper annotates an assembly with Prokka.
Required input:
- assembly: the input assembly fasta file.
Required output:
- output: any file you need (.gbk/.gff/.fna).
Required parameters:
- options: a list of valid prokka options.
Log:
- a log file generated by prokka is created.
Example
rule prokka:
    input:
        "assembly/contigs.fasta"
    output:
        "prokka/contigs.gff"
    params:
        options=config["prokka"]["options"]
    log:
        "logs/prokka.log"
    threads:
        config["prokka"]["threads"]
    wrapper:
        "main/wrappers/prokka"
Configuration
##############################################################################
# Prokka
#
# :Parameters:
#
# - options: any options recognised by prokka cli.
# - threads: number of threads to be used.
#
prokka:
    options:
    threads: 4
Found a bug or have an issue? Please report it here: https://github.com/sequana/sequana-wrappers/issues
10.33. quast
The quast wrapper assesses an assembly using Quast.
Required input:
- fastq: the input fastq files.
- assembly: the input assembly fasta file.
Required output:
- output: a filename for an empty trigger file (e.g. quast/quast.done). Its path defines the Quast output directory.
Required parameters:
- preset: preset for application (single, pacbio, nanopore). Ignored if fastq are paired-end.
- reference: reference genome fasta file.
- annotation: file with genomic feature coordinates in the reference (GFF, BED, NCBI or TXT).
- options: a list of valid Quast options.
Log:
- a log file generated by quast is created.
Example
rule quast:
    input:
        fastq="data/raw.fastq.gz",
        assembly="assembly/contigs.fasta"
    output:
        "quast/quast.done"
    params:
        preset=config["quast"]["preset"],
        reference=config["quast"]["reference"],
        annotation=config["quast"]["annotation"],
        options=config["quast"]["options"]
    log:
        "logs/quast.log"
    threads:
        config["quast"]["threads"]
    wrapper:
        "main/wrappers/quast"
Configuration
    ##############################################################################
    # Quast
    #
    # :Parameters:
    #
    # - preset: preset for application (single, pacbio, nanopore). Ignored if fastq are paired-end.
    # - reference: reference genome fasta file.
    # - annotation: file with genomic feature coordinates in the reference (GFF, BED, NCBI or TXT).
    # - options: a list of valid quast.py options.
    #
    quast:
        preset: pacbio
        reference: path_to/known_ref.fa
        annotation: path_to/known_ref.gff
        options: "--fungus"
        threads: 4
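The interplay between the reads and the preset (the preset is ignored for paired-end data) can be sketched as follows. `quast_command` is a hypothetical helper. The quast.py flags it uses (`-o`, `-t`, `-r`, `--features`, `--pe1`/`--pe2`, `--single`, `--pacbio`, `--nanopore`) are standard QUAST options, but how the wrapper actually assembles them is an assumption.

```python
import os
import shlex


def quast_command(fastq, assembly, output, preset="", reference="",
                  annotation="", options="", threads=1):
    """Build a quast.py command (hypothetical helper).

    `fastq` is one path or a list of one or two paths; `output` is the empty
    file whose directory becomes the QUAST output directory.
    """
    reads = [fastq] if isinstance(fastq, str) else list(fastq)
    cmd = ["quast.py", "-o", os.path.dirname(output) or ".", "-t", str(threads)]
    if reference:
        cmd += ["-r", reference]
    if annotation:
        cmd += ["--features", annotation]
    if len(reads) == 2:
        # Paired-end reads: the preset is ignored.
        cmd += ["--pe1", reads[0], "--pe2", reads[1]]
    elif reads and preset:
        # Unpaired reads: the preset selects --single, --pacbio or --nanopore.
        cmd += [f"--{preset}", reads[0]]
    cmd += shlex.split(options or "")
    cmd.append(assembly)
    return cmd


print(" ".join(quast_command("data/raw.fastq.gz", "assembly/contigs.fasta",
                             "quast/quast.done", preset="pacbio", threads=4)))
# quast.py -o quast -t 4 --pacbio data/raw.fastq.gz assembly/contigs.fasta
```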
10.34. rulegraph
The rulegraph wrapper creates a rule graph showing the dependencies between the rules of your pipeline.
Required input:
- the Snakefile filename
Required output:
- the output dot filename
Required parameters:
- mapper: a dictionary mapping each rule to a URL (HTML file or directory). Rules provided in this dictionary are shown in blue and are clickable in the output SVG file.
- configname: a config file required by the input Snakefile
- required_local_files: a list of files and directories, located next to the Snakefile, that are required to run the pipeline.
Example
    rulegraph_params_mapper = {"a_given_rule": "its_html.html"}

    rule rulegraph:
        input:
            manager.snakefile
        output:
            "rulegraph/rulegraph.dot"
        params:
            configname="config.yaml",
            mapper=rulegraph_params_mapper,
            required_local_files=["rules/"]
        container:
            "https://zenodo.org/record/7928262/files/graphviz_7.0.5.img"
        wrapper:
            "main/wrappers/rulegraph"
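What the `mapper` parameter achieves can be illustrated with Graphviz's DOT language, where a node's `URL` attribute becomes a hyperlink when the graph is rendered with `dot -Tsvg`. This is not the wrapper's code (which starts from the rule graph that Snakemake produces); `to_dot` and the edge list are hypothetical and only show how a mapper entry could translate into a blue, clickable node.

```python
def to_dot(edges, mapper):
    """Render rule dependencies as DOT; rules listed in `mapper` get a blue, clickable node."""
    rules = sorted({rule for edge in edges for rule in edge})
    lines = ["digraph rulegraph {"]
    for rule in rules:
        if rule in mapper:
            # The URL attribute becomes a link in SVG output.
            lines.append(f'    "{rule}" [URL="{mapper[rule]}", color="blue", fontcolor="blue"];')
        else:
            lines.append(f'    "{rule}";')
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("a_given_rule", "another_rule")], {"a_given_rule": "its_html.html"}))
```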
10.35. sambamba_filter
The sambamba_filter wrapper uses sambamba view to filter out reads whose mapping quality is lower than a threshold. It also removes reads with multiple occurrences.
Required input:
- BAM file
Required output:
- the filtered output BAM file.
Parameters:
- threshold: Quality threshold used for filtering
- options: a list of valid sambamba filter options
Log:
- two log files named .out and .err
Example
    rule sambamba_filter:
        input:
            "{sample}/bamfile/{sample}.sorted.bam"
        output:
            "{sample}/sambamba_filter/{sample}.sorted.bam"
        log:
            out="{sample}/sambamba_filter/log.out",
            err="{sample}/sambamba_filter/log.err"
        params:
            threshold=config["sambamba_filter"]["threshold"],
            options=config["sambamba_filter"]["options"]
        wrapper:
            "main/wrappers/sambamba_filter"
Configuration
    ######################################################################
    # Sambamba filter section
    #
    # :Parameters:
    #
    # - options: string with any valid Sambamba filter options
    # - threshold: quality threshold used for filtering
    sambamba_filter:
        threshold: 30   # Mapping quality score threshold
        options: ''
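The mapping-quality filter can be expressed as a `sambamba view` call with a filter expression. The sketch below is a hypothetical helper, not the wrapper's code. It assumes "lower than a threshold" is a strict inequality (reads whose MAPQ equals the threshold are kept), and it does not reproduce the removal of reads with multiple occurrences. The `-t`, `-f`, `-F` and `-o` options are standard `sambamba view` options.

```python
import shlex


def sambamba_filter_command(bam_in, bam_out, threshold=30, options="", threads=1):
    """Build a `sambamba view` command keeping reads with MAPQ >= threshold (hypothetical helper)."""
    cmd = [
        "sambamba", "view",
        "-t", str(threads),
        "-f", "bam",                                   # write BAM (header included)
        "-F", f"mapping_quality >= {int(threshold)}",  # sambamba filter expression
    ]
    cmd += shlex.split(options or "")
    cmd += ["-o", bam_out, bam_in]
    return cmd


print(shlex.join(sambamba_filter_command("in.bam", "out.bam", threshold=30, threads=4)))
# sambamba view -t 4 -f bam -F 'mapping_quality >= 30' -o out.bam in.bam
```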
10.36. sambamba_markdup
The sambamba_markdup wrapper marks or removes PCR duplicate reads. To decide whether a read is a duplicate, it uses the same criteria as Picard.
Required input:
- BAM file
Required output:
- bam: the output BAM file with duplicates marked; it must be the first output.
- bai: (optional) you may specify the BAM index file, which will be generated by the wrapper.
Parameters:
- remove_duplicates: whether to remove duplicate reads (True) or just mark them (False)
- tmp_directory: Temporary directory
- options: a list of valid sambamba markdup options
Log:
- a log file generated by sambamba is created.
Example
    rule sambamba_markdup:
        input:
            "{sample}/sambamba_filter/{sample}.sorted.bam"
        output:
            bam="{sample}/sambamba_markdup/{sample}.sorted.bam",
            bai="{sample}/sambamba_markdup/{sample}.sorted.bam.bai"
        log:
            "{sample}/sambamba_markdup/log.out"
        params:
            remove_duplicates=config["sambamba_markdup"]["remove_duplicates"],
            tmp=config["sambamba_markdup"]["tmp_directory"],
            options=config["sambamba_markdup"]["options"]
        wrapper:
            "main/wrappers/sambamba_markdup"
Configuration
    ######################################################################
    # Sambamba markdup section
    #
    # :Parameters:
    #
    # - remove_duplicates: Remove or just mark duplicate reads
    # - tmp_directory: Temporary directory
    # - options: string with any valid Sambamba markdup options
    sambamba_markdup:
        remove_duplicates: False
        tmp_directory: /tmp
        options: ''
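The three parameters map naturally onto `sambamba markdup` options: `-r` removes duplicates instead of only flagging them, `--tmpdir` sets the temporary directory and `-t` the number of threads. The helper below is a hypothetical sketch of that mapping, not the wrapper's implementation.

```python
import shlex


def sambamba_markdup_command(bam_in, bam_out, remove_duplicates=False,
                             tmp_directory="/tmp", options="", threads=1):
    """Build a `sambamba markdup` command (hypothetical helper)."""
    cmd = ["sambamba", "markdup", "-t", str(threads), f"--tmpdir={tmp_directory}"]
    if remove_duplicates:
        cmd.append("-r")  # drop duplicates instead of only flagging them
    cmd += shlex.split(options or "")
    cmd += [bam_in, bam_out]
    return cmd


print(shlex.join(sambamba_markdup_command("in.bam", "out.bam", threads=4)))
# sambamba markdup -t 4 --tmpdir=/tmp in.bam out.bam
```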
10.37. samtools_depth
The samtools_depth wrapper uses samtools depth to create a BED file with the coverage depth at each base position. It can also process multiple BAM files and concatenate the results into a single BED file.
Required input:
- A sorted BAM file or a list of BAM files.
Required output:
- A BED file with coverage for each base.
Required parameters:
- options: a list of valid Samtools Depth options
Log:
- The standard error of samtools depth, redirected to a file
Example
    rule samtools_depth:
        input:
            "{sample}/bamfile/{sample}.sorted.bam"
        output:
            "{sample}/samtools_depth/{sample}.bed"
        log:
            "{sample}/samtools_depth/samtools_depth.log"
        params:
            options=config["samtools_depth"]["options"]
        wrapper:
            "main/wrappers/samtools_depth"
Configuration
    ######################################################################
    # Samtools Depth section
    #
    # :Parameters:
    #
    # - options: string with any valid Samtools Depth options
    samtools_depth:
        options: ''
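As a quick way to inspect the output, the sketch below computes the mean depth per contig, assuming the three tab-separated columns (contig, position, depth) that samtools depth writes for a single BAM file. `mean_depth_per_contig` is a hypothetical helper. Note that without `-a`/`-aa`, samtools depth skips zero-depth positions, so this average would be inflated.

```python
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Average depth per contig from 3-column samtools depth output (contig, pos, depth)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        contig, _pos, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}


example = ["chr1\t1\t10", "chr1\t2\t20", "chr2\t1\t0"]
print(mean_depth_per_contig(example))  # {'chr1': 15.0, 'chr2': 0.0}
```

In practice the lines would come from the BED file itself, e.g. `with open("sample.bed") as fh: mean_depth_per_contig(fh)`.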
10.38. sequana_coverage
The sequana_coverage wrapper runs Sequana coverage, which automatically detects and characterises regions of low and high genome coverage. It provides an HTML report with dynamic plots and tables. In addition, it creates a CSV file with all the computed metrics; this CSV can be used to regenerate the sequana_coverage report.
Required input:
- bed: a BED file (built, e.g., with samtools depth -aa input.bam)
- fasta: a FASTA file of the reference.
- gbk: a GENBANK file. (Optional)
Required output:
- html: the HTML report (see the example below).
Log:
- a log file
Example
    rule sequana_coverage:
        input:
            bed="test.bed",
            fasta="measles.fa",
            gbk="measles.gbk"
        output:
            html="test/sequana_coverage.html"
        params:
            mixture_models=2,
            window_size=3001,
            double_threshold=0.5,
            circular=True,
            chunksize=5000000,
            gc_window_size=201
        wrapper:
            "main/wrappers/sequana_coverage"
Configuration
    ##############################################################################
    #
    # :Parameters:
    #
    # :param circular: is your genome circular or not?
    # :param chunksize: for large genomes, split the data into chunks
    # :param double_threshold: double threshold for clustering. Keep 0.5 if you do
    #     not know. Otherwise, check out the online documentation on
    #     sequana.readthedocs.io
    # :param high_threshold: keep 4 or check the online documentation
    # :param low_threshold: keep -4 or check the online documentation
    # :param mixture_models: keep it at 2.
    # :param window_size: the W parameter of the running median. Keep it at least
    #     twice as long as the deleted/depleted/duplicated regions you want to
    #     identify (or to avoid). For short genomes, it is automatically set to
    #     the genome length divided by 5.
    sequana_coverage:
        do: true
        circular: true
        chunksize: 6000000
        double_threshold: 0.5
        gc_window_size: 201
        high_threshold: 4.0
        low_threshold: -4.0
        mixture_models: 2
        window_size: 3001
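The roles of `window_size`, `low_threshold` and `double_threshold` can be illustrated with a deliberately simplified sketch. Here, coverage is divided by its running median and standardised into z-scores, and a low region is a run of positions below `low_threshold * double_threshold` that contains at least one position below `low_threshold`. This is our reading of the documented parameters, not sequana's implementation. In particular, sequana fits a Gaussian mixture (`mixture_models`) to estimate the centre and spread of the normalised coverage and handles circular genomes and chunks, none of which is reproduced here. All function names are hypothetical.

```python
from statistics import mean, median, pstdev


def running_median(values, window):
    """Centred running median; the window is truncated at both ends (linear genome)."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def zscores(coverage, window):
    """Normalise coverage by its running median, then standardise it."""
    rm = running_median(coverage, window)
    normed = [c / m if m > 0 else 0.0 for c, m in zip(coverage, rm)]
    # Simplification: plain mean/std instead of a Gaussian mixture fit.
    mu, sigma = mean(normed), pstdev(normed)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in normed]


def low_regions(z, low_threshold=-4.0, double_threshold=0.5):
    """Runs below low_threshold * double_threshold that reach below low_threshold."""
    weak = low_threshold * double_threshold
    regions, start = [], None
    for i, value in enumerate(z + [0.0]):  # sentinel closes a trailing run
        if value < weak and start is None:
            start = i
        elif value >= weak and start is not None:
            if min(z[start:i]) < low_threshold:
                regions.append((start, i - 1))
            start = None
    return regions


# Synthetic genome: depth 100, a shallow dip (80x) next to a deletion (0x).
coverage = [100] * 1000
coverage[498:500] = [80, 80]
coverage[500:505] = [0] * 5
print(low_regions(zscores(coverage, window=101)))  # [(498, 504)]
```

The shallow dip at positions 498-499 only crosses the weaker threshold (about -2.7 here), so it is reported because it is attached to the deletion; on its own it would not be called.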
10.39. sequana_taxonomy
The sequana_taxonomy wrapper performs taxonomic analysis using Kraken2. It is essentially a wrapper around Kraken that hides technical details and provides a single standalone tool to cope with paired/unpaired data, multiple Kraken databases, conversion of taxa, etc. It also provides an HTML report.
Required input:
- fastq: the input FastQ file(s); a second file can be given for paired data.
Required output:
The sequana_taxonomy wrapper produces several outputs. The output directory contains a summary.html file and a sub-directory called kraken with a CSV file (kraken.csv), a summary file (kraken.out.summary), a JSON summary file (summary.json) and, finally, the unclassified FastQ files. The store_unclassified option is currently required; if it is set to False, empty files are created.
Example
    rule sequana_taxonomy:
        input:
            "test_R1_.fastq"   # a second file for paired data is possible
        output:
            html="{sample}/summary.html",
            csv="{sample}/kraken/kraken.csv",
            summary="{sample}/kraken/kraken.out.summary",
            summary_json="{sample}/kraken/summary.json",
            unclassified="{sample}/kraken/unclassified.fastq"
        threads: 4
        params:
            # required
            paired=True,
            databases=["toydb"],   # a list of valid kraken databases
            store_unclassified=True,
            # optional
            confidence=0,
            level="INFO",
            options=""
        container:
            "https://zenodo.org/record/7963917/files/sequana_tools_0.15.1.img"
        wrapper:
            "main/wrappers/sequana_taxonomy"
Configuration
    ##############################################################################
    #
    # :Parameters:
    #
    # * paired: indicates whether input fastq is paired or not.
    # * databases: a list of valid databases
    # * store_unclassified
    #
    # optional ones: confidence (defaults to 0), level (defaults to INFO) and
    # options (defaults to "")
    sequana_taxonomy:
        databases:
            - /home/user/.config/sequana/kraken2_dbs/viruses_masking
        level: INFO
        confidence: 0.05
        store_unclassified: false
        options: ''
        threads: 4
10.40. snpeff
The snpeff wrapper uses SnpEff to annotate the variants detected in a VCF file. It writes the annotations in the old 'EFF' field rather than the 'ANN' field, because the latter does not provide the codon change information.
Required input:
- vcf: VCF file of detected variants.
- ann: annotation GenBank file
Required output:
- vcf: Annotated VCF file.
- html: HTML report
- csv: CSV file with Variants
Example
    rule snpeff:
        input:
            vcf="{sample}/freebayes/{sample}.vcf",
            ann="measles.gbk"
        output:
            vcf="{sample}/snpeff/{sample}.vcf",
            csv="{sample}/snpeff/{sample}.csv",
            html="{sample}/snpeff/{sample}.html"
        log:
            "{sample}/snpeff/{sample}.log"
        params:
            options=config["snpeff"]["options"]
        wrapper:
            "main/wrappers/snpeff"
Configuration
    ######################################################################
    # SnpEff section
    #
    # :Parameters:
    #
    # - options: string with any valid SnpEff options
    snpeff:
        options: '-no-downstream'
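To see why the EFF field matters downstream, here is a small parser for the EFF entries of a VCF INFO column. The field order follows the classic SnpEff EFF layout, `Effect(Impact|Functional_Class|Codon_Change|Amino_Acid_Change|Amino_Acid_Length|Gene_Name|Transcript_BioType|Gene_Coding|Transcript_ID|Exon_Rank|Genotype_Number)`. `parse_eff` is a hypothetical helper and the example record is made up.

```python
import re

# Field order of the classic SnpEff EFF annotation (optional ERRORS/WARNINGS are ignored).
EFF_FIELDS = ["impact", "functional_class", "codon_change", "aa_change",
              "aa_length", "gene", "biotype", "coding", "transcript",
              "exon_rank", "genotype"]


def parse_eff(info):
    """Parse the EFF entries of a VCF INFO column into a list of dictionaries."""
    for item in info.split(";"):
        if item.startswith("EFF="):
            effects = []
            for entry in item[len("EFF="):].split(","):
                match = re.match(r"([^(]+)\((.*)\)$", entry)
                if match is None:
                    continue  # skip malformed entries
                record = dict(zip(EFF_FIELDS, match.group(2).split("|")))
                record["effect"] = match.group(1)
                effects.append(record)
            return effects
    return []


info = ("DP=50;EFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gcc/Acc|A123T|456|"
        "geneX|protein_coding|CODING|tx1|2|1)")
print(parse_eff(info)[0]["codon_change"])  # Gcc/Acc
```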
10.41. snpeff_add_locus_in_fasta
The snpeff_add_locus_in_fasta wrapper addresses a SnpEff requirement: the locus names in the annotation file and the contig names in the FASTA file must be identical. To ensure this, the rule adds the locus names of the GenBank file to the FASTA file before the mapping.
Required input:
- fasta: FASTA file of the reference.
- ann: GenBank or GFF file for annotation
Required output:
- FASTA file with locus names.
Log:
- a log file
Example
    rule snpeff_add_locus_in_fasta:
        input:
            fasta="{sample}.fas",
            ann="{sample}.gbk"
        output:
            "{sample}/snpeff_add_locus_in_fasta/{sample}.fas"
        log:
            "{sample}/snpeff_add_locus_in_fasta/{sample}.log"
        container:
            "https://zenodo.org/record/7963917/files/sequana_tools_0.15.1.img"
        wrapper:
            "main/wrappers/snpeff_add_locus_in_fasta"
Configuration
There is no configuration required for this wrapper.
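The idea behind this rule can be sketched with plain string handling: read the LOCUS names from the GenBank file and put them at the start of the FASTA headers, since tools typically treat the first word of a header as the sequence ID. This is one possible approach under the assumption that both files list the same records in the same order; `locus_names` and `rename_fasta` are hypothetical helpers, not the wrapper's code.

```python
def locus_names(genbank_text):
    """Return the LOCUS names of a GenBank file, in file order."""
    return [line.split()[1] for line in genbank_text.splitlines()
            if line.startswith("LOCUS")]


def rename_fasta(fasta_text, names):
    """Prefix each FASTA header with the matching locus name (same order assumed)."""
    lines = fasta_text.splitlines()
    n_headers = sum(line.startswith(">") for line in lines)
    if n_headers != len(names):
        raise ValueError(f"{n_headers} FASTA records but {len(names)} LOCUS entries")
    names = iter(names)
    out = []
    for line in lines:
        if line.startswith(">"):
            # Locus name becomes the sequence ID; the old header is kept as description.
            out.append(f">{next(names)} {line[1:]}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


gbk = "LOCUS       MEASLES  15894 bp    RNA     linear   VRL 01-JAN-2000\n//\n"
fasta = ">NC_001498.1 Measles virus\nACGT\n"
print(rename_fasta(fasta, locus_names(gbk)), end="")
```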
10.42. sort
Documentation for the sort wrapper is not yet available (no README.md found).
10.43. spades
The spades wrapper generates a de novo assembly using SPAdes.
Required input:
- fastq: the input fastq files.
Required output:
- contigs: the contigs fasta file.
- scaffolds: the scaffolds fasta file.
Required parameters:
- k: list of k-mer sizes (must be odd and less than 128). The default is 'auto' mode.
- preset: any preset in this list ["meta", "sc", "isolate", "metaplasmid", "metaviral", "rna", "rnaviral"]. Empty by default.
- options: a list of valid SPAdes options. Some options are incompatible with some preset.
- memory: RAM limit for SPAdes in Gb. Make sure this value is correct for the given machine. SPAdes uses the limit value to automatically determine the sizes of various buffers, etc. The wrapper default is 32 Gb.
Notes: This wrapper cannot be used to perform correction only.
Log:
- a log file generated by SPAdes is created.
Example
    rule spades:
        input:
            fastq="data/raw.fastq.gz"
        output:
            contigs="assembled/contigs.fasta",
            scaffolds="assembled/scaffolds.fasta"
        params:
            k=config["spades"]["k"],
            preset=config["spades"]["preset"],
            options=config["spades"]["options"],
            memory=config["spades"]["memory"]
        log:
            "logs/spades.log"
        threads:
            config["spades"]["threads"]
        wrapper:
            "main/wrappers/spades"
Configuration
    ##############################################################################
    # SPAdes assembly
    #
    # :Parameters:
    #
    # - k: a k-mer size or a list of k-mer sizes used to assemble the genome.
    # - preset: any preset in this list ["meta", "sc", "isolate", "metaplasmid", "metaviral", "rna", "rnaviral"]
    # - options: any options recognised by spades.py cli (do not use --only-error-correction).
    # - threads: number of threads to be used.
    # - memory: memory limit in Gb.
    #
    spades:
        k: ""
        preset: ""
        options: "--careful"
        threads: 4
        memory: 64
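The constraints on `k` and `preset` can be checked before SPAdes is launched. The sketch below validates them and translates the parameters into spades.py arguments (`-t`, `-m` in Gb, `-k` as a comma-separated list, and the preset as a `--<preset>` flag). `spades_args` is a hypothetical helper; how the wrapper really does this is an assumption.

```python
import shlex

PRESETS = {"meta", "sc", "isolate", "metaplasmid", "metaviral", "rna", "rnaviral"}


def spades_args(k="", preset="", options="", memory=32, threads=1):
    """Translate the documented parameters into spades.py arguments (hypothetical helper)."""
    args = ["-t", str(threads), "-m", str(memory)]  # -m: memory limit in Gb

    # k may be a YAML list, a single int or a "21,33,55" string;
    # "" or "auto" leaves the choice to SPAdes.
    if isinstance(k, (list, tuple)):
        ks = [int(x) for x in k]
    elif k in ("", "auto", None):
        ks = []
    else:
        ks = [int(x) for x in str(k).replace(",", " ").split()]
    if any(x % 2 == 0 or x >= 128 for x in ks):
        raise ValueError(f"k-mer sizes must be odd and less than 128, got {ks}")
    if ks:
        args += ["-k", ",".join(str(x) for x in ks)]

    if preset:
        if preset not in PRESETS:
            raise ValueError(f"unknown SPAdes preset: {preset!r}")
        args.append(f"--{preset}")

    args += shlex.split(options or "")
    return args


print(spades_args(k="21,33,55", preset="isolate", options="--careful", memory=64, threads=4))
# ['-t', '4', '-m', '64', '-k', '21,33,55', '--isolate', '--careful']
```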
10.44. trinity
The trinity wrapper performs de novo transcriptome assembly from Illumina RNA-seq data using Trinity.
Required input:
- Path to fastq or fasta files (paired or single)
Required output:
- Path to output assembly fasta file
Parameters:
- options: a list of valid Trinity options.
Resources:
- mem_gb or mem_mb: set the maximum memory usage.
Log:
- a log file generated by Trinity.
Example
Paired reads:

    rule trinity:
        input:
            left=expand("{sample}_R1.fq.gz", sample=samples),
            right=expand("{sample}_R2.fq.gz", sample=samples)
        output:
            "trinity/trinity.fas"
        params:
            options=config["trinity"]["options"]
        threads:
            config["trinity"]["threads"]
        resources:
            mem_gb=config["trinity"]["mem_gb"]
        log:
            "trinity/trinity.log"
        wrapper:
            "main/wrappers/trinity"

Single reads:

    rule trinity:
        input:
            left=expand("{sample}.fq.gz", sample=samples)
        output:
            "trinity/trinity.fas"
        [...]
Configuration
    ######################################################################
    # Trinity section
    #
    # :Parameters:
    #
    # - options: string with any valid Trinity options
    trinity:
        options: ''
        threads: 4
        mem_gb: 10
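The paired/single distinction and the memory resource map onto standard Trinity options: `--left`/`--right` versus `--single` (each taking a comma-separated list), `--seqType fq|fa`, `--CPU` and `--max_memory`. The sketch below is a hypothetical helper showing that mapping. Converting `mem_mb` to whole gigabytes and the 10G fallback are choices made for this sketch only.

```python
FASTA_SUFFIXES = (".fa", ".fasta", ".fa.gz", ".fasta.gz")


def trinity_args(left, right=None, mem_gb=None, mem_mb=None, threads=1,
                 output="trinity_out_dir"):
    """Build a Trinity command for paired (left/right) or single (left only) reads.

    Hypothetical helper: memory comes from mem_gb or mem_mb (converted to Gb).
    """
    if mem_gb is not None:
        max_memory = f"{int(mem_gb)}G"
    elif mem_mb is not None:
        max_memory = f"{max(1, int(mem_mb) // 1024)}G"
    else:
        max_memory = "10G"  # fallback chosen for this sketch only

    left = [left] if isinstance(left, str) else list(left)
    seq_type = "fa" if left[0].endswith(FASTA_SUFFIXES) else "fq"

    args = ["Trinity", "--seqType", seq_type, "--max_memory", max_memory,
            "--CPU", str(threads), "--output", output]
    if right:
        right = [right] if isinstance(right, str) else list(right)
        args += ["--left", ",".join(left), "--right", ",".join(right)]
    else:
        args += ["--single", ",".join(left)]
    return args


print(" ".join(trinity_args(["A_R1.fq.gz", "B_R1.fq.gz"], ["A_R2.fq.gz", "B_R2.fq.gz"],
                            mem_gb=10, threads=4)))
```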
References
- https://github.com/trinityrnaseq/trinityrnaseq/wiki
- https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/trinity.html
10.45. trinity_quantify
The trinity_quantify wrapper quantifies transcript abundances against a de novo transcriptome assembly, from Illumina RNA-seq data, using Trinity.
Required input:
- fasta: Path to the transcriptome assembly from Trinity.
- left: List of paths to fastq files for R1.
- right: (optional) List of paths to fastq files for R2.
Required output:
- outdir: output directory.
- transcripts: Path to abundance file for transcripts.
- genes: Path to abundance file for genes.
Parameters:
- lib_type: Type of strand-specific library (RF, FR, F or R).
- est_method: The method to use for quantification: kallisto (default), salmon or RSEM.
Log:
- a log file generated by Trinity.
Example
    rule quantify:
        input:
            fasta="trinity_out_dir/Trinity.fasta",
            left="{sample}/fastp/{sample}_R1_trimmed.fastq",
            right="{sample}/fastp/{sample}_R2_trimmed.fastq"
        output:
            outdir=directory("{sample}/kallisto"),
            transcripts="{sample}/kallisto/abundance.tsv",
            genes="{sample}/kallisto/abundance.tsv.genes"
        params:
            lib_type="RF",
            est_method="salmon"
        log:
            "logs/trinity/{sample}_quantify.log"
        threads: 2
        wrapper:
            "main/wrappers/trinity_quantify"
Configuration
    ######################################################################
    # Trinity quantification section
    #
    # :Parameters:
    #
    # - lib_type: strand-specific library type (RF, FR, F or R)
    # - est_method: quantification method (kallisto, salmon or RSEM)
    trinity_quantify:
        lib_type: 'RF'
        est_method: 'salmon'
        threads: 4
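In Trinity's strand-specific conventions, RF and FR apply to paired-end libraries while F and R apply to single-end ones. The sketch below checks `lib_type` and `est_method` against that rule and builds a command for Trinity's `align_and_estimate_abundance.pl` script. We assume, without having checked the wrapper, that this is the script used. `quantify_args` is hypothetical, and the `bowtie2` aligner for RSEM is a choice made for this sketch.

```python
def quantify_args(fasta, left, right=None, lib_type=None,
                  est_method="kallisto", outdir="quantification"):
    """Validate parameters and build an align_and_estimate_abundance.pl command (hypothetical)."""
    left = [left] if isinstance(left, str) else list(left)
    right = [right] if isinstance(right, str) else list(right or [])
    paired = bool(right)

    if est_method not in {"kallisto", "salmon", "RSEM"}:
        raise ValueError(f"unknown est_method: {est_method!r}")
    allowed = {"RF", "FR"} if paired else {"F", "R"}
    if lib_type and lib_type not in allowed:
        kind = "paired" if paired else "single"
        raise ValueError(f"lib_type {lib_type!r} is not valid for {kind} reads")

    args = ["align_and_estimate_abundance.pl", "--transcripts", fasta,
            "--seqType", "fq", "--est_method", est_method,
            "--output_dir", outdir, "--prep_reference", "--trinity_mode"]
    if est_method == "RSEM":
        args += ["--aln_method", "bowtie2"]  # alignment-based method needs an aligner
    if lib_type:
        args += ["--SS_lib_type", lib_type]
    if paired:
        args += ["--left", ",".join(left), "--right", ",".join(right)]
    else:
        args += ["--single", ",".join(left)]
    return args
```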
10.46. unicycler
The unicycler wrapper generates a de novo assembly using Unicycler, an assembly pipeline for bacterial genomes.
Required input:
- fastq: the input fastq files.
Required output:
- contigs: the contigs fasta file.
Required parameters:
- mode: Bridging mode. (conservative|normal|bold) (default: "normal")
- options: a list of valid Unicycler options.
Notes: This wrapper cannot be used to perform correction only.
Log:
- a log file generated by Unicycler is created.
Example
    rule unicycler:
        input:
            fastq="data/raw.fastq.gz"
        output:
            contigs="assembled/contigs.fasta"
        params:
            mode=config["unicycler"]["mode"],
            options=config["unicycler"]["options"]
        log:
            "logs/unicycler.log"
        threads:
            config["unicycler"]["threads"]
        wrapper:
            "main/wrappers/unicycler"
Configuration
    ##############################################################################
    # Unicycler assembly
    #
    # :Parameters:
    #
    # - mode: any bridging mode in this list ["conservative", "normal", "bold"]
    # - options: any options recognised by unicycler cli.
    # - threads: number of threads to be used.
    #
    unicycler:
        mode: "normal"
        options: ""
        threads: 4
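A wrong bridging mode is easy to catch before the job starts. The sketch below checks `mode` against the three documented values and turns the configuration into Unicycler's `--mode` and `-t` options. It leaves out the read-input flags because this documentation does not say which kind of reads the wrapper expects. `unicycler_args` is a hypothetical helper.

```python
import shlex

MODES = ("conservative", "normal", "bold")


def unicycler_args(mode="normal", options="", threads=1):
    """Check the bridging mode and build the matching Unicycler arguments (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return ["--mode", mode, "-t", str(threads)] + shlex.split(options or "")


print(unicycler_args("bold", threads=4))  # ['--mode', 'bold', '-t', '4']
```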