CLI reference#

The top-level sequana command groups ~30 sub-commands (FASTQ/FASTA utilities, GFF/GTF fixers, enrichment helpers, summaries, …). The reference below is generated directly from the source.

Each sub-command also accepts --help from the shell, e.g.:

sequana fastq --help
sequana enrichment-kegg --help

sequana#

This is the main entry point for a set of Sequana applications.

Pipelines such as sequana_rnaseq or sequana_variant_calling have their own applications and help.

In addition, more advanced tools such as sequana_taxonomy or sequana_coverage have their own standalone applications.

To set up completion, run the command matching your shell (bash, zsh, and fish, in that order):

eval "$(_SEQUANA_COMPLETE=bash_source sequana)"
eval "$(_SEQUANA_COMPLETE=source_zsh sequana)"
eval (env _SEQUANA_COMPLETE=source_fish sequana)

Usage

sequana [OPTIONS] COMMAND [ARGS]...

Options

--version#

Show the version and exit.

biomart#

Retrieve information from BioMart and save it into a CSV file

This command uses BioMart from BioServices to introspect a MART service (--mart) and a specific dataset (defaults to mmusculus_gene_ensembl). Then, for all Ensembl IDs, it fetches the requested attributes (--attributes). Finally, it saves the results as CSV into an output file (--output). Retrieving the data takes about 5-10 minutes, depending on the connection.

Example:

sequana biomart --mart ENSEMBL_MART_ENSEMBL --dataset mmusculus_gene_ensembl --attributes ensembl_gene_id,external_gene_name,go_id --output test.csv

Usage

sequana biomart [OPTIONS]

Options

--mart <mart>#

A valid mart name

Default:

'ENSEMBL_MART_ENSEMBL'

--host <host>#

A valid host name such as www.ensembl.org (default) or fungi.ensembl.org

--dataset <dataset>#

Required A valid dataset name. e.g. mmusculus_gene_ensembl, hsapiens_gene_ensembl

--attributes <attributes>#

A valid set of attributes to look for in the dataset. Multiple attributes are separated by a comma (no spaces accepted)

Default:

'ensembl_gene_id,go_id,entrezgene_id,external_gene_name'

--output <output>#

By default, results are saved into a CSV file named biomart_<dataset>_<YEAR>_<MONTH>_<DAY>.csv

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

blast-to-gff#

Convert a BLAST file (outfmt=6) to GFF

Usage

sequana blast-to-gff [OPTIONS] INPUT_BLAST OUTPUT_GFF

Options

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

INPUT_BLAST#

Required argument

OUTPUT_GFF#

Required argument

embl-to-fasta#

Convert an EMBL file to FASTA

Usage

sequana embl-to-fasta [OPTIONS] INPUT_EMBL OUTPUT_FASTA

Options

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

INPUT_EMBL#

Required argument

OUTPUT_FASTA#

Required argument

enrichment-kegg#

Create an HTML report showing KEGG enriched pathways

Example for the enrichment module:

sequana enrichment-kegg rnadiff_output_dir --log2-foldchange-cutoff 2

The KEGG pathways are loaded first, which may take time. Once done, they are saved in kegg_pathways/organism and loaded from there next time:

sequana enrichment-kegg rnadiff_output_dir --log2-foldchange-cutoff 2 \
    --kegg-name lbi --annotation-attribute gene_name

Usage

sequana enrichment-kegg [OPTIONS] RNADIFF_DIRECTORY

Options

--annotation-attribute <annotation_attribute>#

a valid attribute to be used to map on KEGG database

Default:

'Name'

--kegg-name <kegg_name>#

Required a valid KEGG name (hsa for human, mmu for Mus musculus); see the taxonomy command to retrieve other names

--log2-foldchange-cutoff <log2_foldchange_cutoff>#

remove events with absolute log2 fold change below this value

Default:

1

--padj-cutoff <padj_cutoff>#

remove events with adjusted p-value above this value (default 0.05).

Default:

0.05
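Taken together, --log2-foldchange-cutoff and --padj-cutoff act as a two-sided filter on the differential analysis table. The following is a minimal Python sketch of that filter, not the code used by sequana: the column names log2FoldChange and padj follow the usual DESeq2 naming and are an assumption here, as is the choice to drop events that have no adjusted p-value.

```python
def keep_event(row, log2_fc_cutoff=1.0, padj_cutoff=0.05):
    """Return True if an event passes both cutoffs.

    `row` is a mapping with 'log2FoldChange' and 'padj' keys
    (assumed names, following the usual DESeq2 columns).
    """
    padj = row["padj"]
    if padj is None:
        # Assumption: events without an adjusted p-value are dropped.
        return False
    # Remove events whose absolute log2 fold change is below the cutoff,
    # and events whose adjusted p-value is above the cutoff.
    return abs(row["log2FoldChange"]) >= log2_fc_cutoff and padj <= padj_cutoff


events = [
    {"gene": "g1", "log2FoldChange": -2.5, "padj": 0.01},
    {"gene": "g2", "log2FoldChange": 0.5, "padj": 0.01},
    {"gene": "g3", "log2FoldChange": 3.0, "padj": 0.20},
]
selected = [e["gene"] for e in events if keep_event(e, log2_fc_cutoff=2)]
```

With a cutoff of 2, only g1 is kept: g2 fails on fold change and g3 on adjusted p-value.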

--biomart <biomart>#

you may need a biomart mapping of your identifier for the kegg pathways analysis. If you do not have this file, you can use 'sequana biomart' command

--plot-linearx#

Plots use log2 fold enrichment by default. Set this option to use a linear scale

--kegg-pathways-directory <kegg_pathways_directory>#

a place where to find the pathways for each organism

--comparison <comparison>#

By default analyses all comparisons found in input file. You may set one specifically with this argument.

--max-pathways <max_pathways>#

Max number of pathways to show (most enriched)

Default:

40

--kegg-background <kegg_background>#

a background for kegg enrichment. If None, set to the number of genes used in the differential analysis (input file rnadiff.csv).

--condition <condition>#

The name of the column used in the design file to define groups.

--output-directory <output_directory>#
--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

RNADIFF_DIRECTORY#

Required argument

enrichment-panther#

Create a HTML report for various sequana outputs.

Input: the enrichment output of the RNADiff pipeline.

Example:

sequana enrichment-panther rnadiff.csv --panther-taxon 10090 \
    --log2-foldchange-cutoff 2

sequana enrichment-panther rnadiff/rnadiff.csv \
    --panther-taxon 189518 \
    --log2-foldchange-cutoff 2 \
    --ontologies MF SLIM_MF

Valid ontologies are: MF, BP, CC, SLIM_MF, SLIM_BP, SLIM_CC, PROTEIN, PANTHER_PATHWAY, REACTOME_PATHWAY.

Usage

sequana enrichment-panther [OPTIONS] RNADIFF_DIRECTORY

Options

--annotation-attribute <annotation_attribute>#

a valid annotation attribute to be used for the mapping

Default:

'index'

--panther-taxon <panther_taxon>#

Required a valid taxon identifier

--log2-foldchange-cutoff <log2_foldchange_cutoff>#

remove events with absolute log2 fold change below this value

Default:

1

--padj-cutoff <padj_cutoff>#

remove events with adjusted p-value above this value (default 0.05).

Default:

0.05

--plot-linearx#

Plots use log2 fold enrichment by default. Set this option to use a linear scale

Default:

False

--compute-levels, --no-compute-levels#

Compute the level of each GO term; set --no-compute-levels to skip this step

--max-genes <max_genes>#

Maximum number of genes (up or down) to use in PantherDB.

Default:

2500

--ontologies <ontologies>#

Provide the ontologies to be included in the analysis and HTML report. Valid choices are: MF, BP, CC, SLIM_MF, SLIM_BP, SLIM_CC, PROTEIN, PANTHER_PATHWAY, REACTOME_PATHWAY

Default:

'MF', 'BP', 'CC'

--max-enriched-go-terms <max_enriched_go_terms>#

Max number of enriched go terms to show in the plots (most enriched). All enriched GO terms are stored in tables

Default:

40

--condition <condition>#

The name of the column used in the design file to define groups.

--output-directory <output_directory>#
Default:

'enrichment_panther'

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

RNADIFF_DIRECTORY#

Required argument

enrichment-uniprot#

GO enrichment using Uniprot and HTML report creation

The input should be the output of the RNADiff pipeline

Example:

sequana enrichment-uniprot rnadiff.csv --taxon 10090 \
    --log2-foldchange-cutoff 2

sequana enrichment-uniprot rnadiff/rnadiff.csv \
    --taxon 189518 \
    --log2-foldchange-cutoff 2 \
    --ontologies MF

Valid ontologies are: MF, BP, CC.

Usage

sequana enrichment-uniprot [OPTIONS] RNADIFF_DIRECTORY

Options

--annotation-attribute <annotation_attribute>#

a valid annotation attribute to be used for the mapping

Default:

'index'

--taxon <taxon>#

Required a valid taxon identifier

--log2-foldchange-cutoff <log2_foldchange_cutoff>#

remove events with absolute log2 fold change below this value

Default:

1

--padj-cutoff <padj_cutoff>#

remove events with adjusted p-value above this value (default 0.05).

Default:

0.05

--plot-linearx#

Plots use log2 fold enrichment by default. Set this option to use a linear scale

Default:

False

--compute-levels, --no-compute-levels#

Compute the level of each GO term; set --no-compute-levels to skip this step

--max-genes <max_genes>#

Maximum number of genes (up or down) to use in PantherDB.

Default:

2500

--ontologies <ontologies>#

Provide the ontologies to be included in the analysis and HTML report. Valid choices are: MF, BP, CC

Default:

'MF', 'BP', 'CC'

--max-enriched-go-terms <max_enriched_go_terms>#

Max number of enriched go terms to show in the plots (most enriched). All enriched GO terms are stored in tables

Default:

40

--condition <condition>#

The name of the column used in the design file to define groups.

--output-directory <output_directory>#
Default:

'enrichment_uniprot'

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

RNADIFF_DIRECTORY#

Required argument

fasta#

Set of useful utilities for FastA manipulation.

Usage

sequana fasta [OPTIONS] FILENAME...

Options

-o, --output <output>#

Filename where to save results. To be used with --merge, --save-contig-name

--count-sequences#

prints number of sequences

--to-chrom-size#

extract name and size and print on stdout, or save in a file (--output)

--merge#

merge all compressed input fasta files into a single file

--save-contig-name <save_contig_name>#

save sequence corresponding to this contig name

--explode#

Create a fasta file for each sequence found in the original files

--extract <extract>#

Extract one sequence from a FASTA file and save it in a new file

--reverse-complement#

Create a fasta file with reversed complement sequence

Arguments

FILENAME#

Required argument(s)

fastq#

Set of useful utilities for FastQ manipulation.

Input file can be gzipped or not. The --output-file

Usage

sequana fastq [OPTIONS] [FILENAME]...

Options

-o, --output <output>#

Filename where to save results. To be used with --head, --tail

--count-reads#
--head <head>#

number of reads to extract from the head

--merge#

merge all compressed input fastq files into a single file

--tail <tail>#

number of reads to extract from the tail

Arguments

FILENAME#

Optional argument(s)

fastq-split#

Split a FASTQ file into smaller parts (by number of reads or by number of parts).

Examples:

sequana fastq-split reads.fastq --by-size 1000000 --pattern chunk.#.fastq.gz

sequana fastq-split reads.fastq.gz --by-part 10 --pattern sample.#.fastq.gz

Usage

sequana fastq-split [OPTIONS] INPUT_FASTQ

Options

--by-size <by_size>#

Split file into chunks of N reads each.

--by-part <by_part>#

Split file into N equal parts.

--pattern <pattern>#

Output filename pattern (use # as placeholder for chunk number).

Default:

'output.#.fastq.gz'
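To make the roles of --by-size, --by-part and --pattern concrete, here is a hedged Python sketch that only computes the expected output file names; it does not read or write FASTQ data. Several points are assumptions rather than documented behaviour: that the two splitting options are mutually exclusive, that chunks are numbered from 1, and that # is replaced by the plain chunk number without zero padding.

```python
import math


def chunk_names(pattern, total_reads, by_size=None, by_part=None):
    """List the output names implied by a pattern and a splitting mode.

    Hypothetical helper: numbering from 1 and no zero padding are
    assumptions, not documented sequana behaviour.
    """
    if (by_size is None) == (by_part is None):
        raise ValueError("provide exactly one of by_size or by_part")
    if by_size is not None:
        # Chunks of by_size reads each; the last chunk may be smaller.
        n_chunks = math.ceil(total_reads / by_size)
    else:
        n_chunks = by_part
    return [pattern.replace("#", str(i)) for i in range(1, n_chunks + 1)]


names = chunk_names("chunk.#.fastq.gz", total_reads=2_500_000, by_size=1_000_000)
```

In this sketch, 2.5 million reads split by chunks of one million reads give three files, the last one holding the remaining 500,000 reads.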

--gzip#

Compress output with gzip.

--buffer-size <buffer_size>#

Number of reads to buffer before writing to disk.

Default:

10000

Arguments

INPUT_FASTQ#

Required argument

feature-counts#

Merge several feature counts files into one file

Usage

sequana feature-counts [OPTIONS]

Options

--pattern <pattern>#

The pattern of the feature counts files to merge

Default:

'*feature.out'

--output <output>#

The output filename where to save the merged counts

Default:

'all_features.out'

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

find-integrated-genes#

Find overlapping reads on host and possible integrated genes.

This script scans a BAM file and identifies reads that map onto a gene name provided by the user. Then, it checks whether these reads also map onto the other gene names.

This is very convenient to search for an integrated plasmid or gene in a host. For instance a transgene in a mammal genome.

The candidate reads are saved in various files:

  • a NAME_info.csv file with information about the reads of interest. The header is M,S,H,I,D,flag,chr,position,identifier. M, S, H, I and D indicate the number of bases that map (M), are soft clipped (S), hard clipped (H), inserted (I), or deleted (D). We also add the flag (type of mapping), the chromosome and position (chr, position) and finally the read identifier.

  • a NAME_reads_IDs.csv file that contains the unique read identifiers

  • a FastQ if --save-reads-as-fastq option is used

  • a FastA if --save-reads-as-fasta option is used

Note that secondary reads are ignored (flag 256, 272).
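The M, S, H, I and D columns correspond to operations of the CIGAR string of each alignment, and the ignored flags 256 and 272 both carry the SAM "secondary alignment" bit (0x100). As an illustration only, and not the implementation used by this command, the per-read counts and the secondary test can be sketched in Python as follows; the helper names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

SECONDARY = 0x100  # 256 in decimal


def cigar_counts(cigar):
    """Sum the number of bases per operation for M, S, H, I and D."""
    counts = {"M": 0, "S": 0, "H": 0, "I": 0, "D": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op in counts:
            counts[op] += int(length)
    return counts


def is_secondary(flag):
    """True when the secondary-alignment bit is set (e.g. 256, 272)."""
    return bool(flag & SECONDARY)
```

For example, cigar_counts("30S70M") reports 30 soft-clipped and 70 mapped bases, the typical signature of a read that spans a junction between two sequences.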

We also remove reads that map at a single place (no overlap).

To obtain the BAM files, you can use the sequana_mapper pipeline (sequana/mapper).

The BAM file should be indexed with samtools for better performance.

Usage

sequana find-integrated-genes [OPTIONS]

Options

--bam-file <bam_file>#

Required The BAM file to introspect

--name <name>#

Required The name of the gene that is supposed to be integrated in the host.

--tag <tag>#

By default all output files contain the name provided with --name. If you wish to override this behaviour you can use the --tag option. FastQ and FastA files will be named tag.fastq and tag.fasta

--save-reads#

If provided, save reads of interest in a FastQ file named {name}.fastq and FastA file named {name}.fasta

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

g4hunter#

Based on G4Hunter

Takes into account the G-richness and G-skewness of a given sequence and gives a quadruplex propensity score as output.
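As a rough illustration of what such a score measures, the sketch below follows the scoring scheme usually described for G4Hunter: each G in a run of n consecutive Gs scores min(n, 4), each C in a run of n consecutive Cs scores -min(n, 4), other bases score 0, and the score of a window is the mean over its bases. This is an assumption about the general method, not a description of how this command is implemented; the function names are hypothetical and the window default mirrors --window.

```python
from itertools import groupby


def base_scores(seq):
    """Per-base scores: G runs are positive, C runs negative, capped at 4."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def window_scores(seq, window=20):
    """Mean score of every sliding window of the given length."""
    scores = base_scores(seq)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        # Slide by one base: add the entering score, drop the leaving one.
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means
```

Windows whose absolute mean exceeds a threshold (compare --score) would then be reported as candidate regions; positive values point to G-rich stretches on the given strand, negative values to C-rich stretches.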

Usage

sequana g4hunter [OPTIONS]

Options

-i, --input <infile>#

input FASTA file

-o, --output <outdir>#

output directory

--window <window>#
Default:

20

--score <score>#
Default:

1

gff#

Set of useful utilities for GFF manipulation.

Usage

sequana gff [OPTIONS] [FILENAME]...

Options

-o, --output <output>#

filename where to save results.

--add-CDS-and-mRNA#

add CDS and mRNA

--gene-id <gene_id>#

Uses this identifier to get gene names to build new CDS and mRNA identifiers

Arguments

FILENAME#

Optional argument(s)

gff-to-gtf#

Convert a GFF file into GTF

This is an experimental conversion. Use with care.

Usage

sequana gff-to-gtf [OPTIONS] GFF_FILENAME

Options

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

GFF_FILENAME#

Required argument

gff-to-light-gff#

Extract the feature of interest in the input GFF to create a light version

sequana gff-to-light-gff input.gff output.gff --features gene,exon

Usage

sequana gff-to-light-gff [OPTIONS] INPUT OUTPUT

Options

--features <features>#

list of features to be extracted

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

INPUT#

Required argument

OUTPUT#

Required argument

gtf-fixer#

Read a GTF file and fix known issues (exon and gene uniqueness)

Usage

sequana gtf-fixer [OPTIONS]

Options

-i, --input <input>#

Required

-o, --output <output>#

Required

html-report#

Create an HTML report for various types of data sets

# VCF

The VCF module takes an input VCF file and filters it before creating an HTML report. The options are related to the variant quality. We assume that the VCF file was created with the sequana_variant_calling pipeline and therefore relies on the freebayes software. If so, you can provide a freebayes minimal score (--freebayes-score), remove variants below a given frequency (--frequency), etc. (please see --help for other parameters).

Usage

sequana html-report [OPTIONS] NAME

Options

--output-directory <output_directory>#
Default:

'.'

--output-vcf-file <output_vcf_file>#

Path to output VCF file.

Default:

'sequana.filter.vcf'

--output-csv-file <output_csv_file>#

Path to output CSV file.

Default:

'sequana.filter.csv'

--freebayes-score <freebayes_score>#

Freebayes score threshold.

Default:

20

--strand-ratio <strand_ratio>#

Minimum strand ratio.

Default:

0.2

--frequency <frequency>#

Minimum allele frequency.

Default:

0.1

--min-depth <min_depth>#
Default:

10

--forward-depth <forward_depth>#
Default:

3

--reverse-depth <reverse_depth>#
Default:

3

--keep-polymorphic <keep_polymorphic>#
Default:

True

Arguments

NAME#

Required argument

lane-merging#

Merge lanes.

Looks for data stored either as:

<sampleID_1>/*fastq.gz
<sampleID_2>/*fastq.gz
<sampleID_3>/*fastq.gz

or as:

sampleID_L001_.fastq.gz
sampleID_L002_.fastq.gz

Usage

sequana lane-merging [OPTIONS]

Options

--lanes <lanes>#

Required

-o, --output-directory <output_directory>#

Required Where to store the new fastq files

Default:

'merging'

--pattern <pattern>#

pattern for the input fastq files. Use quotes if wildcards are used

Default:

'*/*fastq*gz'

--threads <threads>#

number of threads per job (pigz)

Default:

4

-s, --use-sambamba#

use sambamba instead of samtools for sorting

--force#
--dry-run#
--slurm-queue <slurm_queue>#

mapping#

map FastQ data onto a reference

This is a simple mapper for quick test. The commands are as follows:

# Indexing
bwa index REFERENCE
samtools faidx REFERENCE

# mapping
bwa mem -t 4 -R '@RG\tID:1\tSM:1\tPL:illumina' -T 30 REFERENCE FASTQ_FILES | samtools view -Sbh - > REFERENCE.bam

samtools sort -o REFERENCE.sorted.bam REFERENCE.bam

Usage

sequana mapping [OPTIONS]

Options

-1, --file1 <file1>#

Required R1 Fastq file ; zipped file expected

-2, --file2 <file2>#

R2 Fastq file

-r, --reference <reference>#

Required Reference where to map data

-t, --threads <threads>#

number of threads to use

Default:

1

-p, --pacbio#
Default:

False

-s, --use-sambamba#

use sambamba instead of samtools for sorting

ribodesigner#

A tool to design custom ribodepletion probes.

This uses a reference genome (FASTA file) and the corresponding annotation (GFF file). CD-HIT-EST should be installed and in your $PATH.

Usage

sequana ribodesigner [OPTIONS] FASTA [GFF]

Options

--method <method>#
Options:

original | greedy | spiral | simple

--output-directory <output_directory>#
Default:

'out_ribodesigner'

--seq-type <seq_type>#

The annotation type (column 3 in gff) to target for probes.

Default:

'rRNA'

--max-n-probes <max_n_probes>#

The maximum number of probes to design.

Default:

384

--force-clustering#

By default, if number of probes is above 384 (see --max-n-probes) a clustering is performed using an identity threshold (with --identity-step). If not, no clustering is done. You may force the clustering using this option.

Default:

False

--threads <threads>#

The number of threads to use for cd-hit-est.

Default:

4

--identity-step <identity_step>#

The identity parameters used by cd-hit-est.

Default:

0.01

--output-image <output_image>#
--force#

If output directory exists, use this option to erase previous results

Default:

False

Arguments

FASTA#

Required argument

GFF#

Optional argument

rnadiff#

Sequana RNADiff: differential analysis and reporting.

The Sequana rnadiff command performs the differential analysis of input RNA-seq data using DESeq2 behind the scenes.

The command line looks like

sequana rnadiff --annotation-file Lepto.gff --design design.csv --features all_features.out --feature-name gene --attribute-name ID

This command performs the differential analysis of feature counts using DESeq2. An HTML report is created as well as a set of output files, including summary tables of the analysis.

The expected input is a tabulated file which is the aggregation of feature counts for each sample. This file is produced by the Sequana RNA-seq pipeline (sequana/rnaseq).

It is named all_features.out and looks like:

Geneid    Chr  Start  End  Strand  Length  BAM1  BAM2  BAM3  BAM4
ENSG0001  1    1      10   +       10      120   130   140   150
ENSG0002  2    1      10   +       10      120   130   0     0

To perform this analysis, you will also need the GFF file used during the RNA-seq analysis.

You also need a design file that gives the correspondence between the sample names found in the feature counts file above and the conditions of your RNA-seq analysis. The design looks like:

label,condition
BAM1,condition_A
BAM2,condition_A
BAM3,condition_B
BAM4,condition_B
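Since the labels of the design file must match the sample columns of the feature counts file, a quick consistency check before running the analysis can save time. The following Python sketch is only an illustration under stated assumptions: the six leading annotation columns are taken from the example above, and the helper names are hypothetical, not part of sequana.

```python
import csv
import io

# Leading non-sample columns, as in the all_features.out example.
ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]


def read_design(text):
    """Map each sample label of a design CSV to its condition."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["label"]: row["condition"] for row in reader}


def missing_samples(design, counts_header):
    """Return design labels that have no column in the counts header."""
    samples = [c for c in counts_header.split() if c not in ANNOTATION_COLUMNS]
    return sorted(set(design) - set(samples))


design = read_design(
    "label,condition\n"
    "BAM1,condition_A\nBAM2,condition_A\n"
    "BAM3,condition_B\nBAM4,condition_B\n"
)
header = "Geneid\tChr\tStart\tEnd\tStrand\tLength\tBAM1\tBAM2\tBAM3\tBAM4"
problems = missing_samples(design, header)
```

An empty list means every label in the design has a matching column in the counts file.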

The feature-name is the feature that was used in your counting. The attribute-name is the main attribute to use in the HTML reports. Note, however, that all attributes found in your GFF file are reported in the HTML page.

Batch effects can be included by adding a column in the design.csv file. For example, if the column is called 'day', you can take this information into account using '--batch day'.

By default, when comparing conditions, all combinations are computed. If you have N conditions, we compute the N(N-1)/2 comparisons. The reference is automatically chosen as the last one found in the design file. In this example:

label,condition
BAM1,A
BAM2,A
BAM3,B
BAM4,B

we compare A versus B. If you do not want that behaviour, use '--reference A'.

In a more complex design,

label,condition
BAM1,A
BAM2,A
BAM3,B
BAM4,B
BAM5,C
BAM6,C

The comparisons are A vs B, A vs C and B vs C. If you wish to perform different comparisons or restrict the combination, you can use a comparison input file. For instance, to perform the C vs A and C vs B comparisons only, create this file (e.g. comparison.csv):

alternative,reference
C,A
C,B

and use '--comparisons comparison.csv'.
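The default set of comparisons described above can be reproduced with a few lines of Python. This is a sketch of the enumeration only, not of the statistical analysis, and default_comparisons is a hypothetical helper rather than a sequana function. It assumes conditions are taken in their order of appearance in the design file.

```python
from itertools import combinations


def default_comparisons(conditions, reference=None):
    """Enumerate (alternative, reference) pairs from design conditions.

    Without a reference, all N(N-1)/2 pairs are returned; with one,
    only the comparisons against that reference are kept.
    """
    # Remove duplicates while keeping the order of first appearance.
    unique = list(dict.fromkeys(conditions))
    if reference is None:
        return list(combinations(unique, 2))
    return [(c, reference) for c in unique if c != reference]


conditions = ["A", "A", "B", "B", "C", "C"]
all_pairs = default_comparisons(conditions)
versus_a = default_comparisons(conditions, reference="A")
```

With three conditions, all_pairs contains the three pairs A vs B, A vs C and B vs C, while versus_a contains only B vs A and C vs A.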

Usage

sequana rnadiff [OPTIONS]

Options

--design <design>#

Required The design file in CSV format (see documentation above)

--features <features>#

Required The merged features counts. Output of the sequana_rnaseq pipeline

--annotation-file <annotation>#

The annotation GFF file used to perform the feature count

--beta-prior, --no-beta-prior#

Use beta prior or not. Default is no beta prior

--condition <condition>#

The name of the column in design.csv to use as condition for the differential analysis. Default is 'condition'

--force, --no-force#

If output directory exists, use this option to erase previous results

--output-directory <output_directory>#

Output directory where are saved the results. Use --force if it exists already

--feature-name <feature_name>#

Required The feature name compatible with your GFF (default is 'gene')

--attribute-name <attribute_name>#

Required The attribute used as identifier. Compatible with your GFF (default is 'ID')

--reference <reference>#

The reference to test DGE against. If provided, conditions not involving the reference are ignored. Otherwise all combinations are tested

--comparisons <comparisons>#

By default, if a reference is provided, all conditions versus that reference are tested. If no reference is provided, all pairwise combinations are performed (Ncondition * (Ncondition-1) / 2). In both cases, all conditions found in the design file are used. If a comparison file is provided, only conditions found in it will be used.

--cooks-cutoff <cooks_cutoff>#

If none, let DESeq2 choose the cutoff. Note that the Cook’s distance is set to NA for genes with values above the threshold (at least 3 replicates are required for flagging).

--independent-filtering, --no-independent-filtering#

Do not perform independent filtering by default. Low counts may not have adjusted p-values otherwise

--batch <batch>#

Set the column name (in your design) corresponding to the batch effect to be included in the statistical model as ~batch + condition

--fit-type <fit_type>#

DESeq2 type of fit. Default is 'parametric'. Using the mean of gene-wise dispersion estimates as the fitted value can be specified by setting this argument to 'mean'.

--minimum-mean-reads-per-gene <minimum_mean_reads_per_gene>#

Keeps genes that have an average number of reads greater or equal to this value. This is the average across all replicates and conditions. Not recommended if you have lots of conditions. By default all genes are kept

--minimum-mean-reads-per-condition-per-gene <minimum_mean_reads_per_condition_per_gene>#

Keep genes that have at least one condition where the average number of reads is greater or equal to this value. By default all genes are kept

--model <model>#

By default, the model is ~batch + condition. For more complex cases, you may set the --model more specifically

--shrinkage, --no-shrinkage#

Shrinkage was added in the DESeq2 script analysis in Sequana 0.14.7. Although it has a marginal impact, number of DGEs may be different and volcano plots have usually a different shape. To ignore the shrinkage, you could set the option to --no-shrinkage

--keep-all-conditions, --no-keep-all-conditions#

Even when only a subset of comparisons is provided, keep all conditions in the analysis and report only the provided comparisons

--hover-name <hover_name>#

In volcano plot, we set the hover name to Name if present in the GFF, otherwise to gene_id if present, then locus_tag, and finally ID and gene_name. One can specify a hover name to be used with this option

--report-only#

If analysis was done, you may want to redo the HTML report only using this option

--split-full-table, --no-split-full-table#

Multiple comparisons on large genomes may create HTML reports that are quite large and would require lots of memory. With this option, only significant DGEs are in the main HTML report and full tables are saved in individual HTML pages

--xticks-fontsize <xticks_fontsize>#

Reduce fontsize of xticks

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

rnaseq-compare#

Compare 2 tables created by the 'sequana rnadiff' command

Usage

sequana rnaseq-compare [OPTIONS]

Options

--file1 <file1>#

Required The first input RNA-seq table to compare

--file2 <file2>#

Required The second input RNA-seq table to compare

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

salmon-cli#

Convert output of Salmon into a feature counts file

Usage

sequana salmon-cli [OPTIONS]

Options

-i, --input <input>#

Required The salmon input file.

-o, --output <output>#

Required The feature counts output file

-f, --gff <gff>#

Required A GFF file compatible with your salmon file

-a, --attribute <attribute>#

A valid attribute to be found in the GFF file and salmon input

-F, --feature <feature>#

A valid feature

samplesheet#

Standalone application to validate/check Illumina sample sheet

Usage

sequana samplesheet [OPTIONS] NAME

Options

--check#

report validity of the input file

--full-check#

report complete report of all checks

--extract-adapters#

extract adapters from the settings section and save them into a fasta file

--quick-fix#
--output <output>#

Arguments

NAME#

Required argument

somy-score#

Somy score on polyploid (or not)

Usage

sequana somy-score [OPTIONS] FILENAME

Options

--window-size <window_size>#

Size of the windows (chunks) over which the statistics are computed

Default:

1000

--fast#

fast option

--method <method>#

Method to estimate the main somy (in principle diploid). The median and mean methods simply compute those statistics from the chunks (windows). The EM method is more complex: it computes the distribution, estimates a mixture model, and derives the mean of the diploid distribution from those estimates

Default:

'median'

Options:

em | median | mean

--estimated-diploy-coverage <estimated_diploy_coverage>#

If not provided, data normalisation is based on --method. If provided, this is the estimated coverage

--chromosomes <chromosomes>#

list of chromosomes to restrict to

--mapq <mapq>#

mapping quality (MAPQ) threshold used to filter reads

Default:

0

--telomeric-span <telomeric_span>#

Region to ignore, in kb. This supposes that contigs are circularised correctly. If not, set to 0

Default:

10

-k <k>#

Model for Gaussian mixture (k=4 is supposed to capture di + tri + tetraploidy)

Default:

4

--minimum-depth <minimum_depth>#

coverage with depth below this value are removed.

--flag <flag>#

3844 means that it removes unmapped reads, but also secondary and supplementary reads.

Default:

3844
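The value 3844 is a sum of SAM flag bits. Decoding it with the bit definitions of the SAM specification shows which read categories it covers. The sketch below assumes that --flag is used as an exclusion mask, in the manner of samtools view -F; that usage is an assumption, and the helper names are hypothetical.

```python
# Flag bits as defined in the SAM specification.
SAM_FLAGS = {
    1: "paired",
    2: "proper pair",
    4: "unmapped",
    8: "mate unmapped",
    16: "reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "secondary",
    512: "QC fail",
    1024: "duplicate",
    2048: "supplementary",
}


def decode_flag(flag):
    """List the names of the bits set in a SAM flag value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_excluded(read_flag, mask=3844):
    """True if the read carries at least one bit of the exclusion mask."""
    return bool(read_flag & mask)


categories = decode_flag(3844)
```

Here categories lists unmapped, secondary, QC fail, duplicate and supplementary (4 + 256 + 512 + 1024 + 2048 = 3844), so under the exclusion-mask assumption reads that failed quality checks or are marked as duplicates would be removed as well.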

--threads <threads>#

use 4 threads .

--exclude-chromosomes <exclude_chromosomes>#

list of chromosomes to exclude

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

FILENAME#

Required argument

summary#

Create a HTML report for various types of NGS formats.

Supported modules:

  • bamqc

  • fastq

This processes all files in the given pattern (in back-quotes) sequentially and produces one HTML file per input.

Other modules work the same way. For example, for FastQ files:

sequana summary one_input.fastq
sequana summary `ls *fastq`

Export to JSON:

sequana summary input.fastq --output-json stats.json

Usage

sequana summary [OPTIONS] [NAME]...

Options

--module <module>#
Options:

bamqc | bam | fasta | fastq | gff | vcf | sam

--output-file <output_file>#
--output-json <output_json>#

Export stats to JSON file

Arguments

NAME#

Optional argument(s)

taxonomy#

Tool to retrieve taxonomic information.

sequana taxonomy --search-kegg leptospira

Usage

sequana taxonomy [OPTIONS]

Options

--search-kegg <search_kegg>#

Search a pattern amongst all KEGG organisms

--search-panther <search_panther>#

Search a pattern amongst all Panther organisms

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

telomark#

Scan a FASTA file for telomeric repeats and report their positions.

Usage

sequana telomark [OPTIONS] FASTA_FILE

Options

--chromosomes <chromosomes>#

list of chromosomes to restrict to

--tag <tag>#
--chunk-size <chunk_size>#

chunk at beginning and end to look at

--peak-height <peak_height>#

chunk at beginning and end to look at

--peak-width <peak_width>#

chunk at beginning and end to look at

--plot-style <plot_style>#

Per-contig plot style. 'annotated' overlays both strand signals with color-coded telomeric regions and a status badge; 'legacy' shows two separate raw subplots (original behaviour).

Default:

'annotated'

Options:

annotated | legacy

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR

Arguments

FASTA_FILE#

Required argument

variants-comparison#

Retrieve differences in variant content across multiple samples using a joint-calling VCF file.

Usage

sequana variants-comparison [OPTIONS]

Options

-i, --input-vcf <VCF>#

Required Joint calling vcf using freebayes and annotated with snpEff.

-g, --input-gff <GFF>#

Required GFF used to annotate the VCF file.

-o, --output-html <HTML>#

Required HTML report.

-t, --title <TITLE>#

Report title.

-q, --quality-threshold <QUAL>#

Threshold used to filter variants and keep only variants that are greater than the argument value.

Default:

1

-r, --remove-sample <SAMPLE>#

Sample name you want to remove from cross comparisons.

-s, --ordered-sample <SAMPLE>#

Comma-separated sample names in the order you want.

--logger <logger>#
Options:

INFO | DEBUG | WARNING | CRITICAL | ERROR