CLI reference#
The top-level sequana command groups ~30 sub-commands (FASTQ/FASTA
utilities, GFF/GTF fixers, enrichment helpers, summaries, …). The reference
below is generated directly from the source.
Each sub-command also accepts --help from the shell, e.g.:
sequana fastq --help
sequana enrichment-kegg --help
sequana#
This is the main entry point for a set of Sequana applications.
Pipelines such as sequana_rnaseq and sequana_variant_calling have their own applications and help.
In addition, more advanced tools such as sequana_taxonomy or sequana_coverage have their own standalone applications.
To set up shell completion, run the command matching your shell:
- bash: eval "$(_SEQUANA_COMPLETE=bash_source sequana)"
- zsh: eval "$(_SEQUANA_COMPLETE=zsh_source sequana)"
- fish: _SEQUANA_COMPLETE=fish_source sequana | source
Usage
sequana [OPTIONS] COMMAND [ARGS]...
Options
- --version#
Show the version and exit.
biomart#
Retrieve information from BioMart and save it into a CSV file
This command uses BioMart from BioServices to introspect a MART service (--mart) and a specific dataset (--dataset, e.g. mmusculus_gene_ensembl). Then, for all Ensembl IDs, it fetches the requested attributes (--attributes) and saves them into a CSV file (--output). Retrieving the data takes about 5-10 minutes depending on the connection.
Example:
sequana biomart --mart ENSEMBL_MART_ENSEMBL --dataset mmusculus_gene_ensembl --attributes ensembl_gene_id,external_gene_name,go_id --output test.csv
Usage
sequana biomart [OPTIONS]
Options
- --mart <mart>#
A valid mart name
- Default:
'ENSEMBL_MART_ENSEMBL'
- --host <host>#
A valid host name such as www.ensembl.org (default) or fungi.ensembl.org
- --dataset <dataset>#
Required A valid dataset name, e.g. mmusculus_gene_ensembl or hsapiens_gene_ensembl
- --attributes <attributes>#
A valid set of attributes to look for in the dataset. Multiple attributes are separated by commas (no spaces allowed)
- Default:
'ensembl_gene_id,go_id,entrezgene_id,external_gene_name'
- --output <output>#
By default, results are saved into a CSV file named biomart_<dataset>_<YEAR>_<MONTH>_<DAY>.csv
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
blast-to-gff#
Convert a BLAST tabular file (outfmt 6) into a GFF file
Usage
sequana blast-to-gff [OPTIONS] INPUT_BLAST OUTPUT_GFF
Options
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- INPUT_BLAST#
Required argument
- OUTPUT_GFF#
Required argument
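The exact column mapping used by blast-to-gff is not described here. As a rough illustration of what such a conversion involves, the sketch below turns BLAST tabular lines (-outfmt 6, whose 12 standard columns are listed in the comment) into GFF3 records placed on the subject sequence. The helper name blast6_to_gff, the use of subject coordinates, the "match" feature type and the attribute names are all assumptions, not sequana's behaviour.

```shell
# Hypothetical helper, NOT sequana's implementation: BLAST -outfmt 6 -> GFF3.
blast6_to_gff() {
    awk 'BEGIN { FS = OFS = "\t"; print "##gff-version 3" }
    {
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        start = $9 + 0; end = $10 + 0; strand = "+"
        # subject coordinates are reversed for minus-strand hits
        if (start > end) { tmp = start; start = end; end = tmp; strand = "-" }
        # seqid source type start end score(bitscore) strand phase attributes
        print $2, "blast", "match", start, end, $12, strand, ".",
              "ID=hit" NR ";Name=" $1 ";pident=" $3 ";evalue=" $11
    }' "$1"
}
```

For example, blast6_to_gff hits.tsv > hits.gff writes one GFF line per BLAST hit after the version header.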
embl-to-fasta#
Convert an EMBL file into a FASTA file
Usage
sequana embl-to-fasta [OPTIONS] INPUT_EMBL OUTPUT_FASTA
Options
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- INPUT_EMBL#
Required argument
- OUTPUT_FASTA#
Required argument
enrichment-kegg#
Create an HTML report showing enriched KEGG pathways
Example:
sequana enrichment-kegg rnadiff_output_dir --log2-foldchange-cutoff 2 --kegg-name lbi
Loading the KEGG pathways may take time. Once loaded, they are saved in kegg_pathways/organism and reused the next time, for example:
sequana enrichment-kegg rnadiff_output_dir --log2-foldchange-cutoff 2 \
    --kegg-name lbi --annotation-attribute gene_name
Usage
sequana enrichment-kegg [OPTIONS] RNADIFF_DIRECTORY
Options
- --annotation-attribute <annotation_attribute>#
A valid attribute used to map identifiers onto the KEGG database
- Default:
'Name'
- --kegg-name <kegg_name>#
Required A valid KEGG organism name (hsa for human, mmu for Mus musculus); see the taxonomy command to retrieve other names
- --log2-foldchange-cutoff <log2_foldchange_cutoff>#
remove events with absolute log2 fold change below this value
- Default:
1
- --padj-cutoff <padj_cutoff>#
Remove events with an adjusted p-value above this value.
- Default:
0.05
- --biomart <biomart>#
A BioMart mapping of your identifiers may be needed for the KEGG pathway analysis. If you do not have this file, you can create it with the 'sequana biomart' command
- --plot-linearx#
Plots use log2 fold enrichment by default; use this flag to switch to a linear scale
- --kegg-pathways-directory <kegg_pathways_directory>#
Directory where the pathways of each organism are stored
- --comparison <comparison>#
By default, all comparisons found in the input file are analysed. Use this argument to select a specific one.
- --max-pathways <max_pathways>#
Max number of pathways to show (most enriched)
- Default:
40
- --kegg-background <kegg_background>#
A background size for the KEGG enrichment. If not set, the number of genes used in the differential analysis (input file rnadiff.csv) is used.
- --condition <condition>#
The name of the column used in the design file to define groups.
- --output-directory <output_directory>#
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- RNADIFF_DIRECTORY#
Required argument
enrichment-panther#
Create an HTML report of the GO enrichment computed with PantherDB.
Input: the output of the RNADiff pipeline.
Example:
sequana enrichment-panther rnadiff.csv --panther-taxon 10090 \
--log2-foldchange-cutoff 2
sequana enrichment-panther rnadiff/rnadiff.csv \
--panther-taxon 189518 \
--log2-foldchange-cutoff 2 \
--ontologies MF SLIM_MF
Valid ontologies are: MF, BP, CC, SLIM_MF, SLIM_BP, SLIM_CC, PROTEIN, PANTHER_PATHWAY, REACTOME_PATHWAY.
Usage
sequana enrichment-panther [OPTIONS] RNADIFF_DIRECTORY
Options
- --annotation-attribute <annotation_attribute>#
The attribute used as gene identifier (by default, the index of the input table)
- Default:
'index'
- --panther-taxon <panther_taxon>#
Required A valid taxon identifier (e.g. 10090 for Mus musculus)
- --log2-foldchange-cutoff <log2_foldchange_cutoff>#
remove events with absolute log2 fold change below this value
- Default:
1
- --padj-cutoff <padj_cutoff>#
Remove events with an adjusted p-value above this value.
- Default:
0.05
- --plot-linearx#
Plots use log2 fold enrichment by default; use this flag to switch to a linear scale
- Default:
False
- --compute-levels, --no-compute-levels#
Compute the level of each GO term; use --no-compute-levels to skip this step
- --max-genes <max_genes>#
Maximum number of genes (up or down) to use in PantherDB.
- Default:
2500
- --ontologies <ontologies>#
Provide the ontologies to be included in the analysis and HTML report. Valid choices are MF, BP, CC, SLIM_MF, SLIM_BP, SLIM_CC, PROTEIN, PANTHER_PATHWAY, REACTOME_PATHWAY
- Default:
'MF', 'BP', 'CC'
- --max-enriched-go-terms <max_enriched_go_terms>#
Maximum number of enriched GO terms to show in the plots (most enriched first). All enriched GO terms are stored in tables
- Default:
40
- --condition <condition>#
The name of the column used in the design file to define groups.
- --output-directory <output_directory>#
- Default:
'enrichment_panther'
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- RNADIFF_DIRECTORY#
Required argument
enrichment-uniprot#
GO enrichment using UniProt, with HTML report creation
Example:
sequana enrichment-uniprot rnadiff.csv --taxon 10090 \
--log2-foldchange-cutoff 2
sequana enrichment-uniprot rnadiff/rnadiff.csv \
--taxon 189518 --log2-foldchange-cutoff 2 \
--ontologies MF
Usage
sequana enrichment-uniprot [OPTIONS] RNADIFF_DIRECTORY
Options
- --annotation-attribute <annotation_attribute>#
The attribute used as gene identifier (by default, the index of the input table)
- Default:
'index'
- --taxon <taxon>#
Required A valid taxon identifier (e.g. 10090 for Mus musculus)
- --log2-foldchange-cutoff <log2_foldchange_cutoff>#
remove events with absolute log2 fold change below this value
- Default:
1
- --padj-cutoff <padj_cutoff>#
Remove events with an adjusted p-value above this value.
- Default:
0.05
- --plot-linearx#
Plots use log2 fold enrichment by default; use this flag to switch to a linear scale
- Default:
False
- --compute-levels, --no-compute-levels#
Compute the level of each GO term; use --no-compute-levels to skip this step
- --max-genes <max_genes>#
Maximum number of genes (up or down) to use in PantherDB.
- Default:
2500
- --ontologies <ontologies>#
Provide the ontologies to be included in the analysis and HTML report. Valid choices are MF, BP, CC
- Default:
'MF', 'BP', 'CC'
- --max-enriched-go-terms <max_enriched_go_terms>#
Maximum number of enriched GO terms to show in the plots (most enriched first). All enriched GO terms are stored in tables
- Default:
40
- --condition <condition>#
The name of the column used in the design file to define groups.
- --output-directory <output_directory>#
- Default:
'enrichment_uniprot'
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- RNADIFF_DIRECTORY#
Required argument
fasta#
Set of useful utilities for FastA manipulation.
Usage
sequana fasta [OPTIONS] FILENAME...
Options
- -o, --output <output>#
Filename where results are saved; to be used with --merge and --save-contig-name
- --count-sequences#
prints number of sequences
- --to-chrom-size#
extract name and size and print on stdout, or save in a file (--output)
- --merge#
merge all compressed input fasta files into a single file
- --save-contig-name <save_contig_name>#
save sequence corresponding to this contig name
- --explode#
Create a fasta file for each sequence found in the original files
- --extract <extract>#
Extract one sequence from a FASTA file and save it in a new file
- --reverse-complement#
Create a FASTA file with the reverse-complement sequences
Arguments
- FILENAME#
Required argument(s)
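As an illustration of what --to-chrom-size reports, the following awk sketch prints the name (header up to the first whitespace) and the sequence length of each record. The helper name fasta_chrom_sizes and the tab-separated, chrom.sizes-like layout are assumptions; it handles plain-text FASTA only, not gzipped input, and is not sequana's code.

```shell
# Hypothetical helper: print "name<TAB>length" for each FASTA record.
fasta_chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # header up to the first whitespace, without ">"
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

For example, fasta_chrom_sizes genome.fa > genome.chrom.sizes.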
fastq#
Set of useful utilities for FastQ manipulation.
Input files can be gzipped or not. The --output option is used together with --head and --tail.
Usage
sequana fastq [OPTIONS] [FILENAME]...
Options
- -o, --output <output>#
Filename where results are saved; to be used with --head and --tail
- --count-reads#
- --head <head>#
number of reads to extract from the head
- --merge#
merge all compressed input fastq files into a single file
- --tail <tail>#
number of reads to extract from the tail
Arguments
- FILENAME#
Optional argument(s)
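Because FastQ records span four lines, a read count in the spirit of --count-reads can be sketched as the number of lines divided by four, decompressing on the fly when the file name ends in .gz. The helper count_fastq_reads is hypothetical, assumes unwrapped four-line records, and is not how sequana counts reads internally.

```shell
# Hypothetical helper: count reads in a plain or gzipped FastQ file,
# assuming the standard 4-lines-per-record layout.
count_fastq_reads() {
    local lines
    case "$1" in
        *.gz) lines=$(gzip -dc -- "$1" | wc -l) ;;   # decompress on the fly
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}
```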
fastq-split#
Split a FASTQ file into smaller parts (by number of reads or by number of parts).
Examples:
sequana fastq-split reads.fastq --by-size 1000000 --pattern chunk.#.fastq.gz
sequana fastq-split reads.fastq.gz --by-part 10 --pattern sample.#.fastq.gz
Usage
sequana fastq-split [OPTIONS] INPUT_FASTQ
Options
- --by-size <by_size>#
Split file into chunks of N reads each.
- --by-part <by_part>#
Split file into N equal parts.
- --pattern <pattern>#
Output filename pattern (use # as placeholder for chunk number).
- Default:
'output.#.fastq.gz'
- --gzip#
Compress output with gzip.
- --buffer-size <buffer_size>#
Number of reads to buffer before writing to disk.
- Default:
10000
Arguments
- INPUT_FASTQ#
Required argument
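The --by-size mode can be pictured with the awk sketch below: every 4 × N lines a new output file is opened, named after the pattern with # replaced by the chunk number. The helper name split_fastq_by_size, the numbering starting at 1 and the uncompressed output are assumptions made for this illustration; sequana's own numbering may differ, and its --gzip option handles compression.

```shell
# Hypothetical helper: split a plain FastQ into chunks of N reads,
# replacing "#" in the pattern with the chunk number (1, 2, ...).
split_fastq_by_size() {
    local fastq=$1 n_reads=$2 pattern=$3
    awk -v n="$n_reads" -v pattern="$pattern" '
        (NR - 1) % (4 * n) == 0 {      # first line of a new chunk
            if (out != "") close(out)
            chunk++
            out = pattern
            sub(/#/, chunk, out)
        }
        { print > out }
    ' "$fastq"
}
```

For example, split_fastq_by_size reads.fastq 1000000 'chunk.#.fastq' writes chunk.1.fastq, chunk.2.fastq, and so on.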
feature-counts#
Merge several feature counts files into one file
Usage
sequana feature-counts [OPTIONS]
Options
- --pattern <pattern>#
The pattern of the feature counts files to merge
- Default:
'*feature.out'
- --output <output>#
The output filename where to save the merged counts
- Default:
'all_features.out'
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
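A minimal sketch of such a merge, assuming every file comes from featureCounts run with the same annotation so that rows appear in the same gene order: the six annotation columns (Geneid, Chr, Start, End, Strand, Length) are kept once and the count columns of each file are appended. The helper merge_feature_counts is hypothetical, and it does not check that the Geneid columns actually match.

```shell
# Hypothetical helper: merge featureCounts tables sharing the same gene order.
merge_feature_counts() {
    local merged next f
    merged=$(mktemp)
    next=$(mktemp)
    grep -v '^#' "$1" > "$merged"          # drop the featureCounts comment line
    shift
    for f in "$@"; do
        # columns 7 and beyond hold the counts (one per BAM file)
        grep -v '^#' "$f" | cut -f 7- | paste "$merged" - > "$next"
        cp "$next" "$merged"
    done
    cat "$merged"
    rm -f "$merged" "$next"
}
```

With the default pattern and output names: merge_feature_counts *feature.out > all_features.out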
find-integrated-genes#
Find overlapping reads on host and possible integrated genes.
This script scans a BAM file and identifies reads that map onto a gene name provided by the user. Then, it checks whether these reads also map onto the other gene names.
This is convenient for searching for an integrated plasmid or gene in a host, for instance a transgene in a mammalian genome.
The candidate reads are saved in various files:
a NAME_info.csv file with information about the reads of interest. The header is M,S,H,I,D,flag,chr,position,identifier. M, S, H, I and D indicate the number of bases that map (M), are soft clipped (S), hard clipped (H), inserted (I), or deleted (D). The flag (type of mapping), the chromosome and position (chr, position) and finally the read identifier are also reported.
a NAME_reads_IDs.csv file that contains the unique read identifiers
a FastQ file and a FastA file if the --save-reads option is used
Secondary alignments (flags 256 and 272) are ignored.
Reads that map at a single place (no overlap) are also removed.
To obtain the BAM files, you can use the sequana_mapper pipeline (sequana/mapper).
The BAM file should be indexed with samtools for better performance.
Usage
sequana find-integrated-genes [OPTIONS]
Options
- --bam-file <bam_file>#
Required The BAM file to introspect
- --name <name>#
Required The name of the gene that is supposed to be integrated in the host.
- --tag <tag>#
By default all output files contain the name provided with --name. If you wish to override this behaviour you can use the --tag option. FastQ and FastA files will be named tag.fastq and tag.fasta
- --save-reads#
If provided, save reads of interest in a FastQ file named {name}.fastq and FastA file named {name}.fasta
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
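To illustrate how the M, S, H, I and D columns of NAME_info.csv can be derived, the awk sketch below reads SAM records (for instance from samtools view), skips secondary alignments (flag bit 256, which covers flags 256 and 272) and sums the length of each CIGAR operation. The helper name cigar_summary is hypothetical; it does not perform the gene-overlap search itself.

```shell
# Hypothetical helper: SAM on stdin -> M,S,H,I,D,flag,chr,position,identifier
cigar_summary() {
    awk 'BEGIN { FS = "\t"; OFS = ","; print "M,S,H,I,D,flag,chr,position,identifier" }
    /^@/ { next }                          # skip SAM header lines
    int($2 / 256) % 2 == 1 { next }        # secondary alignment (bit 256 set)
    {
        split("M S H I D", ops, " ")
        for (i = 1; i <= 5; i++) count[ops[i]] = 0
        cigar = $6
        # consume the CIGAR string one <length><operation> element at a time
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            op = substr(cigar, RLENGTH, 1)
            count[op] += substr(cigar, 1, RLENGTH - 1)
            cigar = substr(cigar, RLENGTH + 1)
        }
        print count["M"], count["S"], count["H"], count["I"], count["D"], $2, $3, $4, $1
    }'
}
```

For example: samtools view sample.bam | cigar_summary > sample_info.csv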
g4hunter#
Based on G4Hunter, which takes into account the G-richness and G-skewness of a given sequence and outputs a quadruplex propensity score.
Usage
sequana g4hunter [OPTIONS]
Options
- -i, --input <infile>#
input FASTA file
- -o, --output <outdir>#
output directory
- --window <window>#
- Default:
20
- --score <score>#
- Default:
1
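In the original G4Hunter scoring scheme, each G in a run of k consecutive Gs scores min(k, 4), each C in a run of k Cs scores -min(k, 4), and other bases score 0; the score of a window is the mean over its bases. The sketch below applies this scheme to a raw sequence string, using the --window (20) and --score (1) defaults as its own defaults. The helper name g4hunter_windows is hypothetical: it does not read FASTA files and is not sequana's implementation.

```shell
# Hypothetical helper: print "start<TAB>score" for every window whose
# |mean G4Hunter score| is at least the threshold.
g4hunter_windows() {
    local seq=$1 window=${2:-20} threshold=${3:-1}
    awk -v seq="$seq" -v w="$window" -v t="$threshold" 'BEGIN {
        seq = toupper(seq)
        n = length(seq)
        # 1) per-base scores, computed run by run
        i = 1
        while (i <= n) {
            base = substr(seq, i, 1)
            j = i
            while (j <= n && substr(seq, j, 1) == base) j++   # run covers [i, j)
            k = j - i
            if (k > 4) k = 4
            if (base == "G") v = k
            else if (base == "C") v = -k
            else v = 0
            for (p = i; p < j; p++) score[p] = v
            i = j
        }
        # 2) mean score of each sliding window
        for (s = 1; s + w - 1 <= n; s++) {
            sum = 0
            for (p = s; p < s + w; p++) sum += score[p]
            mean = sum / w
            if (mean >= t || mean <= -t) printf "%d\t%.2f\n", s, mean
        }
    }'
}
```

For instance, g4hunter_windows GGGAGGGAGGGAGGG 15 1 reports a single window starting at 1 with a score of 2.40 (twelve Gs scoring 3 each, divided by 15).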
gff#
Set of useful utilities for GFF manipulation.
Usage
sequana gff [OPTIONS] [FILENAME]...
Options
- -o, --output <output>#
filename where to save results.
- --add-CDS-and-mRNA#
add CDS and mRNA
- --gene-id <gene_id>#
Use this identifier to get gene names and build new CDS and mRNA identifiers
Arguments
- FILENAME#
Optional argument(s)
gff-to-gtf#
Convert a GFF file into GTF
This is an experimental conversion. Use with care.
Usage
sequana gff-to-gtf [OPTIONS] GFF_FILENAME
Options
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- GFF_FILENAME#
Required argument
gff-to-light-gff#
Extract the features of interest from the input GFF to create a light version, e.g.:
sequana gff-to-light-gff input.gff output.gff --features gene,exon
Usage
sequana gff-to-light-gff [OPTIONS] INPUT OUTPUT
Options
- --features <features>#
Comma-separated list of features to extract
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- INPUT#
Required argument
- OUTPUT#
Required argument
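The same kind of filtering can be sketched with awk: header and comment lines (starting with #) are kept, and a record is kept when its type in column 3 appears in the comma-separated list. The helper gff_keep_features is hypothetical and, unlike a dedicated tool, does not check parent/child consistency between the kept features.

```shell
# Hypothetical helper: gff_keep_features INPUT OUTPUT gene,exon
gff_keep_features() {
    awk -v keep="$3" '
        BEGIN {
            FS = "\t"
            n = split(keep, types, ",")
            for (i = 1; i <= n; i++) wanted[types[i]] = 1
        }
        /^#/ || ($3 in wanted)    # no action: matching lines are printed
    ' "$1" > "$2"
}
```

For example, gff_keep_features input.gff output.gff gene,exon mirrors the sequana gff-to-light-gff example.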
gtf-fixer#
Read a GTF file and fix known issues (uniqueness of exons and genes)
Usage
sequana gtf-fixer [OPTIONS]
Options
- -i, --input <input>#
Required
- -o, --output <output>#
Required
html-report#
Create an HTML report for various types of data sets
# VCF
The VCF module takes an input VCF file and filters it before creating an HTML report. The options relate to the variant quality. We assume that the VCF file was created with the sequana_variant_calling pipeline and therefore relies on the freebayes software. If so, you can provide a minimal freebayes score (--freebayes-score), remove variants below a given frequency (--frequency), etc. (see --help for the other parameters).
Usage
sequana html-report [OPTIONS] NAME
Options
- --output-directory <output_directory>#
- Default:
'.'
- --output-vcf-file <output_vcf_file>#
Path to output VCF file.
- Default:
'sequana.filter.vcf'
- --output-csv-file <output_csv_file>#
Path to output CSV file.
- Default:
'sequana.filter.csv'
- --freebayes-score <freebayes_score>#
Freebayes score threshold.
- Default:
20
- --strand-ratio <strand_ratio>#
Minimum strand ratio.
- Default:
0.2
- --frequency <frequency>#
Minimum allele frequency.
- Default:
0.1
- --min-depth <min_depth>#
- Default:
10
- --forward-depth <forward_depth>#
- Default:
3
- --reverse-depth <reverse_depth>#
- Default:
3
- --keep-polymorphic <keep_polymorphic>#
- Default:
True
Arguments
- NAME#
Required argument
lane-merging#
Merge lanes.
Looks for data stored either as:
<sampleID_1>/*fastq.gz
<sampleID_2>/*fastq.gz
<sampleID_3>/*fastq.gz
or as:
sampleID_L001_.fastq.gz
sampleID_L002_.fastq.gz
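The gzip format allows several compressed members to be concatenated into one valid file, and gzip -d decompresses them as a single stream, so lanes of the same sample can be merged without recompression. The sketch below relies on this; sequana itself uses pigz (see --threads), so its procedure may differ. The helper merge_lanes and the <sample>_L00<lane>_<read>.fastq.gz naming are assumptions, and the L00? glob only covers lanes 1 to 9.

```shell
# Hypothetical helper: merge_lanes SAMPLE READ OUTDIR
# Assumed naming: <sample>_L00<lane>_<read>.fastq.gz, e.g. S1_L001_R1.fastq.gz
merge_lanes() {
    local sample=$1 read=$2 outdir=$3
    mkdir -p "$outdir"
    # Concatenated gzip members form a valid gzip file; the glob expands in
    # lexical order, so L001 comes before L002, and so on.
    cat "${sample}"_L00?_"${read}".fastq.gz > "${outdir}/${sample}_${read}.fastq.gz"
}
```

For example, merge_lanes S1 R1 merging writes merging/S1_R1.fastq.gz.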
Usage
sequana lane-merging [OPTIONS]
Options
- --lanes <lanes>#
Required
- -o, --output-directory <output_directory>#
Required Where to store the new fastq files
- Default:
'merging'
- --pattern <pattern>#
pattern for the input fastq files. Use quotes if wildcards are used
- Default:
'*/*fastq*gz'
- --threads <threads>#
number of threads per job (pigz)
- Default:
4
- -s, --use-sambamba#
use sambamba instead of samtools for sorting
- --force#
- --dry-run#
- --slurm-queue <slurm_queue>#
mapping#
Map FastQ data onto a reference
This is a simple mapper for quick tests. The commands are as follows:
# Indexing
bwa index REFERENCE
samtools faidx REFERENCE
# Mapping
bwa mem -t 4 -R '@RG\tID:1\tSM:1\tPL:illumina' -T 30 REFERENCE FASTQ_FILES | samtools view -Sbh - > REFERENCE.bam
samtools sort -o REFERENCE.sorted.bam REFERENCE.bam
Usage
sequana mapping [OPTIONS]
Options
- -1, --file1 <file1>#
Required R1 FastQ file (gzipped file expected)
- -2, --file2 <file2>#
R2 Fastq file
- -r, --reference <reference>#
Required Reference where to map data
- -t, --threads <threads>#
number of threads to use
- Default:
1
- -p, --pacbio#
- Default:
False
- -s, --use-sambamba#
use sambamba instead of samtools for sorting
ribodesigner#
A tool to design custom ribodepletion probes.
This uses a reference genome (FASTA file) and the corresponding annotation (GFF file). CD-HIT-EST should be installed and in your $PATH.
Usage
sequana ribodesigner [OPTIONS] FASTA [GFF]
Options
- --method <method>#
- Options:
original | greedy | spiral | simple
- --output-directory <output_directory>#
- Default:
'out_ribodesigner'
- --seq-type <seq_type>#
The annotation type (column 3 in gff) to target for probes.
- Default:
'rRNA'
- --max-n-probes <max_n_probes>#
The maximum number of probes to design.
- Default:
384
- --force-clustering#
By default, if the number of probes is above 384 (see --max-n-probes), a clustering is performed using an identity threshold (see --identity-step); otherwise, no clustering is done. Use this option to force the clustering.
- Default:
False
- --threads <threads>#
The number of threads to use for cd-hit-est.
- Default:
4
- --identity-step <identity_step>#
The identity parameters used by cd-hit-est.
- Default:
0.01
- --output-image <output_image>#
- --force#
If output directory exists, use this option to erase previous results
- Default:
False
Arguments
- FASTA#
Required argument
- GFF#
Optional argument
rnadiff#
Sequana RNADiff: differential analysis and reporting.
The Sequana rnadiff command performs the differential analysis of input RNA-seq data using DESeq2 behind the scenes.
The command line looks like:
sequana rnadiff --annotation-file Lepto.gff --design design.csv --features all_features.out --feature-name gene --attribute-name ID
This command performs the differential analysis of feature counts using DESeq2. An HTML report is created as well as a set of output files, including summary tables of the analysis.
The expected input is a tabulated file which is the aggregation of feature counts for each sample. This file is produced by the Sequana RNA-seq pipeline (sequana/rnaseq).
It is named all_features.out and looks like:
Geneid    Chr  Start  End  Strand  Length  BAM1  BAM2  BAM3  BAM4
ENSG0001  1    1      10   +       10      120   130   140   150
ENSG0002  2    1      10   +       10      120   130   0     0
To perform this analysis, you will also need the GFF file used during the RNA-seq analysis.
You also need a design file that gives the correspondence between the sample names found in the feature counts file above and the conditions of your RNA-seq analysis. The design looks like:
label,condition
BAM1,condition_A
BAM2,condition_A
BAM3,condition_B
BAM4,condition_B
The feature-name is the feature that was used in your counting. The attribute-name is the main attribute to use in the HTML reports. Note, however, that all attributes found in your GFF file are reported in the HTML page.
A batch effect can be included by adding a column to the design.csv file. For example, if this column is called 'day', you can take it into account using '--batch day'.
By default, when comparing conditions, all combinations are computed: with N conditions, N(N-1)/2 comparisons are performed. The reference is automatically chosen as the last condition found in the design file. In this example:
label,condition
BAM1,A
BAM2,A
BAM3,B
BAM4,B
we compare A versus B. If you do not want that behaviour, use '--reference A'.
In a more complex design:
label,condition
BAM1,A
BAM2,A
BAM3,B
BAM4,B
BAM5,C
BAM6,C
the comparisons are A vs B, A vs C and B vs C. If you wish to perform different comparisons or restrict the combinations, you can use a comparison input file. For instance, to perform only the C vs A and C vs B comparisons, create this file (e.g. comparison.csv):
alternative,reference
C,A
C,B
and use '--comparisons comparison.csv'.
Usage
sequana rnadiff [OPTIONS]
Options
- --design <design>#
Required The design file in CSV format (see documentation above)
- --features <features>#
Required The merged feature counts (output of the sequana_rnaseq pipeline)
- --annotation-file <annotation>#
The annotation GFF file used to perform the feature count
- --beta-prior, --no-beta-prior#
Use beta prior or not. Default is no beta prior
- --condition <condition>#
The name of the column in design.csv to use as condition for the differential analysis. Default is 'condition'
- --force, --no-force#
If output directory exists, use this option to erase previous results
- --output-directory <output_directory>#
Output directory where the results are saved. Use --force if it already exists
- --feature-name <feature_name>#
Required The feature name compatible with your GFF (default is 'gene')
- --attribute-name <attribute_name>#
Required The attribute used as identifier. Compatible with your GFF (default is 'ID')
- --reference <reference>#
The reference to test DGE against. If provided, conditions not involving the reference are ignored. Otherwise all combinations are tested
- --comparisons <comparisons>#
By default, if a reference is provided, all conditions are tested against that reference. If no reference is provided, all combinations are tested (Ncondition * (Ncondition - 1) / 2). In both cases, all conditions found in the design file are used. If a comparison file is provided, only the conditions found in it are used.
- --cooks-cutoff <cooks_cutoff>#
If not set, DESeq2 chooses the cutoff. Note that p-values are set to NA for genes with a Cook's distance above the threshold (at least 3 replicates are required for flagging).
- --independent-filtering, --no-independent-filtering#
Independent filtering is not performed by default; otherwise, genes with low counts may not have adjusted p-values
- --batch <batch>#
Set the column name (in your design) corresponding to the batch effect to be included in the statistical model as ~ batch + condition
- --fit-type <fit_type>#
DESeq2 type of fit. Default is 'parametric'. Using the mean of gene-wise dispersion estimates as the fitted value can be specified by setting this argument to 'mean'.
- --minimum-mean-reads-per-gene <minimum_mean_reads_per_gene>#
Keeps genes that have an average number of reads greater or equal to this value. This is the average across all replicates and conditions. Not recommended if you have lots of conditions. By default all genes are kept
- --minimum-mean-reads-per-condition-per-gene <minimum_mean_reads_per_condition_per_gene>#
Keep genes that have at least one condition where the average number of reads is greater or equal to this value. By default all genes are kept
- --model <model>#
By default, the model is ~batch + condition. For more complex cases, you may set the --model more specifically
- --shrinkage, --no-shrinkage#
Shrinkage was added to the DESeq2 analysis script in Sequana 0.14.7. Although it has a marginal impact, the number of DGEs may differ and volcano plots usually have a different shape. To ignore the shrinkage, set --no-shrinkage
- --keep-all-conditions, --no-keep-all-conditions#
Even if a subset of comparisons is provided, keep all conditions in the analysis and report only the provided comparisons
- --hover-name <hover_name>#
In volcano plots, the hover name is set to Name if present in the GFF, otherwise to gene_id if present, then locus_tag, and finally ID and gene_name. Use this option to specify another hover name
- --report-only#
If the analysis was already done, use this option to regenerate only the HTML report
- --split-full-table, --no-split-full-table#
Multiple comparisons on large genomes may create HTML reports that are quite large and require lots of memory. With this option, only significant DGEs are shown in the main HTML report and the full tables are saved in individual HTML pages
- --xticks-fontsize <xticks_fontsize>#
Reduce fontsize of xticks
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
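The design and comparison files described in the rnadiff section can be written with here-documents. The sketch below also lists the N(N-1)/2 unordered pairs of conditions found in the design, which matches the default comparisons A vs B, A vs C and B vs C of the three-condition example; which condition of each pair acts as the reference is left to sequana. The pair-listing loop is only an illustration, not sequana's code.

```shell
# Write the example design file (label -> condition)
cat > design.csv <<'EOF'
label,condition
BAM1,A
BAM2,A
BAM3,B
BAM4,B
BAM5,C
BAM6,C
EOF

# Restrict the analysis to C vs A and C vs B
cat > comparison.csv <<'EOF'
alternative,reference
C,A
C,B
EOF

# Illustration only: unique conditions in order of first appearance,
# skipping the header line
conditions=$(awk -F, 'NR > 1 && !seen[$2]++ { print $2 }' design.csv)

# All unordered pairs: N conditions give N(N-1)/2 comparisons
set -- $conditions
n=$#
echo "$n conditions -> $(( n * (n - 1) / 2 )) comparisons"
while [ $# -gt 1 ]; do
    first=$1
    shift
    for other in "$@"; do
        echo "$first vs $other"
    done
done
```

With this design, the loop prints "3 conditions -> 3 comparisons" followed by A vs B, A vs C and B vs C. The files can then be passed to the command, for instance:
sequana rnadiff --annotation-file Lepto.gff --design design.csv --features all_features.out --feature-name gene --attribute-name ID --comparisons comparison.csv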
rnaseq-compare#
Compare 2 tables created by the 'sequana rnadiff' command
Usage
sequana rnaseq-compare [OPTIONS]
Options
- --file1 <file1>#
Required The first input RNA-seq table to compare
- --file2 <file2>#
Required The second input RNA-seq table to compare
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
salmon-cli#
Convert output of Salmon into a feature counts file
Usage
sequana salmon-cli [OPTIONS]
Options
- -i, --input <input>#
Required The salmon input file.
- -o, --output <output>#
Required The feature counts output file
- -f, --gff <gff>#
Required A GFF file compatible with your salmon file
- -a, --attribute <attribute>#
A valid attribute to be found in the GFF file and salmon input
- -F, --feature <feature>#
A valid feature
samplesheet#
Standalone application to validate/check an Illumina sample sheet
Usage
sequana samplesheet [OPTIONS] NAME
Options
- --check#
report validity of the input file
- --full-check#
Produce a complete report of all checks
- --extract-adapters#
extract adapters from the settings section and save them into a fasta file
- --quick-fix#
- --output <output>#
Arguments
- NAME#
Required argument
somy-score#
Compute somy scores on polyploid (or non-polyploid) genomes
Usage
sequana somy-score [OPTIONS] FILENAME
Options
- --window-size <window_size>#
Size of the windows (chunks) used to compute the coverage statistics
- Default:
1000
- --fast#
fast option
- --method <method>#
Method to estimate the main somy (in principle diploid). The median and mean methods simply compute those statistics from the chunks (windows). The EM method is more complex: it fits a mixture model to the coverage distribution and estimates the mean of the diploid component from it
- Default:
'median'
- Options:
em | median | mean
- --estimated-diploy-coverage <estimated_diploy_coverage>#
If provided, this value is used as the estimated diploid coverage; otherwise, data normalisation is based on --method
- --chromosomes <chromosomes>#
list of chromosomes to restrict to
- --mapq <mapq>#
Mapping quality (MAPQ) threshold used to filter reads
- Default:
0
- --telomeric-span <telomeric_span>#
Region (in kb) to ignore at the ends of each chromosome. This assumes that contigs are circularised correctly; if not, set to 0
- Default:
10
- -k <k>#
Number of components of the Gaussian mixture model (k=4 is supposed to capture di-, tri- and tetraploidy)
- Default:
4
- --minimum-depth <minimum_depth>#
Positions with a depth below this value are removed.
- --flag <flag>#
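Reads with any of these SAM flag bits set are removed. The default, 3844 (4 + 256 + 512 + 1024 + 2048), removes unmapped reads but also secondary, QC-failed, duplicate and supplementary reads.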
- Default:
3844
- --threads <threads>#
Number of threads to use.
- --exclude-chromosomes <exclude_chromosomes>#
list of chromosomes to exclude
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- FILENAME#
Required argument
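The default --flag value of 3844 is a sum of SAM flag bits. The pure-shell helper below lists the bits contained in any flag value, using the bit meanings of the SAM specification (FLAG field); the function name decode_sam_flag and the short bit labels are illustrative choices, not part of sequana.

```shell
# Hypothetical helper: list the SAM FLAG bits set in a value such as 3844.
decode_sam_flag() {
    local flag=$1 pair bit name
    # "bit:name" pairs from the SAM specification
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( flag & bit )) -ne 0 ]; then
            echo "$bit $name"
        fi
    done
}
```

decode_sam_flag 3844 prints the unmapped (4), secondary (256), qc_fail (512), duplicate (1024) and supplementary (2048) bits.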
summary#
Create an HTML report for various types of NGS formats.
Supported modules:
bamqc
fastq
Other modules work the same way. For example, for FastQ files:
sequana summary one_input.fastq
sequana summary `ls *fastq`
The second command processes all files returned by the command in back-quotes sequentially and produces one HTML file per input.
Export to JSON:
sequana summary input.fastq --output-json stats.json
Usage
sequana summary [OPTIONS] [NAME]...
Options
- --module <module>#
- Options:
bamqc | bam | fasta | fastq | gff | vcf | sam
- --output-file <output_file>#
- --output-json <output_json>#
Export stats to JSON file
Arguments
- NAME#
Optional argument(s)
taxonomy#
Tool to retrieve taxonomic information.
sequana taxonomy --search-kegg leptospira
Usage
sequana taxonomy [OPTIONS]
Options
- --search-kegg <search_kegg>#
Search a pattern amongst all KEGG organisms
- --search-panther <search_panther>#
Search a pattern amongst all Panther organisms
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
telomark#
Scan a FASTA file for telomeric repeats and report their positions.
Usage
sequana telomark [OPTIONS] FASTA_FILE
Options
- --chromosomes <chromosomes>#
list of chromosomes to restrict to
- --tag <tag>#
- --chunk-size <chunk_size>#
Size of the chunk to look at, at the beginning and at the end of each sequence
- --peak-height <peak_height>#
Height threshold used for peak detection
- --peak-width <peak_width>#
Width threshold used for peak detection
- --plot-style <plot_style>#
Per-contig plot style. 'annotated' overlays both strand signals with color-coded telomeric regions and a status badge; 'legacy' shows two separate raw subplots (original behaviour).
- Default:
'annotated'
- Options:
annotated | legacy
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR
Arguments
- FASTA_FILE#
Required argument
variants-comparison#
Report differences in variant content across multiple samples from a joint-calling VCF file.
Usage
sequana variants-comparison [OPTIONS]
Options
- -i, --input-vcf <VCF>#
Required Joint-calling VCF file produced with freebayes and annotated with snpEff.
- -g, --input-gff <GFF>#
Required GFF used to annotate the VCF file.
- -o, --output-html <HTML>#
Required HTML report.
- -t, --title <TITLE>#
Report title.
- -q, --quality-threshold <QUAL>#
Quality threshold: only variants with a quality greater than this value are kept.
- Default:
1
- -r, --remove-sample <SAMPLE>#
Name of a sample to remove from the cross comparisons.
- -s, --ordered-sample <SAMPLE>#
Comma-separated list of sample names, in the desired order.
- --logger <logger>#
- Options:
INFO | DEBUG | WARNING | CRITICAL | ERROR