sRNAtoolbox

Choose Reads File Input

There are 4 ways you can provide reads for sRNAbench to profile:

Upload a file (typically fastq or fastq.gz)

Provide a link/URL with your data (it will be downloaded and then profiled)

Provide the accession of an SRA run (these start with SRR, ERR or DRR, e.g. SRR1563062)

Provide the ID of a previous sRNAbench job to reuse the file uploaded there (e.g. ZAPFTBMPBGCPIDM)
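The four input types can be told apart by the shape of the string a user supplies. The sketch below is a hypothetical Python helper, not part of sRNAbench: the SRA prefixes come from the list above, while the URL pattern and the assumption that job IDs are 15 capital letters (inferred only from the example ZAPFTBMPBGCPIDM) are our own.

```python
import re

# Hypothetical helper (not part of sRNAbench): decide which of the four
# input types a user-supplied string most likely refers to.
SRA_RUN = re.compile(r"^(SRR|ERR|DRR)\d+$")
URL = re.compile(r"^(https?|ftp)://", re.IGNORECASE)
# Assumption: job IDs look like the example ZAPFTBMPBGCPIDM (15 capital letters).
JOB_ID = re.compile(r"^[A-Z]{15}$")


def classify_input(value: str) -> str:
    value = value.strip()
    if SRA_RUN.match(value):
        return "sra_run"
    if URL.match(value):
        return "url"
    if JOB_ID.match(value):
        return "job_id"
    return "file"  # anything else is treated as an uploaded file name


print(classify_input("SRR1563062"))       # sra_run
print(classify_input("ZAPFTBMPBGCPIDM"))  # job_id
```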

Input File Formats

Different types of input can be provided: fastq, sra, read/count and fasta formats.
The files can be provided as plain text (fastq, read/count and fasta) or compressed with gzip (*.gz extension). Whenever possible, please upload *.gz files.
Please note that sRNAbench infers the file format from the file extension. Unknown extensions are treated as read/count format; for example, sample.gz would be treated as a read/count file. The program will fail if the file format is inferred incorrectly.
The recognized extensions are:
fasta: fa, fasta, fa.gz, fasta.gz
read/count: rc, rc.gz
sra: sra
fastq: fastq.gz, fastq, fq, fq.gz, FASTQ, FASTQ.gz, fastQ, fastQ.gz
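The extension rules above can be expressed as a small lookup. This Python sketch (the function name infer_format is hypothetical) checks the listed suffixes longest-first and falls back to read/count, mirroring the documented behaviour rather than sRNAbench's actual code.

```python
# Extension-to-format table transcribed from the list above (case-sensitive).
EXTENSIONS = {
    "fasta": ["fa", "fasta", "fa.gz", "fasta.gz"],
    "read/count": ["rc", "rc.gz"],
    "sra": ["sra"],
    "fastq": ["fastq.gz", "fastq", "fq", "fq.gz",
              "FASTQ", "FASTQ.gz", "fastQ", "fastQ.gz"],
}


def infer_format(filename: str) -> str:
    # Try the longest suffixes first so that e.g. "fastq.gz" is checked before "fq".
    candidates = sorted(
        ((ext, fmt) for fmt, exts in EXTENSIONS.items() for ext in exts),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    for ext, fmt in candidates:
        if filename.endswith("." + ext):
            return fmt
    # Unknown extensions, including a bare ".gz", fall back to read/count.
    return "read/count"


print(infer_format("sample.fastq.gz"))  # fastq
print(infer_format("sample.gz"))        # read/count
```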

File format explanation:
read/count
The read/count format is tab-separated with two columns: the read sequence followed by the read count (the number of times this read was sequenced):
read	read count
ATTACG…	3000
GCATT…	2500

fasta format:
>readID#3000
ATTACG…
>readID#2500
GCATT…

Note that spaces between the readID and the read count are also allowed.
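A minimal Python sketch of reading both count-carrying formats into a {sequence: count} dictionary. It assumes one sequence line per fasta record and reads the note above as allowing a space in place of '#' before the count; the function names are hypothetical.

```python
def parse_read_count(lines):
    """Parse tab-separated 'sequence<TAB>count' lines into {sequence: count}."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not fields[1].strip().isdigit():
            continue  # skip blank lines and a column-label line like "read<TAB>read count"
        seq, count = fields[0].strip(), int(fields[1])
        counts[seq] = counts.get(seq, 0) + count
    return counts


def parse_fasta_counts(lines):
    """Parse '>readID#count' (or '>readID count') headers, one sequence line each."""
    counts = {}
    count = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The count is the last field after replacing '#' with a space.
            count = int(line[1:].replace("#", " ").split()[-1])
        elif count is not None:
            counts[line] = counts.get(line, 0) + count
    return counts


print(parse_fasta_counts([">read1#3000", "ATTACG", ">read2 2500", "GCATT"]))
# {'ATTACG': 3000, 'GCATT': 2500}
```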

Choose Reference Species

One or several species annotations can be chosen for the analysis.

A genome assembly must be provided to allow the prediction of new microRNAs.

If no species are selected, either miRBase/MirGeneDB/PmiREN short names or 'user libraries' need to be provided.

Do not map to genome (Library mode)

The input reads are mapped against the annotations in our database for the selected species and against user-provided libraries (instead of first being mapped to the genome).

Guess Adapter

sRNAbench will try to guess the adapter sequence. Briefly, sRNAbench aligns the first 250000 reads to the genome using the Bowtie seed function (the adapter bases do not count as mismatches). The adapter sequence is then defined as the most frequent 10-mer starting at the first mismatch (default: guessAdapter=false).
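The final step of adapter guessing, counting the 10-mers that start at the first mismatch, can be sketched in Python as follows. The Bowtie alignment itself is not reproduced: the input pairs of (read, first mismatch position) are a hypothetical stand-in for what an aligner would report.

```python
from collections import Counter


def guess_adapter(reads_with_mismatch, k=10):
    """Return the most frequent k-mer starting at each read's first mismatch.

    reads_with_mismatch: iterable of (read, first_mismatch_index) pairs, where
    the index is where the read stops matching the genome (None if it never does).
    """
    kmers = Counter()
    for read, pos in reads_with_mismatch:
        if pos is None:
            continue
        kmer = read[pos:pos + k]
        if len(kmer) == k:  # ignore reads too short to yield a full k-mer
            kmers[kmer] += 1
    if not kmers:
        return None
    return kmers.most_common(1)[0][0]


pairs = [
    ("ACGTACGTACGTACGTACGT" + "TGGAATTCTCGGG", 20),
    ("TTGACCAGTTGACCAGTTGA" + "TGGAATTCTCGGGTG", 20),
    ("AAAAACCCCCGGGGGTTTTT", 5),
]
print(guess_adapter(pairs))  # TGGAATTCTC
```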

Minimum Adapter Length

Reads can contain both a 5’ barcode and a 3’ adapter sequence. For example, 36 nt reads of which 5 nt correspond to the barcode carry at most 31 nt of ‘useful’ information. In such a case, the default minimum adapter length (10 nt) cannot be used, as this would imply that only small RNAs of 31 nt - 10 nt = 21 nt or shorter can be profiled.
Therefore, the minimum adapter length should be set to 6 nt, allowing the profiling of small RNAs of up to 25 nt. Moreover, the maximum number of mismatches allowed in adapter detection should be set to 0; otherwise, false-positive adapter detection increases notably, because a sequence of only 6 nt has a much higher probability of occurring by chance alone.
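The arithmetic in the example above, and the reason a 6 nt adapter should be matched without mismatches, can be checked with a short Python sketch. The function names are hypothetical, and the chance-match probability assumes uniform, independent bases, which is a simplification of real reads.

```python
from math import comb


def max_insert_length(read_length, barcode_length, min_adapter_length):
    """Longest small RNA that still leaves min_adapter_length adapter nt in the read."""
    return read_length - barcode_length - min_adapter_length


def chance_match_probability(adapter_length, max_mismatches=0):
    """Probability that a random stretch of adapter_length nt matches the adapter
    prefix with at most max_mismatches mismatches (uniform, independent bases)."""
    hits = sum(comb(adapter_length, m) * 3 ** m for m in range(max_mismatches + 1))
    return hits / 4 ** adapter_length


print(max_insert_length(36, 5, 10))    # 21
print(max_insert_length(36, 5, 6))     # 25
print(chance_match_probability(6))     # 0.000244140625 (1/4096)
print(chance_match_probability(6, 1))  # 0.004638671875 (19/4096)
```

Allowing one mismatch makes a chance match of a 6 nt adapter prefix 19 times more likely; since every position of every read is a candidate, this quickly adds up to many false-positive trimmings.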

Number of Mismatches in Adapter

Permitting more mismatches between the read and the adapter sequence allows more adapter sequences to be detected and trimmed, but also increases the number of false-positive trimmings (especially if the minimum adapter length is decreased!).
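To illustrate this trade-off, here is a simplified Python sketch of mismatch-tolerant adapter detection. It is an assumption-based illustration, not sRNAbench's algorithm: a position is accepted when the overlap between the rest of the read and the start of the adapter is at least min_length nt long and contains no more than max_mismatches mismatches.

```python
def find_adapter(read, adapter, min_length=10, max_mismatches=1):
    """Return the start index of the 3' adapter in read, or None if not found.

    Position i is accepted when read[i:] and the start of the adapter overlap
    by at least min_length nt (given a long enough adapter) with at most
    max_mismatches mismatches.
    """
    for i in range(len(read) - min_length + 1):
        overlap = min(len(read) - i, len(adapter))
        mismatches = sum(
            1 for a, b in zip(read[i:i + overlap], adapter[:overlap]) if a != b
        )
        if mismatches <= max_mismatches:
            return i
    return None


ADAPTER = "TGGAATTCTCGGGTGCCAAGGG"
read = "A" * 20 + "TGGAATTCTCGC"  # the last adapter base was mis-sequenced
print(find_adapter(read, ADAPTER))                    # 20
print(find_adapter(read, ADAPTER, max_mismatches=0))  # None
```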

Quality filtering methods

There are two methods to filter out reads of low quality:

mean-based: a read is accepted if the mean phred score is above a set threshold;

minimum per nucleotide: up to N bases (0 by default) with a phred score below a set threshold are allowed; once this number is exceeded, the remaining nucleotides are trimmed off. N can be defined by the user.
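Both quality filtering methods can be sketched in Python as follows. The function names are hypothetical, Phred+33 is assumed by default, and whether the base that exceeds the allowance is itself removed is our assumption (in this sketch it is).

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in quality]


def passes_mean_filter(quality, threshold, offset=33):
    """Mean-based method: accept the read if its mean Phred score is above threshold."""
    scores = phred_scores(quality, offset)
    return bool(scores) and sum(scores) / len(scores) > threshold


def trim_min_per_nucleotide(read, quality, threshold, allowed_low=0, offset=33):
    """Minimum-per-nucleotide method: tolerate up to allowed_low bases below
    threshold, then cut the read at the next low-quality base.
    Assumption: that base is removed together with everything after it."""
    low_seen = 0
    for i, score in enumerate(phred_scores(quality, offset)):
        if score < threshold:
            low_seen += 1
            if low_seen > allowed_low:
                return read[:i]
    return read


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(trim_min_per_nucleotide("ACGTACGT", "IIII#III", 20))                 # ACGT
print(trim_min_per_nucleotide("ACGTACGT", "IIII#III", 20, allowed_low=1))  # ACGTACGT
```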

Phred Encoding

The quality encoding depends on the platform:

Sanger, Phred+33

Illumina 1.3+-1.5+, Phred+64

Illumina 1.8+, Phred+33
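Because Sanger and Illumina 1.8+ share Phred+33 while Illumina 1.3+-1.5+ uses Phred+64, the same quality character decodes to different scores depending on the offset. The Python sketch below shows the decoding and a common heuristic for guessing the offset; both functions are hypothetical helpers, not features of sRNAbench.

```python
def guess_phred_offset(quality_strings):
    """Heuristic guess of the quality encoding offset (33 or 64).

    Phred+64 encodes score 0 as '@' (ASCII 64), so any character below '@'
    implies Phred+33. Otherwise Phred+64 is assumed, which misclassifies a
    Phred+33 file whose scores are all 31 or higher: this is a guess, not a proof.
    """
    lowest = min(ord(c) for q in quality_strings for c in q)
    return 33 if lowest < 64 else 64


def decode(quality, offset):
    """Turn a quality string into Phred scores for the given offset."""
    return [ord(c) - offset for c in quality]


print(guess_phred_offset(["IIIII#"]))    # 33
print(guess_phred_offset(["hhhhhB"]))    # 64
print(decode("I", 33), decode("h", 64))  # [40] [40]
```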

For more info we recommend having a look here

miRBase Short Names

This selection overrides the species previously selected. The short names entered must refer to miRBase annotations. For example:

mmu (for mouse)

hsa (for human)

hsa:hsv1 (for human and herpes simplex virus)

Do not profile other ncRNAs

Only microRNAs are profiled. This option notably reduces the run-time. To reduce the run-time further, the prediction of novel microRNAs can be deactivated (see Parameters section).

Recursive Adapter Trimming

If the adapter was not found within the complete read sequence using the minimum adapter length, the program tries to detect it at the 3’ end of the read using recursively shorter minimum adapter lengths. For example, if the minimum adapter length is 10, then in the first round the last 9 bases of the read are aligned to the first 9 bases (5’ end) of the adapter sequence, in the second round the last 8 bases, and so on. Because no lower threshold for the minimum adapter length is set, many trimmings of only the last couple of bases may occur by chance alone.
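A minimal Python sketch of the recursive step, assuming exact matching of the shortened overlaps (the text does not state whether mismatches are allowed here); the function name is hypothetical. The second call shows how a single trailing T is trimmed only because it happens to equal the first adapter base.

```python
def recursive_trim(read, adapter, min_length=10):
    """Trim a partial adapter from the 3' end using successively shorter overlaps.

    Meant to be called only after the full search with min_length failed.
    Assumption: the shortened overlaps must match exactly.
    """
    for k in range(min_length - 1, 0, -1):  # 9, 8, ..., 1 for min_length=10
        if read.endswith(adapter[:k]):      # last k read bases == first k adapter bases
            return read[:-k]
    return read


ADAPTER = "TGGAATTCTCGGGTGCCAAGGG"
print(recursive_trim("A" * 20 + "TGGAA", ADAPTER))  # AAAAAAAAAAAAAAAAAAAA
print(recursive_trim("CCCCT", ADAPTER))             # CCCC (trimmed by chance)
```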

Select species

**If a species you would like to see in sRNAbench is missing, please fill in this form**

sRNAbench can profile small RNAs from experiments with genetic material from different organisms. Therefore, different species can be selected by means of activating the corresponding ‘checkboxes’.

The first drop-down menu on the left lets you select a miRNA reference database. If available for your species, we recommend MirGeneDB or PmiREN. Then choose one or several short names in the second menu to profile your sample(s) using that species' miRNA complement.

Selecting a genome assembly is strictly required only if the ‘Predict New miRNAs’ option is selected. However, other genomic annotations are also added through this option, so a genome assembly has to be provided unless you are only interested in miRNAs.

Upload User Annotations for Profiling

The user can upload annotation files for profiling, i.e. the expression values of these small RNA annotations will be determined. Allowed file formats are fasta, bed or gff.

If the genome is not in our database...

All species contained in miRBase can be used even if the corresponding genome assembly is not in our database (only microRNA expression profiles are generated in this case).

Illumina TruSeq™ (280916) protocol or alternative adapter

Preprocessing of samples prepared with Illumina TruSeq:

Detect the adapter sequence: TGGAATTCTCGGGTGCCAAGGG

Alternative Illumina adapter sequence is: TCGTATGCCGTCTTCTGCTTGT (if chosen)

Minimum length of adapter sequence that needs to be aligned: 10

Allowed mismatches between adapter and reference sequences: 1

NEBNext™ protocol

Preprocessing of samples prepared with NEBNext:

Detect the adapter sequence: AGATCGGAAGAGCACACGTCT

Minimum length of adapter sequence that needs to be aligned: 10

Allowed mismatches between adapter and reference sequences: 1

Bioo Scientific NEXTFLEX™ (v2, v3) protocol

Preprocessing of samples prepared with NEXTFLEX v2/v3:

Remove 4 nt barcode from 5' end of reads

Detect the adapter sequence: TGGAATTCTCGGGTGCCAAGGG

Minimum length of adapter sequence that needs to be aligned: 10

Allowed mismatches between adapter and reference sequences: 1

Remove 4 nt random adapter from 3' end of adapter trimmed reads

If the UMI option is chosen, the random adapters will be used as Unique Molecular Identifiers to reduce PCR bias
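The NEXTFLEX steps can be sketched in Python as below. For brevity, the adapter is located by an exact search for its first 10 nt instead of the 1-mismatch search listed above, and treating the 5' and 3' random bases together as the UMI is our assumption; the function name is hypothetical.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGGG"


def nextflex_preprocess(read, adapter=ADAPTER, min_length=10):
    """Return (insert, umi) for one read, or None if it cannot be processed."""
    five_prime, read = read[:4], read[4:]  # remove the 4 nt 5' barcode
    # Simplification: exact search for the first min_length adapter nt.
    pos = read.find(adapter[:min_length])
    if pos == -1:
        return None                        # no adapter detected
    trimmed = read[:pos]                   # adapter-trimmed read
    if len(trimmed) <= 4:
        return None                        # nothing left after the 3' random bases
    insert, three_prime = trimmed[:-4], trimmed[-4:]  # remove 4 nt random 3' bases
    umi = five_prime + three_prime         # assumption: both stretches form the UMI
    return insert, umi


read = "GATC" + "TGAGGTAGTAGGTTGTATAGTT" + "ACGT" + "TGGAATTCTCGGGTG"
print(nextflex_preprocess(read))  # ('TGAGGTAGTAGGTTGTATAGTT', 'GATCACGT')
```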

Clontech SMARTer™

Preprocessing of samples prepared with SMARTer (Clontech):

Perform an iterative trimming of the 5' end of the read by iteratively aligning the reads to the genome sequence and trimming unmapped reads by 1 nt (4 rounds of trimming).

Detect the adapter sequence: AAAAAAAAAAAA

Minimum length of adapter sequence that needs to be aligned: 10

Allowed mismatches between adapter and reference sequences: 1
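The iterative 5' trimming can be sketched in Python as follows. Genome alignment is replaced by a caller-supplied maps_to_genome function (hypothetical); in the usage example it is a simple substring test against a toy sequence.

```python
from typing import Callable, Optional


def smarter_5p_trim(read: str,
                    maps_to_genome: Callable[[str], bool],
                    rounds: int = 4) -> Optional[str]:
    """Map the read; if unmapped, trim 1 nt from the 5' end and retry,
    for at most `rounds` trimming rounds. Returns None if nothing maps."""
    for trimmed in range(rounds + 1):  # 0 = untrimmed read, then 1..rounds nt
        candidate = read[trimmed:]
        if maps_to_genome(candidate):
            return candidate
    return None


genome = "CCTGAGGTAGTAGGTTGTATAGTTAA"  # toy "genome" for illustration only
maps = lambda seq: seq in genome
print(smarter_5p_trim("GGGTGAGGTAGTAGGTTGTATAGTT", maps))     # TGAGGTAGTAGGTTGTATAGTT
print(smarter_5p_trim("AAAAATGAGGTAGTAGGTTGTATAGTT", maps))  # None (needs 5 rounds)
```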

Qiagen™ (with UMIs)

Preprocessing of samples prepared with QIAseq (Qiagen):

Detect the adapter sequence AACTGTAGGCACCATCAAT and the position of the UMIs

Merge fragment and UMI sequences and remove duplicates

Remove UMIs from the reads
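A minimal Python sketch of the QIAseq steps: locate the adapter, read the UMI behind it, drop reads whose fragment + UMI combination was already seen, and count the fragments without their UMIs. The 12 nt UMI length and the exact adapter search are assumptions, not taken from the text above; the function name is hypothetical.

```python
from collections import Counter

ADAPTER = "AACTGTAGGCACCATCAAT"
UMI_LENGTH = 12  # assumption: UMI taken as the 12 nt following the adapter


def qiaseq_dedup(reads):
    """Return a Counter of fragment sequences after UMI-based duplicate removal."""
    seen = set()
    counts = Counter()
    for read in reads:
        pos = read.find(ADAPTER)  # simplification: exact adapter match
        if pos == -1:
            continue
        fragment = read[:pos]
        start = pos + len(ADAPTER)
        umi = read[start:start + UMI_LENGTH]
        if len(umi) < UMI_LENGTH:
            continue  # UMI incomplete
        key = fragment + umi  # merged fragment + UMI sequence
        if key in seen:
            continue  # PCR duplicate
        seen.add(key)
        counts[fragment] += 1  # UMI removed: count the fragment only
    return counts


insert = "TGAGGTAGTAGGTTGTATAGTT"
reads = [insert + ADAPTER + "ACGTACGTACGT",
         insert + ADAPTER + "ACGTACGTACGT",   # duplicate of the first read
         insert + ADAPTER + "TTTTGGGGCCCC"]
print(qiaseq_dedup(reads)[insert])  # 2
```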

BGI's small RNA-seq protocol

Preprocessing of samples prepared with BGI's small RNA-seq protocol:

Detect the adapter sequence: AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA

Minimum length of adapter sequence that needs to be aligned: 10

Allowed mismatches between adapter and reference sequences: 1

Customized protocol

If none of the protocols fits your data:

Provide the adapter sequence

Change minimum length of the detected adapter sequence and allowed mismatches

If a small RNA library preparation protocol you need is missing, please contact us so we can include it in sRNAbench.
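The protocol presets described above come down to a handful of parameters. The Python sketch below records them in a hypothetical Preset data class (the field names are our own, not sRNAbench parameter names). The SMARTer iterative trimming and the QIAseq UMI handling are not captured by these fields, so QIAseq is omitted. A customized protocol is then simply another Preset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Hypothetical container for adapter-trimming settings."""
    adapter: str
    min_adapter_length: int = 10  # value used by all protocols listed here
    max_mismatches: int = 1       # value used by all protocols listed here
    trim_5p: int = 0              # nt removed from the 5' end (barcode)
    trim_3p: int = 0              # nt removed from the 3' end after adapter trimming


# Adapter sequences and settings transcribed from the protocol descriptions above.
PRESETS = {
    "truseq": Preset("TGGAATTCTCGGGTGCCAAGGG"),
    "truseq_alternative": Preset("TCGTATGCCGTCTTCTGCTTGT"),
    "nebnext": Preset("AGATCGGAAGAGCACACGTCT"),
    "nextflex": Preset("TGGAATTCTCGGGTGCCAAGGG", trim_5p=4, trim_3p=4),
    "smarter": Preset("AAAAAAAAAAAA"),  # iterative 5' trimming not modelled here
    "bgi": Preset("AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"),
}

# A customized protocol, e.g. for 36 nt reads with a 5 nt barcode:
# shorter minimum adapter length and no mismatches.
custom = Preset("TGGAATTCTCGGGTGCCAAGGG", min_adapter_length=6, max_mismatches=0, trim_5p=5)
```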