Human (GRCh38_p10), Epstein Barr Virus (NC_007605). Protocol: Illumina
There are four ways to provide reads for sRNAbench to profile:
Different input formats can be provided: fastq, SRA, read/count and fasta.
The files can be provided as plain text (fastq, read/count and fasta) or compressed with gzip (*.gz extension). Whenever possible, please upload *.gz files.
Please note that sRNAbench infers the file format from the extension. Unknown extensions are treated as read/count format; for example, sample.gz would be treated as a read/count file. The program will fail if the file format is inferred incorrectly.
The recognized extensions are:
fasta: fa, fasta, fa.gz, fasta.gz
read/count: rc, rc.gz
fastq: fastq.gz, fastq, fq, fq.gz, FASTQ, FASTQ.gz, fastQ, fastQ.gz
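The extension rules above can be mirrored in a few lines, which is handy for checking file names before uploading. This is a minimal sketch, not sRNAbench code: the helper name infer_format is hypothetical, and it assumes exact, case-sensitive matching against the listed extensions, with unknown extensions falling back to read/count as described.

```python
# Hypothetical helper reproducing the extension table above.
# Matching is exact and case-sensitive (an assumption based on the list);
# anything not recognized falls back to read/count.
RECOGNIZED = {
    "fasta": ["fa", "fasta", "fa.gz", "fasta.gz"],
    "read/count": ["rc", "rc.gz"],
    "fastq": ["fastq.gz", "fastq", "fq", "fq.gz",
              "FASTQ", "FASTQ.gz", "fastQ", "fastQ.gz"],
}


def infer_format(filename: str) -> str:
    """Return the input format implied by the file name's extension."""
    for fmt, extensions in RECOGNIZED.items():
        for ext in extensions:
            if filename.endswith("." + ext):
                return fmt
    return "read/count"  # default for unknown extensions, e.g. sample.gz


print(infer_format("sample.fastq.gz"))  # fastq
print(infer_format("sample.gz"))        # read/count
```

Note that a gzipped fastq file named sample.gz is reported as read/count, which is exactly the misclassification the note above warns about.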
File format explanation:
The read/count format is tab-separated with two columns: the read sequence followed by the read count (the number of times this read was sequenced).
read	read count
Spaces instead of a tab between the read sequence and the read count are also allowed.
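A read/count file can be read back into a sequence-to-count table with a simple parser, for instance to check a file before uploading it. This is a minimal sketch under stated assumptions: the function name parse_read_count is hypothetical, the file is assumed to have no header line, and duplicate sequences are summed (sRNAbench's own handling of duplicates is not documented here).

```python
from collections import Counter
from typing import Iterable


def parse_read_count(lines: Iterable[str]) -> Counter:
    """Parse read/count lines into a Counter mapping sequence -> count.

    Each non-empty line holds a read sequence and its count, separated by
    a tab or by spaces (str.split() without arguments accepts both).
    Assumes no header line; duplicate sequences are summed.
    """
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, got {len(fields)}")
        sequence, count = fields
        counts[sequence] += int(count)
    return counts
```

Usage: with open("sample.rc") as handle: counts = parse_read_count(handle), where sample.rc is a placeholder file name.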
One or several species annotations can be chosen for the analysis.
If no species are selected, either miRBase short names (see microRNA section) or 'user libraries' need to be provided.
The input reads are mapped against the annotations in our database for the selected species and against user-provided libraries (instead of first being mapped to the genome).
If you are only interested in miRNAs, this option drops the remaining ncRNA libraries from the analysis, which makes the process faster.
sRNAbench can try to guess the adapter sequence (default: guessAdapter=false). Briefly, sRNAbench aligns the first 250,000 reads to the genome using the Bowtie seed function, so that the adapter part of the read does not count towards the mismatches. The adapter sequence is then defined as the most frequent 10-mer starting at the first mismatch.
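The k-mer counting step of the adapter guess can be illustrated as follows. This is a sketch of the idea only, not sRNAbench's implementation: the Bowtie alignment itself is not performed, and the input is assumed to be hypothetical pairs of (read, genomic sequence aligned to the same read positions). The function name guess_adapter is also hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def guess_adapter(alignments: Iterable[Tuple[str, str]],
                  k: int = 10) -> Optional[str]:
    """Return the most frequent k-mer of the reads starting at the first mismatch.

    `alignments` holds hypothetical (read, aligned genomic sequence) pairs,
    e.g. derived from seed alignments of the first reads of a sample.
    """
    kmers = Counter()
    for read, genome in alignments:
        # First position where read and genome disagree; the adapter is
        # assumed to start there.
        first_mismatch = next(
            (i for i, (r, g) in enumerate(zip(read, genome)) if r != g), None)
        if first_mismatch is None:
            continue  # full-length match: no adapter visible in this read
        kmer = read[first_mismatch:first_mismatch + k]
        if len(kmer) == k:  # skip candidates truncated by the read end
            kmers[kmer] += 1
    if not kmers:
        return None
    return kmers.most_common(1)[0][0]
```

Taking the most frequent k-mer makes the guess robust to reads whose first mismatch is a sequencing error or a genuine variant rather than the adapter start.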
Reads can contain both a 5’ barcode and a 3’ adapter sequence. For example, 36-nt reads of which 5 nt correspond to the barcode carry at most 31 nt of ‘useful’ information. In such a case, the default minimum adapter length (10 nt) should not be used, as it would imply that only small RNAs of 31 nt - 10 nt = 21 nt or shorter can be profiled.
Therefore, in such a case the minimum adapter length should be set to 6 nt, allowing the profiling of small RNAs up to 25 nt. Moreover, the maximum number of mismatches allowed in adapter detection should be set to 0, as otherwise false positive adapter detections increase notably (a short sequence of only 6 nt has a much higher probability of occurring by chance alone).
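The length arithmetic and the chance-match argument can be written out explicitly. The function names are hypothetical, and the probability assumes uniform, independent bases at a single fixed position, a simplification that ignores the many positions in a read where a chance match could occur.

```python
def max_profilable_length(read_len: int, barcode_len: int,
                          min_adapter_len: int) -> int:
    """Longest small RNA that still leaves min_adapter_len adapter bases in the read."""
    return read_len - barcode_len - min_adapter_len


def chance_match_probability(k: int) -> float:
    """Probability that a given k-mer occurs at one fixed position by chance,
    assuming uniform and independent bases (a simplification)."""
    return 0.25 ** k


print(max_profilable_length(36, 5, 10))             # 21
print(max_profilable_length(36, 5, 6))              # 25
print(round(1 / chance_match_probability(6)))       # 4096
print(round(1 / chance_match_probability(10)))      # 1048576
```

Shortening the minimum adapter from 10 to 6 nt gains 4 nt of profilable length, but makes a chance exact match at a given position about 256 times more likely (1 in 4,096 instead of about 1 in a million), which is why mismatches should then be disallowed.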
Permitting more mismatches between the read and the adapter sequence allows more adapter sequences to be detected and trimmed, but also increases the number of false positive trimmings (especially if the minimum adapter length is decreased!).
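The trade-off between the minimum adapter length and the allowed mismatches can be explored with a simple scan of the read. This is a minimal sketch assuming a plain Hamming-distance comparison of the adapter's first min_len bases against every read window; sRNAbench's actual alignment method may differ, and the name find_adapter and the default max_mismatches=1 are hypothetical.

```python
from typing import Optional


def find_adapter(read: str, adapter: str, min_len: int = 10,
                 max_mismatches: int = 1) -> Optional[int]:
    """Return the first read position where the adapter's first `min_len`
    bases match with at most `max_mismatches` mismatches, else None."""
    seed = adapter[:min_len]
    for start in range(len(read) - min_len + 1):
        window = read[start:start + min_len]
        mismatches = sum(a != b for a, b in zip(window, seed))
        if mismatches <= max_mismatches:
            return start  # the insert is read[:start]
    return None
```

Raising max_mismatches or lowering min_len makes the condition easier to satisfy at every window, which recovers more true adapters but also lets random read windows pass, i.e. the false positive trimmings mentioned above.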
There are two methods to filter out reads of low quality:
Each platform uses a different quality score encoding:
For more information, we recommend having a look here
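For orientation, the two most common FASTQ encodings differ only in the ASCII offset subtracted from each quality character: 33 (Sanger, Illumina 1.8+) or 64 (Illumina 1.3 to 1.7). The sketch below is generic FASTQ handling, not sRNAbench code; the function name phred_scores is hypothetical, and the old Solexa encoding, which uses a different score scale, is not covered.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores.

    offset=33: Sanger / Illumina 1.8+; offset=64: Illumina 1.3-1.7.
    """
    scores = [ord(ch) - offset for ch in quality]
    if any(q < 0 for q in scores):
        # Characters below the offset cannot occur in this encoding.
        raise ValueError("negative score: wrong offset for this encoding?")
    return scores


print(phred_scores("II#"))             # [40, 40, 2]
print(phred_scores("hh", offset=64))   # [40, 40]
```

Choosing the wrong encoding shifts every score by 31, so a Phred+33 file read as Phred+64 usually yields negative values, which the check above reports.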
This selection overrides the previously selected species. The short names entered must be miRBase short names. For example:
Only microRNAs are profiled, which notably lowers the run-time. To reduce the run-time further, the prediction of novel microRNAs can be deactivated (see Parameters section).
If the adapter is not found within the complete read sequence using the minimum adapter length, the program tries to detect it at the 3’ end of the read using recursively shorter adapter lengths. For example, if the minimum adapter length is 10, then in the first round the last 9 bases of the read are aligned to the first 9 bases (5’ end) of the adapter sequence, in the second round the last 8 bases, and so on. No lower threshold is set for this length, and therefore most trimmings of only the last couple of bases may occur by chance alone.
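The recursive 3’-end search described above can be sketched as a loop over decreasing overlap lengths. Assumptions: exact matching is used for the partial adapter (the mismatch handling in this step is not specified here), and the name trim_partial_adapter_3p is hypothetical.

```python
def trim_partial_adapter_3p(read: str, adapter: str, min_len: int = 10) -> str:
    """Trim a partial adapter from the 3' end of a read.

    Intended for reads in which the adapter was not found with `min_len`
    bases. Compares the last min_len-1 read bases with the first min_len-1
    adapter bases, then min_len-2, ... down to a single base (no lower
    threshold). Exact matching is assumed.
    """
    for overlap in range(min_len - 1, 0, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read  # no partial adapter at the 3' end
```

Because the loop goes down to an overlap of one base, any read ending in the adapter's first nucleotide loses that base, which illustrates why short trimmings at the very end of a read may be chance events.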
sRNAbench can profile small RNAs from experiments with genetic material from different organisms. Therefore, several species can be selected by ticking the corresponding checkboxes.
The user can upload annotation files for profiling, i.e. expression values are computed for these small RNA annotations. Allowed file formats are fasta, bed and gff.
All species contained in miRBase can be used even if the corresponding genome assembly is not in our database (in this case, only microRNA expression profiles are generated).
Preprocessing of samples prepared with Illumina TruSeq:
Preprocessing of samples prepared with NEBNext:
Preprocessing of samples prepared with NEXTFLEX v2/v3:
Preprocessing of samples prepared with SMARTer (Clontech):
Preprocessing of samples prepared with QIAseq (Qiagen):
If none of the protocols fits your data: