Database

Database generation

To generate the ranking for a given quality attribute (for example ‘total number of reads’) we first obtain the corresponding values of all samples from a reference corpus. Note that we use different reference corpus: same kingdom, same sample protocol, same sample species and same species and protocol. The obtained values are sorted into ascending order and the corresponding percentile of the observed value is calculated.

Therefore, each variable can hold desired values either when the percintiles are low (percentage of adapter-dimers or ribosomal RNA for example) or when they are high (number of reads or detected microRNAs). Either way, the quartile-based colour code is always asigned taking the nature of the feature into account, i.e. the ‘best’ quartile will always be displayed in green while the ‘worst’ quartile will be red.

To represent the heatmap we had to make all percentiles coherent as the colors are programmatically filled. To achieve this, we internally calculate (100-Percentile) for those attributes for which high percentiles indicate good quality.

Database contents

Species Samples Studies total number of raw reads
Human 17745 850 2.32244E+11
Mouse 7303 460 1.01064E+11
Arabidopsis thaliana 1863 199 40669870610
Bos taurus 1753 82 23052849107
Rattus norvegicus 1662 80 13839004606
Drosophila melanogaster 1514 163 36856287588
Caenorhabditis elegans 1255 99 21744837798
Sus scrofa 751 80 9338528047
Zea mays 603 52 14216151487
Equus caballus 480 21 6354331280
Danio rerio 268 32 4146066655
Solanum lycopersicum 244 47 4169714559
Gallus gallus 242 35 3708335610
Aedes aegypti 232 24 4514242092
Canis lupus familiaris 165 11 1869150079
Apis mellifera 71 13 827627067
Oryctolagus cuniculus 54 8 556118648
Other 133 3919986171