ove duplicated IDs:
Only the first entry with a given ID is used; all later entries with that ID are ignored (not written to the output).
After removing duplicated IDs, the output for the above example would be:
>seqA
ATCACTA
>seqB
CACTG
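The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function names read_fasta and remove_duplicated_ids and the input records are hypothetical, and the sketch assumes that the ID is the whole header line after ">".

```python
def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the whole header line after ">"
            seq_id, chunks = line[1:], []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def remove_duplicated_ids(records):
    """Keep the first record seen for each ID and drop later ones."""
    seen = set()
    kept = []
    for seq_id, seq in records:
        if seq_id in seen:
            continue  # duplicated ID: not written to the output
        seen.add(seq_id)
        kept.append((seq_id, seq))
    return kept


# Hypothetical input in which the ID seqA occurs twice
example = """>seqA
ATCACTA
>seqB
CACTG
>seqA
GGGTTT
""".splitlines()

for seq_id, seq in remove_duplicated_ids(read_fasta(example)):
    print(f">{seq_id}")
    print(seq)
```

In this sketch the second seqA entry is dropped because its ID has already been seen; the sequence itself is never compared.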
2) Detect and remove duplicated sequences
If this checkbox is activated, entries with identical sequences are collapsed so that only one entry per sequence is written to the output.
The result for the above example would be:
>seqA
ATCACTA
3) Detect and remove duplicated sequences and generate a new ID by joining the IDs of all entries that have the same sequence (if this checkbox is activated → launch with mode RDG)
If the "paste names" checkbox is activated, the output would be:
>seqA=seqB
ATCACTA
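Options 2 and 3 differ only in how the ID of the surviving entry is chosen, which a short Python sketch can make concrete. It is an illustration under stated assumptions, not the tool's implementation: the function name remove_duplicated_sequences, the paste_names parameter and the input records are hypothetical; sequences are compared as exact strings; and keeping the first ID when names are not pasted is an assumption.

```python
def remove_duplicated_sequences(records, paste_names=False, sep="="):
    """Collapse (seq_id, sequence) records that share the same sequence.

    paste_names=False: keep one record per sequence (assumption: the
                       first ID seen for that sequence is kept).
    paste_names=True:  join all IDs of that sequence with `sep`.
    """
    ids_by_seq = {}  # dicts preserve insertion order in Python 3.7+
    for seq_id, seq in records:
        ids_by_seq.setdefault(seq, []).append(seq_id)

    result = []
    for seq, ids in ids_by_seq.items():
        new_id = sep.join(ids) if paste_names else ids[0]
        result.append((new_id, seq))
    return result


# Hypothetical input: two entries with different IDs but the same sequence
records = [("seqA", "ATCACTA"), ("seqB", "ATCACTA")]

for seq_id, seq in remove_duplicated_sequences(records, paste_names=True):
    print(f">{seq_id}")
    print(seq)
```

Grouping the IDs by sequence in a dictionary keeps the output in order of first appearance and requires only one pass over the input.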
4) Manipulate the sequence names (remove a certain string)
The UCSC Table Browser can be used to obtain 3' UTR sequences, which are needed when searching for microRNA target genes.
However, the output files have the following format:
>hg19_refGene_NM_001184906 range=chr17:37408897-37417712 5'pad=0 3'pad=0 strand=- repeatMasking=none
CAATGGAGGTGGTCAACCTTGGCGAACTGAGTATTTAATGACACTTCTAG
AGCTACCGTGGAGTCTCTCCAGTGGAAGCAACCCCAGTGTTCTGAGCAAG