SELEX_DB is an online resource containing both experimental data on in vitro selected DNA/RNA oligomers (aptamers) and applets for recognizing these oligomers. In vitro selection of oligomers that bind target proteins is a novel technology, intensively developed over the last decade, that sieves a pool of synthetic oligomers through repeated cycles of PCR amplification and protein-binding selection (1). With human genome annotation in mind, we have developed SELEX_DB, a database on oligomers selected in vitro, supplied with Web-available applets for site recognition (2). Many diseases may be caused not only by a mutation that alters a true transcription factor binding site on DNA, but also by the appearance of a novel protein-fitting 'noise' site that disturbs the normal regulation of a gene network. For example, the substitution -376G>A in the human TNF gene promoter creates a noise site fitting the transcription factor OCT-1, which is associated with the clinical phenotype 'severe malaria' (3). In vitro selected aptamers are therefore very informative for Single Nucleotide Polymorphism (SNP) analysis. In prokaryotes, discrepancies between the nucleotide-position frequency matrices of in vitro selected and natural sites have been comprehensively demonstrated (4). In addition, the positional Information Content matrices of in vitro selected aptamers were found to correlate with protein-binding strength, whereas no such correlation was found for the corresponding natural sites (5). This implies that, in prokaryotes, natural sites were selected in vivo for their biological activity rather than for protein affinity. In eukaryotes, the relationship between in vivo and in vitro selection appears to be far more intricate. On the one hand, in vitro selected TBP-binding DNAs exhibit natural TATA-box activity (6).
Moreover, the homologous c-Myb and v-Myb proteins, the minimal Myb DNA-binding domain, and Myb-enriched cell nuclear extract all select in vitro aptamers that are significantly similar to one another and to natural c-Myb sites (7). On the other hand, in vitro selected YY1-binding DNAs, inserted into plasmids and transfected into various cells ('plasmid+cell' systems), repress a reporter gene (8), yet YY1 binding strength and repression magnitude do not correlate. Furthermore, for these in vitro selected YY1-binding aptamers, the YY1-mediated repression measured in vivo in one 'plasmid+cell' system does not correlate with the corresponding values measured in vivo in another 'plasmid+cell' system. Given this evident system dependence of both in vitro and in vivo experimental data, a fundamental current question is how in vitro selected data can be applied to the analysis of natural genes (1). For this reason, the new release of SELEX_DB is supplemented by two databases, SYSTEM (9, 10) and CROSS_TEST, which store descriptions of the experimental systems and the results of their cross-validation tests, respectively. Through cross-validation testing, we unexpectedly observed that, for a fixed protein-binding site, recognition accuracy increases with the degree of homology between the target and test proteins. For natural sites, recognition accuracy was lower than for the closest protein homologs but higher than for distant homologs and non-homologous proteins binding the same site. The current SELEX_DB release is available at http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/.
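The positional Information Content matrices referred to above (5) can be illustrated with a minimal Python sketch. It assumes aligned sites of equal length, an equiprobable A/C/G/T background (so the maximum is 2 bits per position) and no small-sample correction. The function names and the toy "aptamers" are hypothetical; they are neither SELEX_DB entries nor the code of its applets.

```python
import math

BASES = "ACGT"

def frequency_matrix(sites):
    """Nucleotide-position frequency matrix: one {base: frequency} dict per position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    matrix = []
    for i in range(length):
        column = [s[i] for s in sites]
        matrix.append({b: column.count(b) / len(sites) for b in BASES})
    return matrix

def information_content(freq):
    """R_i = 2 - H_i bits per position (uniform background, no small-sample correction)."""
    result = []
    for column in freq:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        result.append(2.0 - entropy)
    return result

def ic_matrix(freq):
    """Per-base contribution f(b, i) * R_i, i.e. the letter heights of a sequence logo."""
    return [{b: col[b] * r for b in BASES}
            for col, r in zip(freq, information_content(freq))]

# Hypothetical toy aptamers, for illustration only.
aptamers = ["TATAAA", "TATAAG", "TATATA", "TTTAAA"]
freq = frequency_matrix(aptamers)
print([round(r, 3) for r in information_content(freq)])
# -> [2.0, 1.189, 2.0, 2.0, 1.189, 1.189]
```

Fully conserved positions carry the maximal 2 bits. Positions where one of four sites deviates drop to about 1.19 bits. In (5), such per-position values were compared with measured binding strength.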
Recent developments:
Since many recent experiments have clearly demonstrated that comparing in vivo and in vitro measurement systems remains very complicated, we have supplemented SELEX_DB with two new, complementary 'experiment<=>regularity' databases. The first, SYSTEM, stores descriptions of experiments (9, 10). Each SYSTEM entry records the authors' aims and conclusions, experimental details, selection techniques, etc. Since an in vitro design can only simplify the real in vivo situation, SYSTEM provides information on the limits within which in vitro selected data can be interpreted, and on the regularities revealed that could be applicable to the analysis of gene structure, function, regulation and expression. A second database was therefore needed to consolidate all experimental data obtained in various measurement systems for a given protein-binding site. This database, CROSS_TEST, documents the cross-validation results of a C-coded applet that uses in vitro selected data, represented as nucleotide-position frequency matrices, to discriminate related natural sites, or other oligomers selected in vitro, from random polynucleotide sequences. It currently contains over a hundred cross-tests. For each pair (an "in vitro trained" applet plus an independent test data set), the mean and standard deviation of the site recognition score are given, as well as the false positive and false negative error rates, the chi-square statistic value, and its confidence level α. The statistical estimates are hyperlinked to the training and testing data entries of our SAMPLES database (10, 11) and to the current release of SELEX_DB. CROSS_TEST thus addresses the fundamental question of how, and with what accuracy and significance, in vitro selected oligomers stored in SELEX_DB correspond to various related natural sites and to other in vitro selected oligomers.
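The kind of cross-test that CROSS_TEST records can be sketched in Python as follows. This is not the SELEX_DB applet. The sketch assumes a log-odds recognition score against a uniform background with a pseudocount of 0.5, a user-chosen score threshold, and Pearson's chi-square on the 2x2 table of recognized/unrecognized test sites versus random sequences. All function names, the threshold and the sequences are hypothetical.

```python
import math
import random
import statistics

BASES = "ACGT"

def train_matrix(sites, pseudocount=0.5):
    """Frequency matrix from 'in vitro' training sites; pseudocount (assumed) avoids log(0)."""
    total = len(sites) + 4 * pseudocount
    return [{b: ([s[i] for s in sites].count(b) + pseudocount) / total for b in BASES}
            for i in range(len(sites[0]))]

def score(matrix, seq):
    """Hypothetical recognition score: log2-odds of seq versus a uniform background."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix")
    return sum(math.log2(col[b] / 0.25) for col, b in zip(matrix, seq))

def cross_test(matrix, test_sites, random_seqs, threshold):
    """Score statistics, error rates and a 2x2 chi-square for one training/test pair."""
    site_scores = [score(matrix, s) for s in test_sites]
    rand_scores = [score(matrix, s) for s in random_seqs]
    tp = sum(x >= threshold for x in site_scores)   # test sites recognized
    fn = len(site_scores) - tp                      # test sites missed
    fp = sum(x >= threshold for x in rand_scores)   # random sequences accepted
    tn = len(rand_scores) - fp
    # Pearson chi-square for the table [[tp, fn], [fp, tn]] with 1 degree of freedom.
    n = tp + fn + fp + tn
    denom = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    chi2 = n * (tp * tn - fn * fp) ** 2 / denom if denom else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))        # upper tail of chi-square, 1 df
    return {
        "site_mean": statistics.mean(site_scores),
        "site_sd": statistics.stdev(site_scores),
        "random_mean": statistics.mean(rand_scores),
        "random_sd": statistics.stdev(rand_scores),
        "false_negative": fn / len(site_scores),
        "false_positive": fp / len(rand_scores),
        "chi2": chi2,
        "p_value": p_value,
    }

# Usage with hypothetical data: train on 'in vitro' sites, test on 'natural' sites.
rng = random.Random(1)
training = ["TATAAA", "TATAAG", "TATATA", "TTTAAA"]
natural = ["TATAAA", "TATATA", "TATAAT"]
background = ["".join(rng.choice(BASES) for _ in range(6)) for _ in range(200)]
result = cross_test(train_matrix(training), natural, background, threshold=4.0)
print(result["false_negative"], round(result["false_positive"], 3), result["p_value"])
```

With a 4.0-bit threshold, all three hypothetical natural sites score above the cut-off, so the false-negative rate is 0. The false-positive rate depends on the random background. Here `p_value` is the upper-tail chi-square probability, one common way to report a significance level such as α. The scoring function and threshold used by the actual applet may differ.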
Our cross-validation tests demonstrate that, in eukaryotes, a set of sites selected in vitro is more similar to the corresponding set of natural sites than to other in vitro selected sets obtained for homologous or otherwise related proteins. These eukaryotic results contrast with the conclusion in (5) that in vitro selection in prokaryotes does not always mimic natural evolution. This may reflect different evolutionary strategies of in vivo selection: in prokaryotes, selection acts mainly on the biological activity of individual DNA-protein complexes (4, 5), whereas in eukaryotes it operates through multi-component transcriptional machinery, which is selected for both individual DNA-protein affinity and the integral characteristics of protein-protein complexes. These observations could be useful for applying in vitro selected data to genome annotation in eukaryotes.
This work was supported by grant RFBR-01-04-49860 (Russia).
1. Roberts,R.W. and Ja,W.W. (1999) In vitro selection of nucleic acids and proteins: what are we learning? Curr. Opin. Struct. Biol., 9, 521-529.
2. Ponomarenko,J.V., Orlova,G.V., Ponomarenko,M.P., Lavryushev,S.V., Frolov,A.S., Zybova,S.V. and Kolchanov,N.A. (2000) SELEX_DB: an activated database on selected randomized DNA/RNA sequences addressed to genomic sequence annotation. Nucleic Acids Res., 28, 205-208.
3. Knight,J., Udalova,I., Hill,A., Greenwood,B., Peshu,N., Marsh,K. and Kwiatkowski,D. (1999) A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat. Genet., 22, 145-150.
4. Robison,K., McGuire,A.M. and Church,G.M. (1998) A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J. Mol. Biol., 284, 241-254.
5. Shultzaberger,R.K. and Schneider,T.D. (1999) Using sequence logos and information analysis of Lrp DNA binding sites to investigate discrepancies between natural selection and SELEX. Nucleic Acids Res., 27, 882-887.
6. Hardenbol,P., Wang,J.C. and van Dyke,M.W. (1997) Identification of preferred hTBP DNA binding sites by the combinatorial method REPSA. Nucleic Acids Res., 25, 3339-3344.
7. Weston,K. (1992) Extension of the DNA binding consensus of the chicken c-Myb and v-Myb proteins. Nucleic Acids Res., 20, 3043-3049.
8. Hyde-DeRuyscher,R., Jennings,E. and Shenk,T. (1995) DNA binding sites for the transcriptional activator/repressor YY1. Nucleic Acids Res., 23, 4457-4465.
9. Ponomarenko,J., Furman,D., Frolov,A., Podkolodny,N., Orlova,G., Ponomarenko,M., Kolchanov,N. and Sarai,A. (2001) ACTIVITY: a database on DNA/RNA sites activity adapted to apply sequence-activity relationships from one system to another. Nucleic Acids Res., 29, 284-287.
10. Ponomarenko,J., Merkulova,T., Vasiliev,G., Levashova,Z., Orlova,G., Lavryushev,S., Fokin,O., Ponomarenko,M., Frolov,A. and Sarai,A. (2001) rSNP_Guide, a database system for analysis of transcription factor binding to target sequences: application to SNPs and site-directed mutations. Nucleic Acids Res., 29, 312-316.
11. Ponomarenko,M.P., Ponomarenko,J.V., Frolov,A.S., Podkolodnaya,O.A., Vorobyev,D.G., Kolchanov,N.A. and Overton,G.C. (1999) Oligonucleotide frequency matrices addressed to recognizing functional DNA sites. Bioinformatics, 15, 631-643.