Evolutionary conservation of DNA sequences provides a tool for identifying functional elements in genomes. We have created ECRbase, a database of evolutionarily conserved regions in vertebrate genomes. It is constructed from a collection of whole-genome alignments produced by the ECR Browser. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates. It also holds a comprehensive collection of promoters in all vertebrate genomes, generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites in evolutionarily conserved and promoter elements. ECRbase currently includes the human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish and Fugu genomes. It is freely accessible at http://ecrbase.dcode.org.
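ECRbase takes its conserved elements from ECR Browser alignments. The sketch below gives a rough illustration of how conserved regions can be called from a pairwise alignment. It slides a fixed-length window along two aligned sequences and merges overlapping windows whose percent identity meets a threshold. The function name `conserved_regions` and the 100-column / 70% defaults are assumptions chosen for illustration. They are not ECRbase's documented procedure. A merged region may also have slightly lower overall identity than any single qualifying window.

```python
def conserved_regions(ref: str, other: str, window: int = 100,
                      min_identity: float = 0.70) -> list[tuple[int, int]]:
    """Hypothetical sliding-window scan over a pairwise alignment.

    Returns merged [start, end) intervals, in alignment columns, covered by
    windows whose identity is >= min_identity. The defaults are assumptions.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    n = len(ref)
    if n < window:
        return []
    # 1 where both sequences carry the same unambiguous base, else 0 (gaps and N never match).
    match = [1 if a == b and a not in "-N" else 0
             for a, b in zip(ref.upper(), other.upper())]
    regions: list[tuple[int, int]] = []
    score = sum(match[:window])  # matches in the first window [0, window)
    for start in range(n - window + 1):
        if start > 0:
            # Rolling update: add the column entering the window, drop the one leaving it.
            score += match[start + window - 1] - match[start - 1]
        if score / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # overlaps or touches the last region
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    print(conserved_regions("AAAAAAAAAA", "AAAAATTTTT", window=4, min_identity=0.75))
```

The rolling sum keeps the scan linear in alignment length. Reporting coordinates in alignment columns avoids committing to either genome's numbering.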
ECRbase was supported in part by grants from the Lawrence Livermore National Laboratory.
G.G. Loots and I. Ovcharenko, ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes, Bioinformatics, 23(1):122-4 (2007)
Influence of transposed elements on the transcriptomes of seven vertebrates and invertebrates
Database of animal toxins
Conserved syntenies for various animal, plant and bacterial genomes
Genomics of butterflies (Lepidoptera)
Database of sex bias in insect gene expression
Experimentally verified mammalian protein complexes
Sea urchin Strongylocentrotus purpuratus genome database
The Online Resource for EST analysis (OREST) is an EST analysis pipeline for the rapid analysis of large numbers of ESTs or cDNAs from mammals and fungi. It also provides functional annotation of the dataset, using either FunCat or GO.
GenDecoder is a prediction server for animal mitochondrial genetic codes. For a mitochondrial genome sequence, it reports codon usage, amino acid composition and GC content, along with a final genetic code prediction.
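Two of the summary statistics named above, GC content and codon usage, can be sketched in a few lines. This is a minimal illustration, not GenDecoder's implementation. It makes three assumptions: GC content is taken over unambiguous A/C/G/T bases only, codons are read in frame 0, and incomplete or ambiguous codons are skipped. The function names are hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases (assumed definition)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(1 for b in acgt if b in "GC") / len(acgt)


def codon_usage(cds: str) -> Counter:
    """Count codons in reading frame 0, skipping incomplete or ambiguous codons."""
    cds = cds.upper()
    counts: Counter = Counter()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


if __name__ == "__main__":
    cds = "ATGGCCTGATAA"
    print(round(gc_content(cds), 3))  # 5 of 12 bases are G or C
    print(codon_usage(cds))
```

Counting codons is independent of the genetic code. Turning those counts into an amino acid composition is where the code matters. For example, TGA is a stop codon in the standard code but encodes tryptophan in the vertebrate mitochondrial code.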
Genome databases for farm and other animals
BodyMap-Xs (http://bodymap.jp) is a database for cross-species gene expression comparison. It was created by the anatomical breakdown of 17 million animal EST records in DDBJ, using a sorting program tailored for this purpose. In BodyMap-Xs, users can compare the expression patterns of orthologous and paralogous genes in a coherent manner. This should provide valuable insights for evolutionary studies of gene expression and for identifying motifs responsive to particular expression patterns. In addition, users can start from a concise overview of the taxonomic and anatomical breakdown of all animal ESTs. From there, they can navigate to a gene expression ranking for a particular tissue in a particular animal. This approach may help clarify the similarities and differences between homologous tissues across animal species. BodyMap-Xs will be updated automatically in synchronization with the periodic major updates of DDBJ.
Evolutionary distances between orthologous vertebrate genes
Ensembl is a joint project between EMBL-EBI (http://www.ebi.ac.uk) and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk). Its aim is to develop a software system that produces and maintains automatic annotation of metazoan genomes.
Ensembl receives primary funding from the Wellcome Trust (http://www.wellcome.ac.uk/). Additional funding is provided by EMBL, NHGRI, NIH-NIAID, BBSRC, MRC and the European Union.
The Mammalia Polymorphism Database (MamPol) (1) is a secondary database. It is designed to provide a collection of all existing polymorphic sequences in the class Mammalia, grouped by organism name and gene. Users can search for polymorphic sets by the values of different nucleotide diversity parameters. For data collection, diversity estimation and updating, we use PDA (2). PDA is a pipeline of Perl modules that automates the retrieval, grouping and alignment of GenBank sequences and the estimation of their diversity parameters.
Diversity measures are calculated for each polymorphic set in different functional regions. These include polymorphism estimates at synonymous and nonsynonymous sites, linkage disequilibrium and codon bias. The database also includes primary information retrieved from several external sources. This covers the publicly available mammalian nucleotide sequences (excluding ESTs, STSs, GSSs, working drafts and patents) with their GenBank annotations and references, plus cross-references to the PopSet database. The database content is updated daily. Records are assigned unique and permanent MamPol identification numbers to facilitate cross-database referencing.
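The most basic of these measures is nucleotide diversity (π): the average number of pairwise differences per site among the sequences of a polymorphic set. The sketch below computes π from an already aligned set. It is a minimal illustration, not PDA's code, and it makes two assumptions. First, any column with a gap or non-ACGT character in any sequence is dropped (complete deletion). Second, all sequences are equally weighted. The function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(alignment: list[str]) -> float:
    """Average pairwise differences per site (pi) for aligned sequences.

    Columns containing a gap or ambiguous base in any sequence are excluded
    (complete deletion); this handling is an assumption of the sketch.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    seqs = [s.upper() for s in alignment]
    # Keep only columns where every sequence has an unambiguous base.
    sites = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not sites:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Total number of differing sites, summed over all n*(n-1)/2 pairs.
    total_diffs = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return total_diffs / len(pairs) / len(sites)


if __name__ == "__main__":
    print(nucleotide_diversity(["ACGTACGT", "ACGTACGA", "ACGAACGA"]))
```

The example has 4 differences spread over 3 pairs and 8 sites, so π = 4 / 3 / 8 ≈ 0.167. Pairwise deletion, which drops gapped columns only for the pair being compared, is a common alternative to complete deletion and can give different values.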
Online query interfaces let users query the data by polymorphism parameter values and by keyword (using SQL searches). Full reports, sequences and alignments (in different formats), and polymorphism parameters can be obtained in both textual and graphical form. The web site also includes software tools for data analysis and a daily updated page with exhaustive statistics on the contents of the database. MamPol is available at and can be downloaded via FTP.
Recent developments:
The MamPol website includes several interfaces for browsing the contents of the database and making customisable comparative searches of different species or taxonomic groups.
MamPol has been funded by grants BFU2006-08640/BMC, TIN2004-03388 and BES-2003-0416 from the Ministerio de Ciencia y Tecnología, and grant 2005FI-00328 from the Generalitat de Catalunya.
1. Egea R, Casillas S, Fernández E, Senar MA and Barbadilla A. (2007) MamPol: a database of nucleotide polymorphism in the Mammalia class. Nucleic Acids Res. 35: in press.
2. Casillas S and Barbadilla A. (2006) PDA v.2: improving the exploration and estimation of nucleotide polymorphism in large datasets of heterogeneous DNA. Nucleic Acids Res. 34: W632-W634.