Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionarily conserved regions in vertebrate genomes, entitled ECRbase, which is constructed from a collection of whole-genome alignments produced by the ECR Browser. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites in evolutionarily conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and Fugu genomes. It is freely accessible at http://ecrbase.dcode.org.
ECRbase was supported in part by grants from the Lawrence Livermore National Laboratory.
G.G. Loots and I. Ovcharenko, ECRbase: Database of Evolutionary Conserved Regions, Promoters, and Transcription Factor Binding Sites in Vertebrate Genomes, Bioinformatics, 23(1):122-4 (2007)
University of Tokyo Genome Browser of medaka fish (Oryzias latipes) genomic data
The Joint Genome Institute Genome Portal contains browseable and blastable genome assemblies for several organisms, including Pufferfish, Frog, and Sea squirt.
This Ensembl website features the zebrafish whole-genome shotgun assembly sequence.
The Medaka Expression Pattern Database (MEPD) stores and integrates information on gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images, by descriptive parameters such as staining intensity, category and comments, and by a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been compared by BLAST against public databases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI), and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD.
We thank Bjoern Kindler for helping us solve Java Servlet and SQL programming problems. We appreciate the contribution of Franck Bourrat to the generation of the Medaka anatomical dictionary. We would like to thank Hiroshi Suwa for his technical help and Hans Doebbeling for enabling us to use the computational facilities of EMBL in Heidelberg.
1 Martinelli SD, Brown CG, Durbin R. Gene expression and development databases for C. elegans. Semin Cell Dev Biol. 1997 Oct;8(5):459-67
2 Janning W. FlyView, a Drosophila image database, and other Drosophila databases. Semin Cell Dev Biol. 1997 Oct;8(5):469-75.
3 Takeshi Kawashima, Shuichi Kawashima, Yuji Kohara, Minoru Kanehisa and Kazuhiro W. Makabe. Update of MAGEST: Maboya Gene Expression patterns and Sequence Tags. Nucleic Acids Research, 2002, Vol. 30, No. 1, 119-120.
4 Tetsuhiro Kudoh, Michael Tsang, Neil A. Hukriede, Xiongfong Chen, Michael Dedekian, Christopher J. Clarke, Anne Kiang, Stephanie Schultz, Jonathan A. Epstein, Reiko Toyama, and Igor B. Dawid; A Gene Expression Screen in Zebrafish Embryogenesis. Genome Res. 2001 11: 1979-1987
5 Pollet N, Schmidt HA, Gawantka V, Vingron M, Niehrs C. Axeldb: a Xenopus laevis database focusing on gene expression. Nucleic Acids Res. 2000 Jan 1;28(1):139-40
6 Ringwald M, Eppig JT, Begley DA, Corradi JP, McCright IJ, Hayamizu TF, Hill DP, Kadin JA, Richardson JE.; The Mouse Gene Expression Database (GXD). Nucleic Acids Res. 2001 Jan 1;29(1):98-101
7 Wittbrodt J, Shima A, Schartl M. Medaka - a model organism from the far East. Nat Rev Genet. 2002 Jan;3(1):53-64. Review.
8 Henrich T, Wittbrodt J. An in situ hybridization screen for the rapid isolation of differentially expressed genes. Dev Genes Evol. 2000 Jan;210(1):28-33.
9 Nguyen V, Joly J, Bourrat F. An in situ screen for genes controlling cell proliferation in the optic tectum of the medaka (Oryzias latipes). Mech Dev. 2001 Sep;107(1-2):55-67.
10 Bard JL, Kaufman MH, Dubreuil C, Brune RM, Burger A, Baldock RA, Davidson DR. An internet-accessible database of mouse developmental anatomy based on a systematic nomenclature. Mech Dev. 1998 Jun;74(1-2):111-20.
11 Hauptmann G, Gerster T. Two-color whole-mount in situ hybridization to vertebrate and Drosophila embryos. Trends in Genetics 1994 Aug;10(8):266.
12 Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389-3402.
euGenes is a genome information system and database that provides a common summary of eukaryote genes and genomes, at http://eugenes.org/. Popular genomes summarized include human, mouse, rat, fruitfly, mosquito, Caenorhabditis elegans worm, Saccharomyces yeast, Arabidopsis mustard weed and zebrafish. This information, automatically updated from source databases, offers several features useful to bioscientists looking for gene relationships across organisms. The database describes over 200,000 genes, using reference database gene names and IDs, along with BLAST homologies and associations with molecular functions, cell locations and biological processes. Included are whole-genome maps locating millions of features and molecular data, and options to retrieve sequences and run BLAST comparisons. Search and retrieval of genome data are easy and quick, allowing one to ask combined questions of sequence features, protein functions and other gene attributes, and fetch results as web reports or bulk database formats, for integration in other projects such as gene expression databases.
Recent developments:
Year 2003 updates include the addition of mosquito and rat genomes. Addition of rice, Fugu, chimpanzee, Daphnia and other eukaryote genomes is in progress. euGenes is now more flexible, having been redesigned as a common, replicable genome information system, Argos (http://eugenes.org/argos/). It can be copied and run on scientists' Unix computers, including MacOSX, Linux and Solaris, with minimal effort and knowledge of Unix. Argos includes common genome tools and software such as NCBI web BLAST and Generic Model Organism Database (http://www.gmod.org/) tools. It now forms the basis for euGenes, the FlyBase (http://flybase.net/) genome database, and new genome systems such as Daphnia pulex (http://eugenes.org/daphnia/).
The Vertebrate Genome Annotation (VEGA) database is a central repository for high quality, frequently updated, manual annotation of vertebrate finished genome sequence. Details of the projects for each species are available through the homepages for human, mouse, and zebrafish. The website is built upon code from the EnsEMBL (http://www.ensembl.org) project.
ZFIN, the zebrafish model organism database (http://zfin.org), provides a centralized source of curated and integrated zebrafish genomic, genetic and phenotypic data. Searchable interfaces are provided for finding mutant, gene, molecular segment, mapping and expression data.
Recent developments:
Recent enhancements of ZFIN facilitate integrated studies of zebrafish functional genomics. Gene Ontology (GO) annotations are provided to describe gene products and to facilitate comparisons of genes and gene products in different organisms. GO annotations are made through an automated electronic annotation pipeline, as well as during our manual literature curation process. ZFIN has also increased the detail of curation of gene expression data. Incorporation of high-quality annotated images of gene expression submitted by researchers has continued at a high rate. Gene expression curation from the literature now includes images and captions (when permitted), the expressed genes, fish genotype, assay, experimental conditions, developmental stage, and anatomical structures. The zebrafish anatomical ontology continues to be expanded, and is used to support curation and queries at ZFIN. The integration and cross-linking of ZFIN with the genome sequencing effort at Sanger has been expanded, and analysis tools have been added to the gene pages.
Funds for the development of the Zebrafish Information Network are provided by the NIH (P41 HG002659).
1. Sprague J, Clements D, Conlin T, Edwards P, Frazer K, Schaper K, Segerdell E, Song P, Sprunger B, Westerfield M. The Zebrafish Information Network (ZFIN): the zebrafish model organism database. Nucleic Acids Res. 2003 Jan 1;31(1):241-243.
2. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D258-D261.
Gene indices of human, mouse, zebrafish, Arabidopsis, and Drosophila
The integration of SYSTERS, GeneNest and SpliceNest into one framework facilitates the overall exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT and TrEMBL databases, as well as of the predicted protein sequence sets of several completely sequenced organisms, into disjoint protein family and superfamily clusters annotated with sequence information from various other resources. For each cluster an MView (database search or multiple alignment viewer) output is generated, and from the resulting partial multiple alignment a majority consensus sequence is calculated. All consensus sequences together build a searchable sequence database. The sequences in every cluster have been multiply aligned and annotated with known domains from the Pfam protein family database.
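The majority consensus step described above can be illustrated with a minimal sketch. It is not the SYSTERS implementation: the function name `majority_consensus`, the strict-majority rule, the `X` placeholder for undecided columns, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(alignment, gap="-", ambiguous="X"):
    """Column-wise majority consensus of equal-length aligned rows.

    Assumed rule (hypothetical, for illustration only): a symbol is
    emitted when it occupies a strict majority of a column; columns
    whose majority symbol is a gap are dropped; all other columns
    yield the `ambiguous` placeholder.
    """
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        if count * 2 <= len(column):
            consensus.append(ambiguous)  # no strict majority in this column
        elif symbol != gap:
            consensus.append(symbol)     # majority residue
        # a majority gap contributes nothing to the consensus
    return "".join(consensus)


# Toy alignment (made-up sequences, not taken from any database).
rows = ["MKT-LV", "MKTALV", "MRT-LI", "MKS-LV"]
print(majority_consensus(rows))
```

Dropping gap-dominated columns keeps the consensus free of gap characters, which is convenient if the result is meant to be loaded into a searchable sequence database.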
GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human (based on UniGene), mouse, A. thaliana, Drosophila, and zebrafish. All sequences are preprocessed to detect, annotate and clip regions that contain vector sequence or repeats, or that are of low quality. The subsequent assembly step is done with the Staden package. For all contigs of a cluster, consensus sequences are generated and extracted to build a searchable sequence database. The visualization of a contig provides further information about the sequences, the represented gene and open reading frames, and links to precomputed protein homologies detected in the SYSTERS database.
SpliceNest is a web-based graphical tool to explore gene structure based on a mapping of the EST consensus sequences from GeneNest to a complete genome. Assuming that a cluster normally represents a single gene, every contig of a cluster is aligned separately to the corresponding genomic region, using a spliced alignment program. The alignments are visualized in a diagram showing the exon/intron structure of all the exons simultaneously, mapped on the common genomic sequence, automatically highlighting candidates of alternative splicing.
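One way to picture how alternative splicing candidates could be flagged from such alignments is sketched below. It is a hypothetical heuristic, not SpliceNest's algorithm: exons are assumed to be given as half-open (start, end) genomic intervals per contig, and two contigs are reported when their implied introns overlap without being identical. The names `introns` and `splicing_candidates` and the coordinates are invented for the example.

```python
def introns(exons):
    """Introns implied by a contig's exons, given as (start, end) intervals."""
    exons = sorted(exons)
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def splicing_candidates(contigs):
    """Overlapping but non-identical introns from different contigs.

    `contigs` maps a contig name to its exon intervals on the shared
    genomic sequence (half-open coordinates, an assumption of this sketch).
    """
    found = []
    names = sorted(contigs)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            for a in introns(contigs[first]):
                for b in introns(contigs[second]):
                    overlap = a[0] < b[1] and b[0] < a[1]
                    if overlap and a != b:
                        found.append((first, a, second, b))
    return found


# Made-up coordinates: contig2 skips the middle exon of contig1.
contigs = {
    "contig1": [(100, 200), (300, 400), (500, 600)],
    "contig2": [(100, 200), (500, 600)],
}
for hit in splicing_candidates(contigs):
    print(hit)
```

Comparing introns rather than exons means that contigs which merely cover different parts of the same transcript are not reported; only conflicting splice junctions are.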
Recent developments:
SYSTERS now provides a taxonomy-driven user interface as well as the possibility of interactively generating a user-defined multiple alignment. A visualization of the tissue information in GeneNest was added for the analysis of tissue-specific gene expression. SpliceNest now contains the complete genomes of human, mouse, Drosophila, and Arabidopsis.
1. Krause,A., Haas, S.A., Coward,E., and Vingron,M. (2002) SYSTERS, GeneNest, SpliceNest: Exploring Sequence Space from Genome to Protein. Nucleic Acids Research, 30, 299-300.
2. Haas,S.A., Beissbarth,T., Rivals,E., Krause,A. and Vingron,M. (2000) GeneNest: automated generation and visualization of gene indices. Trends Genet., 16, 521-523.
3. Coward,E., Haas,S.A. and Vingron,M. (2002) SpliceNest: visualizing gene structure and alternative splicing based on EST clusters. Trends Genet., 18, 53-55.