diArk is an online database for information about completed eukaryotic sequencing projects (genomic and EST/cDNA). It currently lists 549 species, 1070 sequencing projects, 289 publications and 304 genome files for 172 species (chromosomal, mitochondrial, chloroplast, apicoplast). diArk also provides a wide range of meta information such as taxonomy, alternative species names, pathogenicity, sequencing project URLs and genome statistics. It offers a user-friendly web interface with different search modules for defining complex search queries quickly and precisely. Additionally, searches can be saved and monitored. Alternatively, diArk can be queried from within other software programs using a WebService API. diArk is accessible from http://www.diark.org/.
Recent developments:
Recently we added versioning to genome assemblies, increased the number of species by more than 30%, made genome sequence files available and added the possibility to save searches.
This work has been supported by the Max-Planck-Society, funded by grant I80798 of the VolkswagenStiftung and grant KO 2251/3-1 of the Deutsche Forschungsgemeinschaft.
Odronitz F, Hellkamp M, Kollmar M. (2007) diArk - a resource for eukaryotic genome research. BMC Genomics. 8:103
ApiDots (formerly ApiEST-DB) is a database integrating mRNA/EST sequences from numerous Apicomplexan parasites. ESTs and mRNAs were clustered and further assembled to generate consensus sequences. These consensus sequences were then subjected to database searches against protein sequences and protein domain sequences. The underlying relational structure of this database allows researchers to analyze these data and pose biologically interesting questions.
Database server for cleansed EST libraries
The BASC system provides tools for the integrated mining and browsing of genetic, genomic and phenotypic data. This public resource hosts information on Brassica species supporting the Multinational Brassica Genome Sequencing Project, and is based upon five distinct modules: ESTDB, Microarray, MarkerQTL, CMap and EnsEMBL. ESTDB hosts expressed gene sequences and related annotation derived from comparison with GenBank, UniRef, and the genome sequence of Arabidopsis. The Microarray module hosts gene expression information related to genes annotated within ESTDB. MarkerQTL is the most complex module and integrates information on genetic markers, maps, individuals, genotypes and traits. The two further modules are an Arabidopsis EnsEMBL genome viewer and the CMap comparative genetic map viewer for the visualisation and integration of genetic and genomic data. The database is accessible at http://bioinformatics.pbcbasc.latrobe.edu.au
We are grateful to Professor Yong Pyo Lim, Chungnam National University, Korea, for the provision of BAC end sequence data for Brassica rapa.
Server that extracts proximal promoter sequences from mammalian genomes. It maps mRNA and EST sequences to the genome and tracks overlapping alignments to find the transcription start site.
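The idea of tracking overlapping alignments can be illustrated with a minimal sketch. It is not the server's actual method: the Alignment record, the half-open coordinate convention, the rule of taking the most upstream 5' end of each overlapping cluster, and the 500 bp default window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one mRNA/EST aligned to the genome."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def _cluster_tss(cluster: List[Alignment]) -> Tuple[str, str, int]:
    """Return the most upstream 5' position of a cluster (assumed rule)."""
    first = cluster[0]
    if first.strand == "+":
        pos = min(a.start for a in cluster)
    else:
        # On the minus strand the 5' end is the highest coordinate.
        pos = max(a.end for a in cluster) - 1
    return (first.chrom, first.strand, pos)


def candidate_tss(alignments: List[Alignment]) -> List[Tuple[str, str, int]]:
    """Merge overlapping same-strand alignments; report one candidate TSS each."""
    ordered = sorted(alignments, key=lambda a: (a.chrom, a.strand, a.start, a.end))
    result: List[Tuple[str, str, int]] = []
    cluster: List[Alignment] = []
    cluster_end = -1
    for aln in ordered:
        same_locus = bool(cluster) and (
            aln.chrom == cluster[0].chrom
            and aln.strand == cluster[0].strand
            and aln.start < cluster_end
        )
        if same_locus:
            cluster.append(aln)
            cluster_end = max(cluster_end, aln.end)
        else:
            if cluster:
                result.append(_cluster_tss(cluster))
            cluster = [aln]
            cluster_end = aln.end
    if cluster:
        result.append(_cluster_tss(cluster))
    return result


def promoter_window(tss: int, strand: str, upstream: int = 500) -> Tuple[int, int]:
    """Half-open genomic interval covering `upstream` bases 5' of the TSS."""
    if strand == "+":
        return (max(0, tss - upstream), tss)
    return (tss + 1, tss + 1 + upstream)
```

A real pipeline would also need to handle spliced alignments and truncated ESTs, which this sketch ignores.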
Online Resource for EST (OREST) analysis is an EST analysis pipeline that allows rapid analysis of large numbers of ESTs or cDNAs from mammals and fungi. Functional annotation of the dataset is also included via either FunCat or GO annotation.
PartiGeneDB is a database of about 300 partial genomes from eukaryotic organisms that have been assembled from EST data.
Expressed Sequence Tags database; a division of GenBank containing single-pass cDNA sequence reads
The Iccare (Interspecific Comparative Clustering and Annotation foR Est) web server compares all available EST and mRNA sequences for a query organism against the set of transcripts for a reference organism. The results are presented graphically and relative to the location of genes on the chromosomes of the reference organism.
The TIGR Plant Transcript Assemblies (TA) database (http://plantta.tigr.org) uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include Expressed Sequence Tags (ESTs) and full-length and partial cDNAs, but exclude computationally predicted gene sequences. The TA database includes all plant species for which more than 1,000 EST and cDNA sequences are publicly available. The database can be searched by species, by keyword or by sequence via BLAST. The sequences of the TAs are available for downloading.
Recent developments:
The current version of the TA database is Release 2 (July 17, 2006) and includes a total of 215 plant species.
This work was supported in part by grants from the National Science Foundation Plant Genome Research Program to C. R. B. (DBI-0218166, DBI-0321538, DBI-0321663, DBI-313887) and C. D. T. (DBI-0321460).
1. Childs K.L., Hamilton J., Zhu W., Ly E., Cheung F., Wu H., Rabinowicz P.B., Town C.D., Buell C.R., Chan A.P. (2007) The TIGR Plant Transcript Assemblies Database. Nucleic Acids Res. 35: in press.
Plant EST clustering and functional annotation
The Medaka Expression Pattern Database (MEPD) stores and integrates information on gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images, by descriptive parameters such as staining intensity, category and comments, and by a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been searched with BLAST against public databases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI) and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD
We thank Bjoern Kindler for helping us solve Java Servlet and SQL programming problems. We appreciate the contribution of Franck Bourrat to the generation of the Medaka anatomical dictionary. We would like to thank Hiroshi Suwa for his technical help and Hans Doebbeling for enabling us to use the computational facilities of EMBL in Heidelberg.
1 Martinelli SD, Brown CG, Durbin R. Gene expression and development databases for C. elegans. Semin Cell Dev Biol. 1997 Oct;8(5):459-67
2 Janning W. FlyView, a Drosophila image database, and other Drosophila databases. Semin Cell Dev Biol. 1997 Oct;8(5):469-75.
3 Takeshi Kawashima, Shuichi Kawashima, Yuji Kohara, Minoru Kanehisa and Kazuhiro W. Makabe. Update of MAGEST: Maboya Gene Expression patterns and Sequence Tags. Nucleic Acids Research, 2002, Vol. 30, No. 1 119-120.
4 Tetsuhiro Kudoh, Michael Tsang, Neil A. Hukriede, Xiongfong Chen, Michael Dedekian, Christopher J. Clarke, Anne Kiang, Stephanie Schultz, Jonathan A. Epstein, Reiko Toyama, and Igor B. Dawid; A Gene Expression Screen in Zebrafish Embryogenesis. Genome Res. 2001 11: 1979-1987
5 Pollet N, Schmidt HA, Gawantka V, Vingron M, Niehrs C. Axeldb: a Xenopus laevis database focusing on gene expression. Nucleic Acids Res. 2000 Jan 1;28(1):139-40
6 Ringwald M, Eppig JT, Begley DA, Corradi JP, McCright IJ, Hayamizu TF, Hill DP, Kadin JA, Richardson JE.; The Mouse Gene Expression Database (GXD). Nucleic Acids Res. 2001 Jan 1;29(1):98-101
7 Wittbrodt J, Shima A, Schartl M. Medaka - a model organism from the far East. Nat Rev Genet. 2002 Jan;3(1):53-64. Review.
8 Henrich T, Wittbrodt J. An in situ hybridization screen for the rapid isolation of differentially expressed genes. Dev Genes Evol. 2000 Jan;210(1):28-33.
9 Nguyen V, Joly J, Bourrat F. An in situ screen for genes controlling cell proliferation in the optic tectum of the medaka (Oryzias latipes). Mech Dev. 2001 Sep;107(1-2):55-67.
10 Bard JL, Kaufman MH, Dubreuil C, Brune RM, Burger A, Baldock RA, Davidson DR. An internet-accessible database of mouse developmental anatomy based on a systematic nomenclature. Mech Dev. 1998 Jun;74(1-2):111-20.
11 Hauptmann G, Gerster T. Two-color whole-mount in situ hybridization to vertebrate and drosophila embryos. Trends in Genetics 1994 Aug. 10(8) 266
12 Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389-3402.
ChimerDB is a database of fusion sequences that combines bioinformatics analysis of mRNA and EST sequences in GenBank, manual collection of literature data and integration with other well-known databases. Fusion transcripts with nonoverlapping alignments at multiple genomic loci were identified and filtered to remove cloning artifacts in cDNA library preparation. Fusion transcripts are classified into two groups: genuine chromosome translocation and fusion between neighboring genes owing to intergenic splicing. Literature data from PubMed abstracts and the fusion data from public resources (OMIM, Sanger CGP, Atlas chromosomes in cancer, and the Mitelman's breakpoint) are integrated to enhance the coverage and reliability. Currently, ChimerDB contains 1,258 fusion cases that involve 1,777 genes, 381 mRNA and 654 EST sequences. It can be accessed at http://genome.ewha.ac.kr/ChimerDB/.
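The two-group classification described above can be sketched roughly as follows. This is an illustration under stated assumptions, not ChimerDB's pipeline: the Segment record, the labels, and the 100 kb neighbor distance (max_neighbor_gap) are hypothetical, and the artifact filtering step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical genomic alignment of one part of a transcript."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def segments_overlap(a: Segment, b: Segment) -> bool:
    """True when both segments share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_fusion(a: Segment, b: Segment, max_neighbor_gap: int = 100_000) -> str:
    """Label a transcript whose two parts align to loci a and b.

    The distance threshold is an assumed value chosen only for illustration.
    """
    if segments_overlap(a, b):
        # Overlapping alignments do not indicate two distinct loci.
        return "not_fusion"
    if a.chrom != b.chrom or a.strand != b.strand:
        return "translocation_candidate"
    gap = max(a.start, b.start) - min(a.end, b.end)
    if gap <= max_neighbor_gap:
        # Same chromosome and strand, close together: neighboring genes.
        return "intergenic_splicing_candidate"
    return "translocation_candidate"
```

For example, two segments on different chromosomes are labeled as a translocation candidate, while two same-strand segments 48 kb apart on one chromosome are labeled as an intergenic splicing candidate.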
EDAS is a database of alternative splicing derived from the analysis of genomic, protein, mRNA and EST data. It provides classification of elementary alternatives into main types, combined searches for specific alternative variants over tissues and disease states, curated classification of cancer-derived clone libraries by origin (cell line or primary donor tissue), a convenient user interface with in-scale and schematic representation of the alternative exon-intron structure, and the ability to filter data by the reliability of sources.
We are grateful to Alexey Kazakov and Alexey Fedorov for useful discussion, Pavel Novichkov for programming advice, and Igor Erokhin for comments on the tissue classification. This study was supported by grants from the Howard Hughes Medical Institute (55000309), the Ludwig Institute for Cancer Research (CRDF RB0-1268), the Russian Fund of Basic Research (04-04-49361), and Programs of the Russian Academy of Sciences ("Origin and Evolution of the Biosphere" and "Molecular and Cellular Biology").
1. Nurtdinov, R.N., Artamonova, I.I., Mironov, A.A., and Gelfand, M.S. (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Hum. Mol. Genet. 12: 1313-1320.
2. Mironov, A.A., Novichkov, P.S., and Gelfand, M.S. (2001) Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors. Bioinformatics, 17, 13-15.
3. Mironov, A.A., Fickett, J.W., and Gelfand, M.S. (1999) Frequent alternative splicing of human genes. Genome Res. 9: 1288-1293.
4. Nurtdinov, R.N., Neverov, A.D., Favorov, A.V., Mironov, A.A., and Gelfand, M.S. (2007) Conserved and species-specific alternative splicing in mammalian genomes. BMC Evol Biol. 7:249.
A tool for visualizing splicing of genes from EST data