Bioinformatics.fr

diArk

http://www.diark.org/

Description :
diArk is an online database for information about completed eukaryotic sequencing projects (genomic and EST/cDNA). It currently lists 549 species, 1070 sequencing projects, 289 publications and 304 genome files for 172 species (chromosomal, mitochondrial, chloroplast, apicoplast). diArk also provides a wide range of meta information such as taxonomy, alternative species names, pathogenicity, sequencing project URLs and genome statistics. It offers a user-friendly web interface with different search modules for fast and precise definition of complex search queries. Additionally, searches can be saved and monitored. Alternatively, diArk can be queried from within other software programs using a WebService API. diArk is accessible from http://www.diark.org/.

Recent developments :
Recently we added versioning to genome assemblies, increased the number of species by more than 30%, made genome sequence files available and added the possibility to save searches.

Acknowledgement :
This work has been supported by the Max-Planck-Society and funded by grant I80798 of the VolkswagenStiftung and grant KO 2251/3-1 of the Deutsche Forschungsgemeinschaft.

References :
Odronitz F, Hellkamp M, Kollmar M. (2007) diArk - a resource for eukaryotic genome research. BMC Genomics. 8:103

Mammalian Gene Collection

http://mgc.nci.nih.gov/

Description :
Overview
The NIH Mammalian Gene Collection (MGC) program is a multi-institutional effort to identify and sequence cDNA clones containing a full-length open reading frame (FL-ORF) for human, mouse, and rat genes. To date, the MGC has produced over 324 cDNA libraries derived from human tissue and cell lines, as well as mouse and rat tissues. The MGC has sequenced and verified the complete FL-ORFs for a non-redundant set of 11,666 human, 10,602 mouse, and 854 rat genes.

Strategy
5' expressed-sequence tags (ESTs) are generated from libraries and analyzed to identify candidate complete ORF clones. These clones are subjected to high-accuracy full-insert sequencing and assessed for complete ORFs. Candidate clones for genes are clones that have been identified as potential complete ORFs and are awaiting full-insert sequencing. All MGC sequences are deposited in GenBank and available without restriction. The cDNA clones generated by the MGC are available through the IMAGE clone distribution network and are fully accessible to the community.
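As a rough illustration of what "assessed for complete ORFs" can mean computationally, the sketch below scans the forward strand of a cDNA sequence for the longest stretch that starts with ATG and ends at an in-frame stop codon. This is not the MGC pipeline; the function names and the `min_codons` cutoff are assumptions made for the example.

```python
# Minimal sketch (not the MGC method): find the longest ATG...stop ORF
# on the forward strand of a cDNA sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq):
    """Return (start, end) of the longest complete ORF, or None.

    'Complete' here means: begins with ATG and ends with an in-frame
    stop codon. 'end' is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def has_complete_orf(seq, min_codons=3):
    """Hypothetical filter: True if a complete ORF of at least min_codons exists."""
    orf = longest_complete_orf(seq)
    return orf is not None and (orf[1] - orf[0]) // 3 >= min_codons
```

A clone whose insert has a start codon but no in-frame stop (for example a truncated 3' end) returns `None` and would not pass such a filter.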

References :
Strausberg RL, Feingold EA, Klausner RD, Collins FS. The Mammalian Gene Collection. Science, 1999, 286, 455-457.

MGC (Mammalian Gene Collection) Program Team, Generation and Initial Analysis of more than 15,000 Full-Length Human and Mouse cDNA Sequences. PNAS, 2002, 99(26), 16899-16903.

OREST

http://mips.gsf.de/genre/proj/orest/index.html

Description :
Online Resource for EST (OREST) analysis is an EST analysis pipeline which allows rapid analysis of large amounts of ESTs or cDNAs from mammals and fungi. Functional annotation of the dataset is also included via either FunCat or GO annotation.

FANTOM - Functional Annotation of Mouse

http://www.gsc.riken.go.jp/e/FANTOM/

Description :
Functional annotations for RIKEN full-length cDNA clones.

FREP

http://facts.gsc.riken.go.jp/FREP/

Description :
Functional repeats in mouse cDNAs

TIGR Plant Transcript Assembly Database

http://planttas.tigr.org/

Description :
The TIGR Plant Transcript Assemblies (TA) database (http://plantta.tigr.org) uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include Expressed Sequence Tags (ESTs) and full-length and partial cDNAs, but exclude computationally predicted gene sequences. The TA database includes all plant species for which more than 1,000 EST and cDNA sequences are publicly available. The database can be searched by species, by keyword or by sequence via BLAST. The sequences of the TAs are available for download.
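The inclusion rule described above (species with more than 1,000 publicly available EST and cDNA sequences) amounts to a simple threshold filter. The sketch below only illustrates that rule; the species names and counts are made up, and the function name is hypothetical.

```python
# Minimal sketch of the stated inclusion rule; all counts are invented.
MIN_SEQUENCES = 1000  # "more than 1,000 EST and cDNA sequences"


def species_for_assembly(counts):
    """Return the species whose sequence count exceeds the threshold, sorted by name."""
    return sorted(name for name, n in counts.items() if n > MIN_SEQUENCES)


# Hypothetical EST + cDNA counts per species
example_counts = {"Species A": 250_000, "Species B": 1_000, "Species C": 1_001}
selected = species_for_assembly(example_counts)
```

Note that a species with exactly 1,000 sequences is excluded, because the description says "more than".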

Recent developments :
The current version of the TA database is Release 2 (July 17, 2006) and includes a total of 215 plant species.

Acknowledgement :
This work was supported in part by grants from the National Science Foundation Plant Genome Research Program to C. R. B. (DBI-0218166, DBI-0321538, DBI-0321663, DBI-313887) and C. D. T. (DBI-0321460).

References :
1. Childs K.L., Hamilton J., Zhu W., Ly E., Cheung F., Wu H., Rabinowicz P.B., Town C.D., Buell C.R., Chan A.P. (2007) The TIGR Plant Transcript Assemblies Database. Nucleic Acids Res. 35: in press.

H-DBAS

http://jbirc.jbic.or.jp/h-dbas/

Description :
The Human-transcriptome DataBase for Alternative Splicing (H-DBAS) is a specialized database of alternatively spliced human transcripts. In this database, each of the alternative splicing variants corresponds to a completely sequenced and carefully annotated human full-length cDNA, one of those collected for the H-Invitational human transcriptome annotation meeting. H-DBAS contains a total of 38 664 representative alternative splicing variants in 11 744 loci. The data is retrievable by various manually annotated features of alternative splicing, such as patterns of alternative splicing, the resulting alterations in the encoded amino acids and affected protein motifs, GO terms, predicted subcellular localization signals and transmembrane domains. The database also records recently identified very complex patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or degenerated (multiple CDS): in all three cases, completely unrelated proteins are encoded by a single locus. By using AS Viewer, each alternative splicing event can be analyzed in the context of full-length cDNAs, helping the user understand the relation between an alternative splicing event and the consequent alterations in the encoded amino acid sequences, together with the various kinds of affected protein motifs. H-DBAS is accessible at http://jbirc.jbic.or.jp/h-dbas/.

Acknowledgement :
H-DBAS was financially supported by the Ministry of Economy, Trade and Industry of Japan (METI), the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) and the Japan Biological Informatics Consortium (JBIC). Funding to pay the Open Access publication charges for this article was provided by JBIC.

References :
1. Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M. et al. (2004) Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol, 2, e162.
2. Takeda, J., Suzuki, Y., Nakao, M., Barrero, R.A., Koyanagi, K.O., Jin, L., Motono, C., Hata, H., Isogai, T., Nagai, K. et al. (2006) Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56,419 completely sequenced and manually annotated full-length cDNAs. Nucleic Acids Res, 34, 3917-3928.

RARGE - RIKEN Arabidopsis Genome Encyclopedia

http://rarge.gsc.riken.jp/

Description :
Arabidopsis cDNAs, mutants and microarray data

Full-Malaria

http://fullmal.ims.u-tokyo.ac.jp/

Description :
Full-malaria (http://fullmal.hgc.jp/) is a database of full-length-enriched cDNA libraries of two malaria parasites, Plasmodium falciparum and P. yoelii, and of the related parasite Toxoplasma gondii. The libraries were produced using the oligo-capping method, which replaces the cap structure of intact mRNA and can be used to selectively clone full-length cDNAs. 5' end one-pass nucleotide sequences were determined for a large number of these cDNAs and were mapped on the genome sequences of the parasites. In the map viewer, transcription start sites represented by oligo-capped clones and splicing of conventional ESTs are shown. Draft genome sequences of the murine malaria parasite P. yoelii are aligned with the genome sequences of P. falciparum for comparative biological studies. Such cDNA clones are an invaluable resource for determining the exact structures of expressed genes.

Recent developments :
Full-malaria 2005 includes Toxoplasma gondii, an apicomplexan protozoan that is closely related to Plasmodium species and that causes severe congenital diseases in newborn babies and acute infection in immunocompromised patients. An oligo-capped library was produced from tachyzoite-stage parasites of the T. gondii RH strain and large-scale 5' end one-pass sequences were determined. A total of 6,149 sequences (83%) from the oligo-capped library and 95,722 ESTs (86%) were successfully mapped on the genome sequences. Full-malaria now provides a basis for comparative studies of expressed genes of two genera of apicomplexan parasites.

Acknowledgement :
Nucleotide sequences and gene predictions were downloaded from PlasmoDB and ToxoDB. EST data were obtained from GenBank. This database has been constructed and maintained using a Grant-in-Aid for Publication of Scientific Research Results from the Japan Society for the Promotion of Science.

References :
1. Watanabe J, Sasaki M, Suzuki Y, Sugano S. (2001) FULL-malaria: a database for a full-length enriched cDNA library from human malaria parasite, Plasmodium falciparum. Nucleic Acids Res. 29, 70-71.
2. Watanabe J, Sasaki M, Suzuki Y, Sugano S. (2002) Analysis of transcriptomes of human malaria parasite Plasmodium falciparum using full-length enriched library: identification of novel genes and diverse transcription start sites of messenger RNAs. Gene 291, 105-113.

wFleaBase

http://wfleabase.org/

Description :
wFleaBase (http://wfleabase.org/) has provided genome researchers with rapid, usable access to data on the ecological-genetic model organism Daphnia pulex (a fresh-water crustacean of world distribution, widely used in research on ecology, environmental toxicology, and evolution) since 2003. With the recent publication of the Daphnia pulex genome, this new arthropod genome provides an important evolutionary midpoint between insect and nematode genomes. wFleaBase includes genome annotations with model organism genes, gene predictions, and reagent and marker sequences (EST, cDNA, microsatellite). Multiple genome assemblies from the JAZZ, Arachne and PCAP assemblers are being validated to provide a best assembly that closely matches genetic (recombination) map evidence. Tools of the Generic Model Organism Database project are used to develop this database, including the Chado database schema, GBrowse and CMap genome maps, BioMart data mining and BLAST genome searches.

Acknowledgement :
This project is supported in part by grants from the National Human Genome Research Institute of the National Institutes of Health to D. Gilbert. A National Science Foundation TeraGrid access grant provided computing resources.

References :
1. Colbourne JK, Singan VR, Gilbert DG. wFleaBase: the Daphnia genome database. BMC Bioinformatics. 2005, 6:45 (http://www.biomedcentral.com/1471-2105/6/45).

PEDE - Pig Expression Data Explorer

http://pede.dna.affrc.go.jp/

Description :
Full-length pig cDNA libraries and ESTs

Recent developments :
The database stores more than 160 000 porcine ESTs. We fully sequenced the inserts of 10 147 of the cDNA clones that had undergone EST analysis; the sequences and annotation of the cDNA clones and ESTs are stored in the database.

Acknowledgement :
This database is constructed and maintained with support from the Animal Genome Research Project and the Food Traceability Research Project of the Ministry of Agriculture, Forestry and Fisheries of Japan and from a Grant-in-Aid from the Japan Racing Association.

References :
1. Uenishi H, Eguchi T, Suzuki K, Sawazaki T, Toki D, Shinkai H, Okumura N, Hamasima N, Awata T (2004) PEDE (Pig EST Data Explorer): construction of a database for ESTs derived from porcine full-length cDNA libraries. Nucleic Acids Res. 32: D484-D488.
2. Uenishi H, Eguchi-Ogawa T, Shinkai H, Okumura N, Suzuki K, Toki D, Hamasima N, Awata T (2007) PEDE (Pig EST Data Explorer) has been expanded into Pig Expression Data Explorer, including 10 147 porcine full-length cDNA sequences. Nucleic Acids Res. 35: in press.

INE

http://rgp.dna.affrc.go.jp/giot/INE.html

Description :
The rice genome database INE (INtegrated rice genome Explorer) is an integrated rice database with a browser for all the genomic information that has been accumulated from large-scale rice genome analysis, including cDNA analysis, genetic mapping, physical mapping and the ongoing rice genome sequencing project. A total of 481 Mbp of rice genomic sequence data (as of Sept. 2004) from 3,454 PAC/BAC and newly constructed fosmid clones has been incorporated in the database as a result of an extensive effort of the International Rice Genome Sequencing Project (IRGSP) to accelerate completion of the genome sequence. Since the IRGSP is very near its goal of completion, INE, which serves as the central database of the IRGSP, currently contains almost 370 Mbp of non-overlapping sequence, corresponding to about 95% of the entire rice genome. The sequence data are integrated with the genetic map, a YAC (yeast artificial chromosome)-based physical map, the transcript map, and PAC (P1-derived artificial chromosome) / BAC (bacterial artificial chromosome) contigs for each sequenced clone. This map-based information is very useful in elucidating the structure and function of specific regions of the genome, particularly those containing agronomically important genes.

In 2002, RGP finished sequencing chromosome 1, the longest chromosome in the genome (Sasaki et al. 2002), followed by the analysis of chromosome 4 by the NCGR of the Chinese Academy of Sciences (Feng et al. 2002) and of chromosome 10 by the US rice genome sequencing consortium (2003). At present the remaining nine chromosomes are almost completed and the finished sequences from PAC/BAC/fosmid clones are incorporated in INE. INE is also linked to the sequence annotation information. RGP is responsible for sequencing and annotation of chromosomes 1, 2, 6, 7, 8, and 9. In 2003, sequences from the rice full-length cDNAs were presented (The rice full-length cDNA consortium, 2003). The mapped cDNAs are a powerful resource for constructing evidence-based gene models. RGP has revised its system for manual curation of the predicted genes, with emphasis on the mapped cDNAs. For almost all the PAC/BAC clones aligned in chromosomes 2, 7, 8, most of 9 and half of chromosome 6, results of the manual curation of the genome sequences for precise prediction of gene domains are presented. The RGP is also proceeding with re-annotation, in which the gene models are revised with updated information. For the clones of the other six chromosomes, namely chromosomes 3, 4, 5, 10, 11, and 12, results of the rice automated annotation system (RiceGAAS, Sakata et al. 2002) are presented with the finished genome sequence. Thus INE will continue to provide a system for consolidating all the genome sequence data accumulated by the IRGSP as well as all relevant information to be elucidated from analysis of the genome sequence. INE can be accessed at http://rgp.dna.affrc.go.jp/giot/INE.html.

Recent developments :
1. Integration of the sequence data corresponding to 95% of the rice genome from the international sequencing effort.
2. Completion of the sequencing for almost all the chromosomes and access to the sequence data.
3. Annotation data for all the PAC/BAC clones are displayed. The curation system was revised on the basis of rice full-length cDNA sequences, and results of the curation are shown for all the clones aligned in chromosomes 2, 6, 7, 8, and 9.
4. Re-annotation results for the chromosome 1 sequences are being incorporated successively.

Acknowledgement :
INE was developed and is maintained with a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Rice Genome Project GS-1302).

References :
1. Sasaki et al. 2002. The genome sequence and structure of rice chromosome 1. Nature 420:312-316.
2. Feng et al. 2002. Sequence and analysis of rice chromosome 4. Nature 420:316-320.
3. The Rice chromosome 10 sequencing consortium. 2003. In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300:1566-1569.
4. The rice full-length cDNA consortium 2003. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376-379.
5. Sakata et al. 2002. RiceGAAS: an automated annotation system and database for rice genome sequence. Nucleic Acids Research 30:98-102.

TomatEST DB

http://biosrv.cab.unina.it/tomatestdb

Description :
TomatEST is a secondary database integrating EST/cDNA sequence information from different libraries of multiple tomato species collected from dbEST. Redundant EST collections from each species are organized into clusters (gene indices). A cluster consists of one or multiple contigs. Multiple contigs in a cluster represent alternatively transcribed forms of a gene. The set of stand-alone EST sequences (singletons) and contigs, representing all the computationally defined ‘Transcript Indices’, are annotated according to similarity versus protein and RNA family databases. Sequence function description is integrated with the Gene Ontologies and the Enzyme Commission identifiers for a standard classification of gene products and for the mapping of the expressed sequences onto metabolic pathways.

Information on the origin of the ESTs, on their structural features, on clusters and contigs, as well as on functional annotations is accessible via a user-friendly web interface. Specific facilities in the database allow Transcript Indices from a query to be automatically classified into Enzyme classes and metabolic pathways. The ‘on the fly’ mapping onto the metabolic maps is integrated in the analytical tools.
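The organization described above (clusters made of one or more contigs, plus stand-alone singletons, together forming the 'Transcript Indices') can be pictured with a small data model. The sketch below is only an illustration: the class names, field names and identifiers are assumptions and do not reflect the actual TomatEST schema.

```python
# Minimal sketch of the cluster / contig / singleton organization.
# All names and identifiers are hypothetical, not the TomatEST schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    contig_id: str
    est_ids: List[str]  # ESTs assembled into this contig


@dataclass
class Cluster:
    cluster_id: str  # one cluster corresponds to one gene index
    contigs: List[Contig] = field(default_factory=list)

    def has_alternative_transcripts(self) -> bool:
        """Multiple contigs in a cluster represent alternative transcript forms."""
        return len(self.contigs) > 1


def transcript_indices(clusters: List[Cluster], singletons: List[str]) -> List[str]:
    """Transcript Indices = all contigs plus all singleton ESTs."""
    contig_ids = [c.contig_id for cl in clusters for c in cl.contigs]
    return contig_ids + list(singletons)


# Hypothetical example data
cl1 = Cluster("CL1", [Contig("CL1.C1", ["est1", "est2"]), Contig("CL1.C2", ["est3", "est4"])])
cl2 = Cluster("CL2", [Contig("CL2.C1", ["est5", "est6", "est7"])])
indices = transcript_indices([cl1, cl2], ["est8"])
```

Here `cl1` holds two contigs and would therefore be reported as a gene with alternatively transcribed forms, while `est8` enters the Transcript Indices as a singleton.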

Acknowledgement :
This work is supported by the AGRONANOTECH project (Italian Ministry of Agriculture) and by the EU-SOL project (VI frame programme of the European Community).
We thank Dr. Alessandra Traini and Dr. Enrico Raimondo for useful feedback on the database usage. We thank Prof. Gerardo Toraldo for technical support and useful discussions.

NEIBank

http://neibank.nei.nih.gov/

Description :
NEIBank is a project for ocular genomics. This includes both the generation and analysis of new cDNA libraries for human and animal model eye tissues and the creation of a database and web site for eye-related expression data, known eye disease genes and disease loci. Compilations of clustered and identified EST data for NEIBank libraries (and others available through UniGene) are presented by species and tissue and annotated with links to other databases, GO terms and direct links to genome browsers. A library comparison tool allows pairwise comparison of libraries for the same species or for any species in which homologous genes have been identified. Links for ordering clones are provided.

Recent developments :
SAGE data for human eye tissues are now presented through the EyeSAGE database. Both EST and SAGE data can be interrogated in several ways, including by gene identifiers, GO terms and chromosomal locations. EyeBrowse, a custom, eye-centric version of the UCSC genome browser, presents EST and EyeSAGE data along with known eye disease genes. Data for loci of unidentified eye disease genes can also be displayed along with expression data to assist identification of candidate genes.

References :
1. Wistow, G. (2002) A project for ocular bioinformatics: NEIBank. Mol Vis, 8, 161-163.
2. Wistow, G. (2006) The NEIBank project for ocular genomics: data-mining gene expression in human and rodent eye tissues. Prog Retin Eye Res, 25, 43-77.
3. Bowes Rickman, C., Ebright, J.N., Zavodni, Z.J., Yu, L., Wang, T., Daiger, S.P., Wistow, G., Boon, K. and Hauser, M.A. (2006) Defining the human macula transcriptome and candidate retinal disease genes using EyeSAGE. Invest Ophthalmol Vis Sci, 47, 2305-2316.

Prostate Expression Database

http://www.pedb.org/

Description :
Sequences from prostate tissue and cell type-specific cDNA libraries
