Bioinformatics.fr

FREP

http://facts.gsc.riken.go.jp/FREP/

Description :
Functional repeats in mouse cDNAs

DDBJ - DNA Data Bank of Japan

http://www.ddbj.nig.ac.jp/

Description :
As a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), DDBJ (http://www.ddbj.nig.ac.jp) has steadily collected, annotated, released and exchanged original DNA sequence data, as shown, for example, by the growth curve of data submissions over past years (visit http://www.ddbj.nig.ac.jp/images/breakdown_stats/percentage-e.gif). However, the data submission landscape is changing dramatically with the emergence of ultra-high-speed, or second-generation, sequencers (2GS) such as 454 (by 454 Life Sciences), Solexa (by Illumina, Inc.), SOLiD (by Applied Biosystems) and HeliScope (by Helicos BioSciences). With these machines the whole human genome can now be sequenced in one-thousandth or less of the time required for the first cases in 2001 (1, 2). Recently, two reports announced that the whole genome had been sequenced for two well-known persons (3, 4), which was perhaps the beginning of personal genomics. Also well known is the 1000 human genomes project, under way in the USA, Europe and China, which aims to obtain a complete and detailed catalogue of human genetic variation (http://www.1000genomes.org/page.php). These activities warn us that the growth curve above will steepen drastically. At present INSDC has released about 100 billion bases in total, the outcome of more than 20 years of collaboration among the three member banks. However, this number will easily be surpassed when the 1000 human genomes project is completed and its results are submitted to INSDC in a few years, or even before that. To cope with these activities, the INSDC collaborators discussed in 2008 how to handle the mass submissions produced by 2GS. The common fear among the collaborators was that limited computer storage will sooner or later be filled by the continuous flow of mass submissions.
Nevertheless, the collaborators agreed to collect, distribute and exchange mass data of transcriptomes such as trace archives and short reads, upon the condition that the sequences are assembled. DDBJ has also started to accept and release such mass sequence data.
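
The scale of the expected growth can be illustrated with a rough back-of-the-envelope calculation. The sketch below compares the roughly 100 billion bases released by INSDC with the output of 1000 human genomes. The genome size of about 3 billion bases per haploid genome is an assumption not stated in the description, and the estimate counts one assembled genome per individual only, ignoring the much larger volume of raw reads.

```python
# Rough estimate only; the genome size below is an assumption, not a figure from the text.
INSDC_TOTAL_BASES = 100_000_000_000                 # "about 100 billion bases" (from the description)
ASSUMED_HAPLOID_HUMAN_GENOME_BASES = 3_000_000_000  # assumption: ~3 billion bases per genome
N_GENOMES = 1000                                    # the 1000 human genomes project


def projected_ratio(n_genomes=N_GENOMES,
                    genome_bases=ASSUMED_HAPLOID_HUMAN_GENOME_BASES,
                    baseline=INSDC_TOTAL_BASES):
    """Return how many times the baseline is exceeded by n_genomes assembled genomes."""
    return n_genomes * genome_bases / baseline


if __name__ == "__main__":
    print(f"1000 genomes would be about {projected_ratio():.0f}x the current INSDC total")
```

Under these assumptions, a single assembled sequence per individual would already amount to about 30 times the current INSDC holdings.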

Recent developments :
In the following text, DDBJ's activity is reported with a focus on mass data submissions from Japanese universities and institutes. DDBJ (http://www.ddbj.nig.ac.jp) collected and released 2,368,110 entries or 1,415,106,598 bases in the period from July 2007 to June 2008. The releases in this period include genome-scale data of Bombyx mori, Oryzias latipes, Drosophila and Lotus japonicus. In addition, starting this year we collected and released trace archive data in collaboration with the National Center for Biotechnology Information (NCBI). The first release contains those of Oryzias latipes and bacterial metagenomes from the human gut. To cope with the current progress of sequencing technology, we also accepted and released more than 100 million short reads of parasitic protozoa and their hosts that were produced using a Solexa sequencer.

Acknowledgement :
We thank all staff of DDBJ for the data collection, annotation, release, management and software development. DDBJ is funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) with the management expenses grant for national university corporations. DDBJ is also supported by a grant from the National Project of Integrating Life Science Databases.

References :
1. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome, Nature, 409, 860-921
2. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome, Science, 291, 1304-1351
3. Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He, W., Chen, Y.-J., Makhijani, V., Roth, G.T. et al. (2008) The complete genome of an individual by massively parallel DNA sequencing, Nature, 452, 872-876
4. Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L., Walenz, B.P., Axelrod, N., Huang, J., Kirkness, E.F., Denisov, G. et al. (2007) The diploid genome sequence of an individual human, PLoS Biology, 5, 2113-2144

EMBL Nucleotide Sequence Database

http://www.ebi.ac.uk/embl/

Description :
The EMBL Nucleotide Sequence Database (URL: http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data are exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred Web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection. For sequence similarity searches a variety of tools (e.g. Fasta, BLAST) are available.

References :
Kulikova, T., Aldebert, P., Althorpe, N., Baker, W., Bates, K., Browne, P., Van den Broek, A., Cochrane, G., Duggan, K., Eberhardt, R., Faruque, N., Garcia-Pastor, M., Harte, N., Kanz, C., Leinonen, R., Lin, Q., Lombard, V., Lopez, R., Mancuso, R., McHale, M., Nardone, F., Silventoinen, V., Stoehr, P., Stoesser, G., Tuli, M.A., Tzouvara, K., Vaughan, R., Wu, D., Zhu, W. and Apweiler, R. (2004) The EMBL Nucleotide Sequence Database. Nucleic Acids Res. 32, D27-D30.

GenBank®

http://www.ncbi.nlm.nih.gov/

Description :
GenBank® is a comprehensive sequence database that contains publicly available DNA sequences for more than 170,000 different organisms, obtained primarily through the submission of sequence data from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (Web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov.
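
As an illustration of programmatic retrieval through Entrez, the sketch below builds a request URL for NCBI's E-utilities `efetch` service, which is not mentioned in the description above and is assumed here as the access route. The helper names `efetch_url` and `fetch_record` and the accession `U00096` are chosen for illustration only. Building the URL needs no network access, while `fetch_record` performs a real HTTP request and is subject to NCBI's usage policies.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities efetch service (assumed access route, not named in the text).
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",      # Entrez nucleotide database
        "id": accession,      # accession number or UID
        "rettype": rettype,   # e.g. "fasta" or "gb" (GenBank flat file)
        "retmode": "text",
    }
    return EUTILS_EFETCH + "?" + urlencode(params)


def fetch_record(accession, rettype="fasta"):
    """Download the record as text; requires network access."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Example accession used purely for illustration.
    print(efetch_url("U00096"))
```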

ACLAME - A Classification of Mobile genetic Elements

http://aclame.ulb.ac.be/

Description :
A classification of mobile genetic elements

HumHot

http://www.jncasr.ac.in/humhot/

Description :
HUMHOT is a web based database of human meiotic recombination hot spot DNA sequences. The database includes the hot spot sequences (<4 kb) obtained from published literature describing the high resolution mapping of human meiotic hot spots and also the hot spot flanking sequence information. It can be queried based on hot spot locus name, chromosome number or by homology to user defined sequences. The database is also updated with new hot spot sequences as they are discovered and provides hyperlinks to commonly used tools for estimating recombination rates, performing genetic analysis and new advances in our understanding of meiotic hot spots.

InSatDb

http://www.cdfd.org.in/insatdb

Description :
InSatDb, unlike many other microsatellite databases that cater largely to the needs of microsatellites as markers, presents an interactive interface to query information regarding microsatellite characteristics per se of five fully sequenced insect genomes (fruit fly, honeybee, malarial mosquito, red flour beetle and silkworm). InSatDb allows users to obtain microsatellites annotated with size (in bp and repeat units); genomic location (exon, intron, upstream or transposon); nature (perfect or imperfect); and sequence composition (repeat motif and GC%). One can access microsatellite cluster (compound repeats) information, and a list of microsatellites with conserved flanking sequences (microsatellite family or paralogs). InSatDb also provides information on the insects, web links to further details, a description of the methodology and a tutorial. A separate Analysis section illustrates the comparative genomic analysis that can be carried out using the InSatDb output.
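
The kind of annotation described above (size in bp and repeat units, repeat motif and GC%) can be sketched for perfect microsatellites with a regular expression. This is a minimal illustration, not InSatDb's own method: the function name, the thresholds (motifs of 1 to 6 bp, at least 4 repeat units) and the output fields are assumptions chosen for the example.

```python
import re


def find_perfect_microsatellites(seq, min_repeats=4, max_motif=6):
    """Find perfect tandem repeats and annotate them (illustrative sketch).

    The lazy quantifier makes the regex prefer the shortest motif,
    so "ACACACAC" is reported with motif "AC" rather than "ACAC".
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        tract = match.group(0)
        motif = match.group(1)
        gc_count = tract.count("G") + tract.count("C")
        hits.append({
            "start": match.start(),               # 0-based position in seq
            "motif": motif,
            "repeat_units": len(tract) // len(motif),
            "size_bp": len(tract),
            "gc_percent": 100.0 * gc_count / len(tract),
        })
    return hits


if __name__ == "__main__":
    example = "TTG" + "AC" * 5 + "GGT"   # toy sequence with one (AC)5 tract
    for hit in find_perfect_microsatellites(example):
        print(hit)
```

Imperfect and compound repeats, which InSatDb also reports, would need a more tolerant search than this exact-match pattern.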

Acknowledgement :
JN acknowledges the financial assistance from the Department of Biotechnology, Government of India.

References :
1. Archak S, Meduri E, Sravana Kumar P, Nagaraju J. (2007) InSatDb: A microsatellite database of fully sequenced insect genomes. Nucleic Acids Res. 35 (Database issue): in press.

ISfinder

http://www-is.biotoul.fr/

Description :
ISfinder (http://www-is.biotoul.fr) is a dedicated database for bacterial Insertion Sequences (IS). It has superseded the Stanford reference center. One of its functions is to assign IS names and to provide a focal point for a coherent nomenclature. It is also the repository for IS sequences. Each new IS is indexed together with information such as its DNA sequence and open reading frames or potential coding sequences, the sequence of the ends of the element and target sites, its origin and distribution together with a bibliography where available. Another objective is to continuously monitor ISs to provide updated comprehensive groupings or families and to provide some insight into their phylogenies. The site also contains extensive background information on Insertion Sequences and transposons in general. On-line tools are gradually being added. At present an on-line BLAST facility against the entire bank is available. Additional features will include alignment capability, PSI-BLAST and HMM profiles. ISfinder also includes a section on bacterial genomes and is involved in annotating the IS content of these genomes. Finally, this database is currently recommended by several microbiology journals for registration of new IS elements before their publication.

Islander

http://www.indiana.edu/~islander

Description :
Pathogenicity islands and prophages in bacterial genomes

L1Base

http://line1.molgen.mpg.de/

Description :
Functional annotation and prediction of LINE-1 elements

MICdb - Database of Prokaryotic Microsatellites

http://210.212.212.7/MIC/index.html

Description :
The MICdb (Microsatellites database) (http://www.cdfd.org.in/micas) is a comprehensive relational database of non-redundant microsatellites extracted from fully sequenced genomes. The current version (2.0) is an enhanced and upgraded version compiled from 287 viral genomes as well as 129 prokaryotic genomes belonging to different phylogenetic groups, and it is loaded with tools and textual information that provide insights into structural and functional aspects of microsatellites. The database is linked to MICAS2.0, the Web-based Microsatellite Analysis Server. MICAS provides a user-friendly front-end to systematically extract data on microsatellite tracts from hosted genomes. The database contains the following information on the microsatellites: the regions (coding/non-coding) containing microsatellite tracts, the frequencies of their occurrence, the size and number of repeating motifs and the sequences of the tracts. Users can query the database for details of microsatellite locations with respect to the protein coding/non-coding regions. Protein coding regions after translation are annotated with secondary structural information, and the positions of microsatellite tracts are shown in order to provide insight into possible structural changes due to microsatellite polymorphism. For microsatellites occurring in non-coding regions, graphical illustrations show the relative position of the microsatellite tracts with respect to the upstream and downstream coding regions. This will help in investigating possible regulatory roles of microsatellites occurring close to protein coding regions (upstream or downstream). Sufficient textual information is provided to help users navigate the database, and links to GenBank and Swissprot are provided to enhance the information content pertaining to microsatellites.
An interface to Autoprimer, a primer design program, has been provided to every microsatellite tract to compute suitable primers for PCR.
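
Locating a microsatellite tract relative to coding regions, as described above, can be sketched as a simple interval comparison. The function below is a hypothetical illustration and not MICdb's implementation. It assumes 0-based, half-open coordinates and reports neighbouring coding regions as "left" and "right" on the sequence, since deciding which one is upstream or downstream would also require strand information.

```python
def locate_tract(tract, cds_intervals):
    """Classify a tract as coding or non-coding (illustrative sketch).

    tract and every entry of cds_intervals are (start, end) pairs in
    0-based, half-open coordinates (an assumption made for this example).
    """
    t_start, t_end = tract
    overlapping = [(s, e) for s, e in cds_intervals if s < t_end and t_start < e]
    if overlapping:
        return {"region": "coding", "cds": overlapping}

    # Distances to the nearest coding regions on either side, if any exist.
    left_ends = [e for s, e in cds_intervals if e <= t_start]
    right_starts = [s for s, e in cds_intervals if s >= t_end]
    return {
        "region": "non-coding",
        "gap_to_left_cds": t_start - max(left_ends) if left_ends else None,
        "gap_to_right_cds": min(right_starts) - t_end if right_starts else None,
    }


if __name__ == "__main__":
    cds = [(100, 400), (650, 900)]       # toy coding regions
    print(locate_tract((450, 470), cds))  # tract between the two regions
    print(locate_tract((390, 410), cds))  # tract overlapping the first region
```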

Acknowledgement :
We thank Miss Sushma for assisting in the design of the Autoprimer software. VBS gratefully acknowledges the Council of Scientific and Industrial Research (CSIR), Govt. of India, for the Junior Research Fellowship. HAN and JN gratefully acknowledge, respectively, the core grant from CDFD and an extramural grant from the Department of Biotechnology (DBT), Govt. of India.

NPRD - Nucleosome Positioning Region Database

http://srs6.bionet.nsc.ru/srs6/

Description :
Nucleosome positioning region database

OriDB - The DNA Replication Origin Database

http://www.oridb.org/

Description :
OriDB provides a web-based catalogue of confirmed and predicted DNA replication origin sites. At present this is limited to budding yeast (S. cerevisiae). Each proposed or confirmed origin site appears as a record in OriDB, with each record comprising seven pages. These pages provide, in text and graphical formats, the following information: genomic location and chromosome context of the origin site; time of origin replication; DNA sequence of proposed or experimentally confirmed origin elements; free energy required to open the DNA duplex (stress-induced DNA duplex destabilization or SIDD); and phylogenetic conservation of sequence elements. In addition, OriDB encourages community submission of additional information for each origin site through a User Notes facility. Origin sites are linked to several external resources, including the Saccharomyces Genome Database (SGD) and relevant publications at PubMed. Finally, a Chromosome Viewer utility allows users to interactively generate graphical representations of DNA replication data genome-wide. OriDB is accessible at http://www.oridb.org/.

Recent developments :
Upcoming extensions of OriDB will include the incorporation of additional budding yeast datasets and a parallel database for fission yeast (S. pombe) replication origins (as datasets are described).

Acknowledgement :
Funding for OriDB was provided by The Leverhulme Trust (CAN), The Royal Society (ADD) and a National Science Foundation Grant DBI-0416764 (CJB and PA).

References :
1. Nieduszynski CA, Hiraga S, Ak P, Benham CJ, Donaldson AD (2007) OriDB: a DNA Replication Origin Database. Nucleic Acids Res. 35: in press.

PACRAT

http://www.biosci.ohio-state.edu/~pacrat

Description :
Archaeal and bacterial intergenic sequence features

PseudoGene

http://www.pseudogene.org/

Description :
A compilation of published pseudogene annotations covering a wide range of both eukaryote and prokaryote genomes. The database is also divided into searchable subsets of pseudogenes sharing characteristics of interest (e.g. all expressed pseudogenes or pseudogenes found by a particular identification method), and provides the capacity to create layered unions of these sets based on user-specified set priorities.
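
One possible reading of a "layered union" is sketched below: several pseudogene sets are merged, and an entry that appears in more than one set is attributed to the set with the highest user-specified priority. This interpretation, the function name and the toy identifiers are assumptions made for illustration. The database's actual behaviour may differ.

```python
def layered_union(sets_by_name, priority):
    """Merge named sets, attributing each item to its highest-priority set.

    sets_by_name: dict mapping a set name to an iterable of identifiers.
    priority: list of set names, highest priority first.
    Returns a dict mapping each identifier to the name of the set that "wins".
    """
    assigned = {}
    for name in priority:
        for item in sets_by_name.get(name, ()):
            # Keep the first assignment, i.e. the one from the higher-priority set.
            assigned.setdefault(item, name)
    return assigned


if __name__ == "__main__":
    # Toy identifiers, not real database entries.
    subsets = {
        "expressed": {"pg1", "pg2"},
        "method_A": {"pg2", "pg3"},
    }
    print(layered_union(subsets, priority=["expressed", "method_A"]))
```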