Soybean genomics and microarray database
SoyBase, the USDA-ARS soybean genetics database, is a central repository where researchers can quickly find information on most aspects of soybean genetics, metabolism, and pathology. In 1993 SoyBase offered the first publicly available RFLP maps of the soybean genome as well as the first extensive collection of metabolic data in a plant genome database. SoyBase has since been expanded to contain almost all published data on soybean genetic research. A major milestone occurred in 2000 when a set of composite genetic maps was released. The current composite genetic maps contain more than 3800 mapped classical and molecular loci along with more than 950 QTL.
The major classes of data in SoyBase are:
Map Collections: Classical genetic map and 35 molecular marker maps, including the composite genetic/molecular maps. Maps include molecular and whole plant phenotype markers and QTL.
Trait: Traits associated with entries in the GRIN and PVP databases, with links to genes, QTL, and pathology.
Reaction or Pathway: Clickable diagrams of more than 900 metabolic pathways showing kinetics, enzymes, metabolites and regulation. Data include detailed information for 900+ enzymes.
Pathology: Information on soybean diseases including causative organism, symptoms, differentials, distribution, and resistance mechanisms.
Nodulation: Plant and microbial processes and genes involved in nodulation of soybean.
Transformation: Summary of transformation in soybean, including methodology, transgenes, cultivar, and regeneration of plants.
Tools are provided at the SoyBase site for searching, downloading and analyzing the extensive soybean EST collection. Precomputed gene-specific clusters of ESTs are also available for searching by annotation or sequence and for download.
Recent developments:
Genetic maps in SoyBase are now displayed using CMap (part of the GMOD project), replacing the original ACeDB-based map displays. Among other improvements in the display, users can now easily compare multiple genetic maps side-by-side with common features indicated.
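The idea behind a side-by-side comparison can be illustrated with a minimal sketch that finds the markers two maps have in common and checks whether they occur in the same order. This is not CMap code; the function names, marker names and positions below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A genetic map is modelled here as {marker name: position in cM}.
GeneticMap = Dict[str, float]


def shared_markers(map_a: GeneticMap, map_b: GeneticMap) -> List[Tuple[str, float, float]]:
    """Return (marker, position on A, position on B) for markers present on
    both maps, ordered by their position on map A."""
    common = set(map_a) & set(map_b)
    return sorted(((m, map_a[m], map_b[m]) for m in common),
                  key=lambda t: (t[1], t[0]))


def is_collinear(pairs: List[Tuple[str, float, float]]) -> bool:
    """True if the shared markers appear in the same order on both maps."""
    positions_b = [pos_b for _, _, pos_b in pairs]
    return all(x <= y for x, y in zip(positions_b, positions_b[1:]))


# Hypothetical marker names and positions (not taken from SoyBase).
map_a = {"mk1": 0.0, "mk2": 12.5, "mk3": 30.1, "mk4": 47.8}
map_b = {"mk2": 10.0, "mk3": 33.0, "mk5": 40.2, "mk4": 51.5}

pairs = shared_markers(map_a, map_b)
for marker, pos_a, pos_b in pairs:
    print(f"{marker}\t{pos_a:.1f} cM\t{pos_b:.1f} cM")
print("collinear:", is_collinear(pairs))
```

Markers found on only one map (mk1 and mk5 here) are simply left out of the comparison; a viewer would still draw them, but without a connecting line.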
Ongoing funding of SoyBase is provided by the United States Department of Agriculture, Agricultural Research Service.
The Soybean Genome Database (SoyGD) genome browser integrates the publicly available physical map, BAC sequence database and genetic map-associated genomic data at http://soybeangenome.siu.edu. SoyGD shows different builds of the soybean physical map and the associated minimum tiling paths. High-quality DNA markers that anchor contigs are shown. Because many DNA markers anchor sets of 2-8 different contigs, each contig in such a set represents a homologous region of related sequences. GBrowse was adapted to show sets of homologous contigs at all potential anchor points, spread laterally and prevented from overlapping. The minimum tiling path clones provide BAC end sequences (BES) that decorate the physical map with gene models, new marker anchors and new microsatellite markers. Paralogs from low-copy gene families can be located. The genome browser portal shows each data type as a separate track. Tetraploid, octoploid, diploid and homologous regions are shown clearly in relation to an integrated genetic and physical map.
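For readers unfamiliar with the term, a minimum tiling path is a small set of overlapping clones that together span a contig. The sketch below illustrates the concept with a standard greedy interval-cover selection. It is not the procedure used to build the SoyGD tiling paths, which were derived from fingerprint data; the clone names and coordinates (in arbitrary units) and the Clone type are hypothetical.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # left end of the clone on the contig (arbitrary units)
    end: int    # right end of the clone on the contig


def minimum_tiling_path(clones: List[Clone], contig_start: int, contig_end: int) -> List[Clone]:
    """Greedy interval cover: among the clones that start at or before the
    position covered so far, repeatedly pick the one reaching furthest right."""
    ordered = sorted(clones, key=lambda c: c.start)
    path: List[Clone] = []
    covered = contig_start
    i = 0
    while covered < contig_end:
        best = None
        # Scan every clone that overlaps or abuts the covered region.
        while i < len(ordered) and ordered[i].start <= covered:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best.end
    return path


# Hypothetical clones on a contig spanning positions 0-300.
clones = [
    Clone("bac1", 0, 100),
    Clone("bac2", 40, 150),
    Clone("bac3", 90, 220),
    Clone("bac4", 140, 260),
    Clone("bac5", 210, 300),
]
path = minimum_tiling_path(clones, 0, 300)
print([c.name for c in path])
```

In a fingerprint-based map, clone overlap is inferred from shared bands rather than known coordinates, so a real selection would also require a minimum overlap between neighbouring clones instead of accepting clones that merely touch.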
Recent developments:
The new build 5 of the physical map, based on the 76,749 MICF and 4,992 HICF fingerprinted clones from MTP4, will soon be available to the public. New QTL data have been incorporated from the newly released Essex by Forrest RIL population. All of the ESTs placed in SoyGD to date have been used for expression analysis and are related to responses of roots to biotic stresses, strengthening the usefulness of SoyGD for identifying candidate genes underlying QTL and other genetic loci. Users can query the data in SoyGD directly with their own data over the Internet. Layering fingerprint data from the Williams 82 cultivar onto the map is envisioned when the data are released.
In the near future the genome browser interface will be improved in several ways using as yet unpublished data: by showing sequences and gene predictions from 37,000 BESs; by showing the locations of additional microsatellite anchors, SNP anchors and BES-SSR anchors; by showing many well-annotated contiguous sequenced regions of more than 100 kbp; and by showing the locations of about 2,000 resistance gene analog-positive clones (nucleotide binding leucine rich repeat paralogs), 700 receptor-like kinases and 512 additional low-copy gene family paralogs. In addition to adding soybean data, the adoption of GBrowse tools that allow comparisons of synteny among genomes will be a priority.
This research was funded in part by grants from the NSF (9872635), ISPOB (98-122-02 and 02-127-03) and USB (2228-5228). The SoyGD material was based upon work supported by the National Science Foundation under Grant No. 9872635. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The continued support of SIUC, College of Agriculture and Office of the Vice Chancellor for Research to MJI and DAL is appreciated. The authors thank Dr. Q Tao for assistance with fingerprinting, Chester Langin for his work on SoyGD from 2004 to 2005, and LIS for their CMap representation of our GBrowse data.