Proteases, protease inhibitors and protease mutations in human, chimpanzee, mouse, and rat
LINKER is a tool for designing linker peptide sequences for use in the construction of fusion proteins. The user provides the desired length of the linker in either Angstroms or number of residues, and several other constraints may also be specified, including the inclusion or exclusion of certain amino acids or protease-sensitive sites.
Cleaves a protein sequence with a chosen enzyme and computes masses of the generated peptides.
Cleaved Radioactivity of Phosphopeptides (CRP) performs in silico proteolytic cleavage of protein sequences and reports the radioactivity that would be observed if a given serine, threonine or tyrosine were phosphorylated.
GraBCas is a tool for predicting granzyme B and caspase cleavage sites.
NeuroPred is a tool designed to predict cleavage sites at basic amino acid locations in neuropeptide precursor sequences. NeuroPred also computes the mass of the predicted peptides with or without selected post-translational modifications.
Pcleavage is a tool that uses a support vector machine to predict immunoproteasome and constitutive proteasome cleavage sites in antigenic sequences.
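Several of the tools above cleave a sequence in silico and report peptide masses. The following is a minimal sketch of that idea, not the code of any listed tool. It assumes a simplified trypsin rule (cut after K or R unless the next residue is P) and uses standard monoisotopic residue masses; the function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def cleave_trypsin(sequence):
    """Split after K or R unless followed by P (simplified rule, an assumption)."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if sequence[start:]:
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to exercise the rule.
    for pep in cleave_trypsin("AKRPGKM"):
        print(pep, round(peptide_mass(pep), 4))
```

Real tools typically also handle missed cleavages, other enzymes and post-translational modifications, which this sketch leaves out.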
CutDB is one of the first systematic efforts to build an easily accessible collection of documented proteolytic events for natural proteins in vivo or in vitro. A CutDB entry is defined by a unique combination of these three attributes: protease, protein substrate, and cleavage site. Currently, CutDB integrates 3,070 proteolytic events for 470 different proteases captured from public archives (such as MEROPS and HPRD) and publications. CutDB supports various types of data searches and displays, including clickable network diagrams. Most importantly, CutDB is a community annotation resource based on a Wikipedia approach, providing a convenient user interface to input new data online. A recent contribution of 568 proteolytic events by several experts in the field of matrix metallopeptidases suggests that this approach will significantly accelerate the development of CutDB content. CutDB is publicly available at http://cutdb.burnham.org.
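The definition of a CutDB entry as a unique combination of protease, substrate and cleavage site can be illustrated with a small data structure. This is only a sketch under assumptions: the class name, field names and the integer encoding of the cleavage site are hypothetical and do not reflect the actual CutDB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteolyticEvent:
    """One entry, identified by the three attributes named in the text."""
    protease: str
    substrate: str
    cleavage_site: int  # hypothetical encoding: position of the residue before the cut


def add_event(collection, event):
    """Add an event unless the same (protease, substrate, site) triple exists."""
    if event in collection:
        return False
    collection.add(event)
    return True


if __name__ == "__main__":
    events = set()
    # Placeholder names, not real database records.
    print(add_event(events, ProteolyticEvent("protease_A", "substrate_X", 120)))
    print(add_event(events, ProteolyticEvent("protease_A", "substrate_X", 120)))
    print(add_event(events, ProteolyticEvent("protease_B", "substrate_X", 120)))
    print(len(events))
```

A frozen dataclass is hashable, so a plain set enforces the uniqueness rule without extra bookkeeping.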
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogues evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of antiretroviral therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions to GenBank, sequences published in journal articles, and sequences of HIV isolates from persons participating in clinical trials. Sequences are linked to data about the source of the sequence, the antiretroviral drug treatment history of the person from whom the sequence was obtained and the results of in vitro drug susceptibility testing. Sequence data on two new molecular targets of HIV drug therapy, gp41 (cell fusion) and integrase, will be added to the database shortly.
Recent developments:
Antiretroviral drug resistance is a major obstacle to the successful treatment of human immunodeficiency virus type 1 (HIV-1) infection. A large number of retrospective and prospective studies have demonstrated that the presence of drug resistance before starting a treatment regimen is an independent predictor of success of that regimen (1). As a result, several expert panels have recommended that HIV RT and protease sequencing be done to help physicians select antiretroviral drugs for their patients, and genotypic resistance testing has been part of routine clinical care for the past several years (2). The HIV RT and Protease Sequence Database (HIVRT&PrDB) is intended to assist scientists designing new HIV-1 drugs, clinical investigators studying HIV-1 drug resistance, and clinicians using genotypic HIV-1 drug resistance tests (3). The database links sequence changes in the molecular targets of HIV-1 therapy to other forms of data including treatment history and phenotypic (drug susceptibility) data. Data on the virological response (plasma HIV-1 RNA levels) to a new treatment regimen have been added and will soon be accessible over the web. The HIVRT&PrDB is a relational database with 19 normalized (nonredundant) core tables, 10 look-up tables, and about 20 derived tables. The database is implemented using MySQL on a Linux platform. There are several major hierarchical relationships linking key entities in the database: (i) patient → treatment history (list of drug regimens and their start and stop dates); (ii) patient → isolate (clinical) → sequence → drug susceptibility result; (iii) isolate (laboratory) → drug susceptibility result; and (iv) patient → plasma HIV-1 RNA level. Sequences are stored in a virtual alignment with the subtype B consensus sequence; thus amino acid sequences are also represented as lists of differences from the consensus sequence. The HIVRT&PrDB contains data from more than 420 published papers.
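Representing a sequence as a list of differences from a consensus can be sketched as follows. This is an illustration under assumptions, not the database's implementation: it presumes the two sequences are already aligned and of equal length, writes each difference in the common "reference residue, position, new residue" form, and uses a toy string rather than the real subtype B consensus.

```python
def mutations_vs_consensus(consensus, sequence):
    """List the differences between an aligned sequence and the consensus."""
    if len(consensus) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(consensus, sequence), start=1)
        if ref != alt
    ]


def apply_mutations(consensus, mutations):
    """Rebuild the full sequence from the consensus and a mutation list."""
    residues = list(consensus)
    for mutation in mutations:
        ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
        if residues[pos - 1] != ref:
            raise ValueError(f"{mutation} does not match the consensus")
        residues[pos - 1] = alt
    return "".join(residues)


if __name__ == "__main__":
    toy_consensus = "MKTAYIAKQR"  # hypothetical toy string, not a real consensus
    toy_isolate = "MKTSYIAKQL"
    diffs = mutations_vs_consensus(toy_consensus, toy_isolate)
    print(diffs)
    print(apply_mutations(toy_consensus, diffs) == toy_isolate)
```

Storing only the differences keeps records compact and makes queries such as "all isolates with a change at position 10" a simple list lookup.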
Sequences are available on HIV-1 isolates from more than 7,000 individuals and from about 500 laboratory isolates containing mutations generated by virus passage or site-directed mutagenesis. About 20,000 drug susceptibility results from tests performed on more than 2,000 virus isolates are available. Figures 1 and 2 contain composite alignments showing 193 protease and 395 RT mutations present at a frequency of >0.1% in HIV-1 isolates from treated and untreated persons. Figure 3 shows a summary of the drug susceptibility results available on each of the 16 approved antiretroviral drugs. The database allows users to retrieve sets of sequences meeting specific criteria. Commonly submitted queries include (i) the retrieval of sequences of HIV-1 isolates from patients receiving a specific drug treatment, (ii) the retrieval of sequences of HIV-1 isolates containing mutations at specific protease or RT positions, (iii) the retrieval of drug susceptibility data on HIV-1 isolates containing specific mutations or combinations of mutations, and (iv) a summary of data in any particular reference. Each query initially returns data in the form of a table, and each record in the returned table contains 8 or more columns of data. The data returned include (i) hyperlinks to the MEDLINE abstract and GenBank record, (ii) a list of mutations in the sequence, (iii) a classification of the sequence by patient and time point, (iv) drug treatment history, and (v) additional data depending upon the query (e.g., drug susceptibility results, phylogenetic data, technical data about virus isolation and sequencing). Together with this table, users are given the option of downloading or viewing the raw sequence data in a variety of formats.
Sequence interpretation programs
The database website contains three sequence interpretation programs.
The first program, HIVseq, accepts user-submitted RT and protease sequences, compares them to a reference sequence, and uses the differences (mutations) as query parameters for interrogating the database (4). HIVseq allows users to examine new sequences in the context of previously published sequences, providing two main advantages. First, unusual sequence results can be detected and immediately rechecked. Second, unexpected associations between sequences or isolates can be discovered when the program retrieves data on isolates sharing one or more mutations with the new sequence. The second program, a drug resistance interpretation program (HIVdb), accepts user-submitted protease and RT sequences and returns inferred levels of resistance to the 16 FDA-approved antiretroviral drugs. Each drug resistance mutation is assigned a drug penalty score; the total score for a drug is derived by adding the scores associated with each mutation. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance. The third program (HIValg) allows researchers to compare the output of different publicly available drug-resistance algorithms on the same sequence or set of sequences. The algorithms used by this program are encoded using a programming platform or Algorithm Specification Interface (ASI) developed to facilitate the comparison of HIV genotypic resistance algorithms. ASI consists of an XML format for specifying an algorithm and a compiler that transforms the XML into executable code.
New additions planned for the HIVRT&PrDB
Two additions to the database are planned: (i) gp41 sequences and data on resistance to fusion inhibitors. The first fusion inhibitor, enfuvirtide (T-20), has been shown to have potent antiretroviral activity in clinical trials (5,6) and is likely to be approved in 2003.
A wide range of mutations in gp41 contributing to T-20 resistance, most occurring between residues 36 and 45, have been reported, but mutations outside of this region also appear to contribute to drug resistance (7,8). (ii) integrase sequences. A new class of compounds that inhibit HIV-1 integrase has been shown to be active in vitro and in a SHIV rhesus macaque model of infection (9,10).
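The additive penalty-score scheme described above for the drug resistance interpretation program (HIVdb) can be sketched in a few lines. Everything numeric here is a placeholder: the drug names, mutation names, penalty scores and score cut-offs are hypothetical values chosen for illustration, not those used by the database.

```python
# Hypothetical penalty table: drug -> {mutation: penalty score}.
PENALTY_SCORES = {
    "DRUG_A": {"M1L": 30, "K2N": 15},
    "DRUG_B": {"K2N": 60},
}

# Hypothetical upper bounds (exclusive) for each reported level.
LEVELS = [
    (10, "susceptible"),
    (15, "potential low-level resistance"),
    (30, "low-level resistance"),
    (60, "intermediate resistance"),
]


def total_score(drug, mutations):
    """Add the penalty scores of all mutations that have one for this drug."""
    table = PENALTY_SCORES.get(drug, {})
    return sum(table.get(mutation, 0) for mutation in mutations)


def resistance_level(score):
    """Map a total score to one of the five levels named in the text."""
    for upper_bound, label in LEVELS:
        if score < upper_bound:
            return label
    return "high-level resistance"


if __name__ == "__main__":
    observed = ["M1L", "K2N", "X9Y"]  # X9Y has no penalty and is ignored
    for drug in PENALTY_SCORES:
        score = total_score(drug, observed)
        print(drug, score, resistance_level(score))
```

Keeping the scores in a table rather than in code means the rules can be revised without changing the scoring logic.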
The project was supported in part by NIH/NIAID (AI-46148-02) and a Stanford University BioX Interdisciplinary Project Grant.
1. Shafer, R.W. (2002) Clin Microbiol Rev, 15, 247-277.
2. Hirsch, M.S., Brun-Vezinet, F., D'Aquila, R.T., Hammer, S.M., Johnson, V.A., Kuritzkes, D.R., Loveday, C., Mellors, J.W., Clotet, B., Conway, B. et al. (2000) JAMA, 283, 2417-2426.
3. Kantor, R., Machekano, R., Gonzales, M.J., Dupnik, B.S., Schapiro, J.M. and Shafer, R.W. (2001) Nucleic Acids Res, 29, 296-299.
4. Shafer, R.W., Jung, D.R. and Betts, B.J. (2000) Nat Med, 6, 1290-1292.
5. Kilby, J.M., Hopkins, S., Venetta, T.M., DiMassimo, B., Cloud, G.A., Lee, J.Y., Alldredge, L., Hunter, E., Lambert, D., Bolognesi, D. et al. (1998) Nat Med, 4, 1302-1307.
6. Kilby, J.M., Lalezari, J.P., Eron, J.J., Carlson, M., Cohen, C., Arduino, R.C., Goodgame, J.C., Gallant, J.E., Volberding, P., Murphy, R.L. et al. (2002) AIDS Res Hum Retroviruses, 18, 685-693.
7. Wei, X., Decker, J.M., Liu, H., Zhang, Z., Arani, R.B., Kilby, J.M., Saag, M.S., Wu, X., Shaw, G.M. and Kappes, J.C. (2002) Antimicrob Agents Chemother, 46, 1896-1905.
8. Sista, P.R., Melby, T., Greenberg, M., Davison, D., Jin, L., Mosier, S., Mink, M., Nelson, E., Fang, L., Cammack, N. et al. (2002) Antivir Ther, 7, S16-S17.
9. Grobler, J.A., Stillmock, K., Hu, B., Witmer, M., Felock, P., Espeseth, A.S., Wolfe, A., Egbertson, M., Bourgeois, M., Melamed, J. et al. (2002) Proc Natl Acad Sci U S A, 99, 6661-6666.
10. Hazuda, D.J. and HIV-1 Integrase Inhibitor Discovery Team. (2002) Antivir Ther, 7, S3.
11. Hertogs, K., de Bethune, M.P., Miller, V., Ivens, T., Schel, P., Van Cauwenberge, A., Van Den Eynde, C., Van Gerwen, V., Azijn, H., Van Houtte, M. et al. (1998) Antimicrob Agents Chemother, 42, 269-276.
12. Petropoulos, C.J., Parkin, N.T., Limoli, K.L., Lie, Y.S., Wrin, T., Huang, W., Tian, H., Smith, D., Winslow, G.A., Capon, D.J. et al. (2000) Antimicrob Agents Chemother, 44, 920-928.
13. Beerenwinkel, N., Schmidt, B., Walter, H., Kaiser, R., Lengauer, T., Hoffmann, D., Korn, K. and Selbig, J. (2002) Proc Natl Acad Sci U S A, 99, 8271-8276.
Proteolytic enzymes, commonly termed proteases, but known scientifically as peptidases, are of enormous importance in human health, disease and technology. The MEROPS database provides a wealth of information about peptidases to the many scientists working on them in academia and industry. The database is organised around a structural classification of peptidases in which the enzymes are grouped into families on the basis of amino acid sequence homology. The families are further assembled into clans in the light of evidence (usually from similarities in tertiary structure) that they share common ancestry. The database contains four major types of documents: PepCards, FamCards, ClanCards and SpecCards, each giving information on a single peptidase, family, clan or organism, respectively. Access to the Card files is through links from indexes, or by online searches, and each of them gives information on classification and nomenclature as well as an interface to entries in other databases. Each PepCard includes data for human and mouse genetics, and links to the nucleic acid and amino acid sequences. If tertiary structural coordinates have been deposited, PDB entries are listed, and usually a rendered molecular image is provided. There is a PepCard even for a peptidase that cannot yet be assigned to a family (because no complete amino acid sequence is available) so long as it has been characterised biochemically. A FamCard includes links to other databases of sequence motifs and secondary and tertiary structure, a protein sequence alignment for the peptidase units of members of the family and a tree showing how the members of the family are related. The sequence alignments are annotated to highlight catalytic residues, disulfide bridges, carbohydrate attachment sites and transmembrane domains. Each FamCard includes a diagram showing distribution of homologues amongst completely sequenced genomes. 
Each ClanCard includes a diagram showing conservation of amino acid sequence around catalytic residues. Each SpecCard includes an abbreviated taxonomy, and a list of all the peptidases known from the species. The FamCards and ClanCards also show the distribution of peptidases in the group across the major kingdoms of living creatures. Each PepCard, FamCard and ClanCard includes literature references with links to Medline. Online searches provide rapid navigation of the database and the user can search for a peptidase by its name (partial or complete) or by a known accession number in a database such as SwissProt, TrEMBL, PIR, EMBL/GenBank and PDB. The user can also retrieve all the peptidases assigned to a specified human or mouse chromosome, and, for any given family or clan, may find all the peptidases for which structural coordinates have been deposited. Amongst the forms of biochemical information is data on the specificity of peptidases, accessed by additional online searches. A collection of known cleavage sites in synthetic substrates, peptides and proteins can be searched by providing the name of a substrate or peptidase, or by filling substrate specificity subsites. The database is maintained by a team consisting of N. D. Rawlings, D. P. Tolle and A. J. Barrett.
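Searching known cleavage sites by filling in specificity subsites can be illustrated as below. This sketch assumes the conventional P4 to P4' subsite naming around the scissile bond; the record layout, function names and example substrate are hypothetical and are not taken from MEROPS.

```python
SUBSITES = ("P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'")


def site_from_sequence(substrate, bond_index):
    """Residues around the bond between substrate[bond_index - 1] and substrate[bond_index]."""
    site = {}
    for offset, name in zip(range(-4, 4), SUBSITES):
        pos = bond_index + offset
        # Subsites that fall off either end of the substrate are left empty.
        site[name] = substrate[pos] if 0 <= pos < len(substrate) else None
    return site


def matches(site, query):
    """True when every subsite filled in the query has the same residue in the site."""
    return all(site.get(name) == residue for name, residue in query.items())


if __name__ == "__main__":
    # Hypothetical substrate cleaved between the 4th and 5th residues.
    site = site_from_sequence("AAPFGGLK", 4)
    print(site)
    print(matches(site, {"P2": "P", "P1": "F"}))
    print(matches(site, {"P1": "K"}))
```

Leaving a subsite out of the query acts as a wildcard, which mirrors filling in only some boxes of a search form.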
Supported by the Medical Research Council (UK) and the Biotechnology and Biological Sciences Research Council (UK).
Plant protease inhibitors (PIs) can be counted among the defensive proteins that plants deploy to minimize the adverse effects of attack by phytophagous insects. They are usually present in seeds and storage tissues, but are also expressed in the aerial parts of the plant upon insect attack. Their activity on gut proteases attenuates amino acid assimilation and slows the growth of feeding insects. Owing to the direct defensive properties of PIs, several transgenic plants have been produced expressing specific inhibitors and tested for enhanced resistance against phytophagous insects. Although this approach proved effective in many cases, it also demonstrated the capacity of some insects to overcome the effect of PI ingestion by up-regulating the synthesis of new proteases that are insensitive to PIs. In this respect, structural studies on PIs and protease-PI complexes are needed to better understand the molecular mechanism of protease resistance and to lay the basis for designing new PIs active toward insensitive proteases. Among plant PIs, inhibitors active toward the four mechanistic classes of proteases (serine-, cysteine-, aspartic- and metallo-proteases) have been described. The activity of PIs is due to their capacity to form stable complexes with target proteases, blocking, altering or preventing access to the enzyme active site. Plant PIs can also be catalogued in a number of families on the basis of their primary structure. Many of these families contain PIs specific for a single mechanistic class of proteases, but this does not seem to be a rule. PLANT-PIs is a database developed to facilitate retrieval of information on plant PIs and their genes. The need for a database comes from the large number of sequences available and the increasing amount of data concerning the activity and structure of these molecules.
The database correlates information contained in primary sequence databases (EMBL and SwissProt) with functional analyses of the proteins reported in the literature. For each PI, links to sequence databases are reported together with a summary of the functional properties of the molecule (and its mutants) as deduced from the literature. Transgenic plants obtained by transferring PI genes are also reported, as deduced from the literature. PLANT-PIs, as updated in July 2002, contains information for 495 inhibitors, plus several isoinhibitors, identified in 129 plant species. The database is accessible at http://bighost.area.ba.cnr.it/PLANT-PIs.
1. Ryan, C.A. (1990) Protease inhibitors in plants: genes for improving defenses against insects and pathogens, Annu. Rev. Phytopath., 28: 425-449.
2. Broadway R.M. (1995) Are insects resistant to plant proteinase inhibitors?, J. Insect Physiol.,
3. Jongsma M.A., Bakker P.L., Peters J., Bosch D., and Stiekema W.J. (1995) Adaptation of Spodoptera exigua larvae to plant proteinase inhibitors by induction of gut proteinase activity insensitive to inhibition, Proc. Natl. Acad. Sci. USA, 92:
4. Jouanin L., Bonade-Bottino M., Girard C., Morrot G., and Giband M. (1998) Transgenic plants for insect resistance, Plant Science, 131: 1-11
5. Valueva T.A., and Mosolov V.V., (1999) Protein inhibitors of proteinases in seeds: 1. Classification, distribution, structure, and properties, Russian J. Plant Physiol., 46: 362-378
6. Laskowski, Jr., M., and Qasim M.A. (2000) What can the structures of enzyme-inhibitor complexes tell us about the structures of enzyme substrate complexes? Biochim. Biophys. Acta, 1477: 324-337
7. Bode, W., and Huber, R. (2000) Structural basis of the endoproteinase-protein inhibitor interaction, Biochim. Biophys. Acta, 1477: 241-252
8. De Leo F, Volpicella M, Licciulli F, Liuni S, Gallerani R, Ceci LR. (2002) PLANT-PIs: a database for plant protease inhibitors and their genes, Nucleic Acids Res. 30: 347-8
Proteases catalyze the cleavage of peptide bonds in other proteins. These proteins represent one of the three largest groups of industrial enzymes and account for up to 60% of the total worldwide sale of enzymes. Bacterial proteases have a diverse range of functions and mechanisms of action. They can be responsible for complex processes under normal physiological circumstances as well as in abnormal patho-physiological conditions. ProLysED (Prokaryotic Lysis Enzymes Database) is a metaserver-integrated database for protease systems of bacterial origin and includes regulatory and inhibitor proteins in the dataset. The datasets were retrieved from SWISS-PROT (sequences and annotation) and PDB (sequence and structural information). This data retrieval process was carried out using a combination of keyword searches, with the searches limited to bacterial species. Further comparative analyses were carried out against other known and well-characterized proteases. These entries were then annotated further manually. MySQL (http://www.mysql.com) was used as the core relational database, while the interfacing was done mainly using PHP (http://www.php.net). ProLysED is structured as the following interlinked databases: basic data, sequences, structures, internal annotation and user interface. The basic data database contains information on protease class, bacterial species, GenBank accession number, basic description, E.C. number and data source. The sequences and structures databases contain sequence data in FASTA format and structural data as PDB-formatted files, respectively. The internal annotation database contains further annotation of protease activity, inhibitor information and any other annotation that is not available in the original data source. The user database controls user access and usage sessions.
Data without protein structure information can be submitted via the metaserver to the following external services: PHD, 3DPSSM, SAM-T99 and PSI-PRED. Submission is through user sessions, and results are returned by the respective services via email. The database is automatically updated using a Perl agent at fortnightly intervals. Updated datasets are extracted in XML format using the specified search terms. The Perl script parses the files for updates, authenticates the organism source to be prokaryotic and automatically updates the MySQL database. Any further annotation is then carried out manually. Cross references to GenBank, PDB, SCOP, MEROPS, PDBSum and the CATH structural database, as well as an external BLAST against SWISSPROT, are also interfaced from ProLysED. A specific ProLysED BLAST server with the database limited to bacterial proteases and related systems is also provided. Data entries with corresponding structural data in the form of PDB-formatted files are also viewable via the Chime molecular viewer plug-in (http://www.mdli.com). These structures are selectable from the bacterial protease structures data integrated within ProLysED.
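The automated update step described above (parse an XML extract, check that the source organism is prokaryotic, then update the database) can be sketched as follows. Note the substitutions: the resource uses a Perl agent and MySQL, whereas this sketch uses Python with the standard-library sqlite3 module as a stand-in. The XML element names, table layout and kingdom labels are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed labels marking an entry as prokaryotic (hypothetical).
PROKARYOTIC_KINGDOMS = {"Bacteria", "Archaea"}


def parse_entries(xml_text):
    """Yield one dict per <entry> element in the extract (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        yield {
            "accession": entry.findtext("accession"),
            "kingdom": entry.findtext("kingdom"),
            "description": entry.findtext("description"),
        }


def update_database(conn, xml_text):
    """Insert or refresh prokaryotic entries; return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS protease "
        "(accession TEXT PRIMARY KEY, description TEXT)"
    )
    written = 0
    for entry in parse_entries(xml_text):
        if entry["kingdom"] not in PROKARYOTIC_KINGDOMS:
            continue  # skip entries that fail the organism check
        conn.execute(
            "INSERT OR REPLACE INTO protease VALUES (?, ?)",
            (entry["accession"], entry["description"]),
        )
        written += 1
    conn.commit()
    return written


if __name__ == "__main__":
    sample = (
        "<entries>"
        "<entry><accession>ACC1</accession><kingdom>Bacteria</kingdom>"
        "<description>placeholder protease</description></entry>"
        "<entry><accession>ACC2</accession><kingdom>Eukaryota</kingdom>"
        "<description>placeholder protease</description></entry>"
        "</entries>"
    )
    connection = sqlite3.connect(":memory:")
    print(update_database(connection, sample))
```

Using the accession as the primary key makes repeated fortnightly runs idempotent: an unchanged entry is simply rewritten rather than duplicated.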
The authors would like to acknowledge the National Biotechnology and Bioinformatics Network (NBBnet, http://www.nbbnet.gov.my), Malaysia and the Interim Lab., National Inst. for Genomics and Molecular Biology Malaysia (http://genome.ukm.my), which is hosting this resource. We are also grateful for miscellaneous technical input and feedback from Ahmad Fuad Hilmi Muhammad, as well as assistance with the data quality control from Ayu Haslin, Norazlin and Yazeereen.