AS2TS

From amino-acid sequence (AS) to tertiary structure (TS)

Server Description


The AS2TS service is designed to facilitate the construction of 3D models for protein sequences.
For a given query sequence, 3D models are constructed as follows:

1. searching for the closest sequence homologs in the Protein Data Bank (PDB)
2. calculating sequence alignments between the query and the detected sequence homologs
3. assigning ATOM coordinates from the PDB structures of the detected homologs to the 
   corresponding amino acids of the query sequence
4. reporting the list of closest homologs from PDB, the calculated alignments, and the 
   corresponding 3D models (coordinates of main-chain atoms; side-chain atoms 
   can optionally be calculated using the SCWRL program)
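
Step 3 can be illustrated with a minimal Python sketch. It is not the AS2TS
implementation: the function name, the gapped-string alignment representation
('-' for a gap), and the one-coordinate-per-residue template are assumptions
made only for illustration.

```python
def assign_coordinates(aligned_query, aligned_template, template_coords):
    """Copy template coordinates onto the aligned query residues.

    aligned_query / aligned_template: equal-length strings, '-' marks a gap.
    template_coords: one (x, y, z) tuple per template residue, in order.
    Returns {query position (1-based): (query residue, (x, y, z))}.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    model = {}
    q_pos = 0  # number of query residues consumed so far
    t_idx = 0  # index of the next template residue
    for q_res, t_res in zip(aligned_query, aligned_template):
        if q_res != "-":
            q_pos += 1
        if t_res != "-":
            # Coordinates are assigned only where both sequences have a residue.
            if q_res != "-":
                model[q_pos] = (q_res, template_coords[t_idx])
            t_idx += 1
    return model


# Hypothetical toy alignment: the template has four residues (M, K, A, L).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = assign_coordinates("MK-LV", "MKAL-", coords)
```

Query residues aligned to a gap in the template (the final V in this toy
example) receive no coordinates in this sketch.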

For AS2TS processing, the amino-acid sequence should be entered in FASTA format.

   A sequence in FASTA format begins with a single-line description, followed by 
   lines of sequence data. The description line is distinguished from the sequence
   data by a greater-than (">") symbol in the first column (see an example below):

>Name
RKNGLNVKMDYTPNSGQLVRNLLNGKYNIAVAGIDNVIAYQEGQVKEPVVNPDMFAFYGV
KELKLDYELKPMDFSGIIPALQTKNVDLALAGITITDERKKAIDFSDGYYK
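
The FASTA layout described above can be read with a few lines of Python. This
is a minimal sketch, assuming plain-text input; the function name parse_fasta
is hypothetical and not part of AS2TS.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>Name
RKNGLNVKMDYTPNSGQLVRNLLNGKYNIAVAGIDNVIAYQEGQVKEPVVNPDMFAFYGV
KELKLDYELKPMDFSGIIPALQTKNVDLALAGITITDERKKAIDFSDGYYK
"""
records = parse_fasta(example)
name, sequence = records[0]
```

The sequence lines are concatenated, so the record holds one continuous
string of amino-acid codes regardless of how the input was wrapped.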

For the process of model building several options are available:

- Number of generated models
    this number sets how many of the closest homologs from PDB are retrieved 
    for model building
- Mutation matrix
    a user can choose from different "substitution matrices" (BLOSUM_45, PAM_250, 
    BLOSUM_50, BLOSUM_62, BLOSUM_80, PAM_70, PAM_30) used by the alignment programs. 
    The theory of amino acid substitution matrices is described in [1]. 
    By changing the substitution matrix, the user can compare the calculated 
    alignments and assess how stable the resulting 3D models are.
    The following default gap penalties are assigned to each substitution 
    matrix:
        BLOSUM45 -G 15 -E 2
        PAM250   -G 14 -E 2 
        BLOSUM50 -G 13 -E 2
        BLOSUM62 -G 11 -E 1
        BLOSUM80 -G 10 -E 1
        PAM70    -G 10 -E 1
        PAM30    -G  9 -E 1
    The substitution matrices are listed in order of recommended usage:
    long alignments with low similarity ---> short alignments with high similarity
- Pairwise sequence alignment search 
    pairwise sequence alignment searches are performed against PDB.
    Smith-Waterman, FASTA or BLAST can be selected as an alignment search program
- Multiple sequence alignment search 
    by selecting PSI-BLAST, a user can set how many iterations are performed 
    against sequences from the NR library (non-redundant sequences from 
    NCBI). The final PSI-BLAST iteration is run against PDB.
- Side-chain building procedure (SCWRL)
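
The default gap penalties listed under "Mutation matrix" can be expressed as
a lookup table. The sketch below is illustrative only: the names
DEFAULT_GAP_PENALTIES, gap_penalties and penalty_flags are hypothetical, and
reading -G as the gap-opening cost and -E as the gap-extension cost is an
assumption based on the convention of the legacy NCBI blastall options.

```python
# Default gap penalties per substitution matrix, copied from the list above.
# Assumption: -G is the gap-opening cost and -E is the gap-extension cost.
DEFAULT_GAP_PENALTIES = {
    "BLOSUM45": (15, 2),
    "PAM250": (14, 2),
    "BLOSUM50": (13, 2),
    "BLOSUM62": (11, 1),
    "BLOSUM80": (10, 1),
    "PAM70": (10, 1),
    "PAM30": (9, 1),
}


def gap_penalties(matrix_name):
    """Return (gap_open, gap_extend) for a matrix name such as 'BLOSUM_62'."""
    # Accept both spellings used in the text: 'BLOSUM_62' and 'BLOSUM62'.
    key = matrix_name.replace("_", "").upper()
    try:
        return DEFAULT_GAP_PENALTIES[key]
    except KeyError:
        raise ValueError(f"unknown substitution matrix: {matrix_name!r}") from None


def penalty_flags(matrix_name):
    """Format the penalties in the '-G n -E m' notation used above."""
    gap_open, gap_extend = gap_penalties(matrix_name)
    return ["-G", str(gap_open), "-E", str(gap_extend)]


flags = penalty_flags("BLOSUM_62")
```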

Example of the results reported by the AS2TS system:

Model    PDB       N_AA   SISC    E-val    Seq_ID      LAL      Overlap 
M_00    1fil        139      5    2e-44    29.000    130:137    (2-131:1-137)          
M_01    1awi_A      138      4    4e-44    28.000    129:136    (3-131:1-136)          
M_02    1pne        140      2    4e-44    29.000    130:137    (2-131:2-138)          
M_03    2btf_P      139      1    4e-44    29.000    130:137    (2-131:1-137)          
M_04    1d1j_A      139      4    7e-44    23.000    130:137    (2-131:1-137)          
M_05    1a0k        131      1    4e-42    14.000    124:135    (1-131:1-129)          
M_06    1plm_A      130      1    5e-42    13.000    123:132    (4-131:2-128)          
M_07    1g5u_A      131      2    5e-41    14.000    125:136    (1-132:1-130)          
M_08    3nul        130      1    4e-40    13.000    123:132    (4-131:2-128)          
M_09    1cqa        133      1    5e-40    13.000    121:138    (1-132:1-132)          
M_10    1ypr_A      125      3    4e-31    12.000    114:132    (4-129:2-121)          
M_11    1f2k_A      125      3    1e-25     9.000    116:136    (4-132:2-124)          
M_12    1prq        125      2    3e-25     9.000    116:136    (4-132:2-124)          
M_13    1acf        125      1    8e-25     9.000    116:136    (4-132:2-124)          
M_14    1bhn_A      152      6      1.5    10.000     91:116    (36-132:4-113)         
M_15    1pku_A      150     12      1.7    10.000     79:99     (48-132:19-111)        
M_16    1fiq_C      763      1      2.2    10.000     66:70     (47-112:146-215)       
M_17    1v97_A     1332      6      2.2    10.000     66:70     (47-112:715-784)       
M_18    1n5x_A     1331      2      2.2    10.000     66:70     (47-112:714-783)       
M_19    1ha7_B      172     12      2.4    13.000     40:43     (88-127:121-163)       

where the following information is provided in columns:
- Model
    link to the coordinates of calculated model
- PDB
    link to the information about a PDB template used for model building
- N_AA
    number of amino-acids in the template sequence
- SISC
    number of different sets of coordinates of a protein template (link to PDB files)
- E-val
    score (E-value) calculated by the selected alignment program (lower values 
    indicate better alignments)
- Seq_ID
    sequence identity calculated from the alignment
- LAL (N:M)
    N - number of amino-acids assigned from ATOM coordinates from the PDB template, 
    M - length of the sequence alignment
- Overlap
    the sequence ranges covered by the alignment, given as 
    (query range:PDB template range)
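
A row of the results table can be split into the columns described above with
a short Python sketch. The parser is hypothetical and not provided by AS2TS.
It makes two assumptions inferred from the example rows: the suffix after "_"
in the PDB column is a chain identifier, and the first range in Overlap
belongs to the query while the second belongs to the template.

```python
import re

# One result row; the column order follows the header line
# "Model PDB N_AA SISC E-val Seq_ID LAL Overlap".
ROW_PATTERN = re.compile(
    r"(?P<model>\S+)\s+(?P<pdb>\S+)\s+(?P<n_aa>\d+)\s+(?P<sisc>\d+)\s+"
    r"(?P<e_val>\S+)\s+(?P<seq_id>\S+)\s+(?P<n>\d+):(?P<m>\d+)\s+"
    r"\((?P<q_start>\d+)-(?P<q_end>\d+):(?P<t_start>\d+)-(?P<t_end>\d+)\)"
)


def parse_result_row(line):
    """Parse one row of the results table into a dictionary."""
    match = ROW_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a result row: {line!r}")
    # Assumption: '1v97_A' means PDB entry 1v97, chain A.
    pdb_id, _, chain = match["pdb"].partition("_")
    return {
        "model": match["model"],
        "pdb_id": pdb_id,
        "chain": chain or None,
        "n_aa": int(match["n_aa"]),
        "sisc": int(match["sisc"]),
        "e_value": float(match["e_val"]),
        "seq_id": float(match["seq_id"]),
        "assigned": int(match["n"]),          # N in LAL (N:M)
        "alignment_length": int(match["m"]),  # M in LAL (N:M)
        "query_range": (int(match["q_start"]), int(match["q_end"])),
        "template_range": (int(match["t_start"]), int(match["t_end"])),
    }


row = parse_result_row(
    "M_17    1v97_A     1332      6      2.2    10.000     66:70     (47-112:715-784)       "
)
```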


References:

[1] Altschul, S. F. (1991). Amino acid substitution matrices from an information
    theoretic perspective. J Mol Biol 219, 555-565.

[2] Smith, T. F., Waterman, M. S. (1981). Identification of common molecular
    subsequences. J Mol Biol 147(1), 195-197.

[3] Pearson, W. R. (1991). Searching protein sequence libraries: comparison of the
    sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.
    Genomics 11, 635-650.

[4] Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z.,
    Miller, W., Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new
    generation of protein database search programs. Nucleic Acids Res 25(17),
    3389-3402.

[5] Bower, M., Cohen, F. E., Dunbrack, R. L., Jr. (1997). Sidechain prediction from
    a backbone-dependent rotamer library: a new tool for homology modeling.
    J Mol Biol 267, 1268-1282.