
fasta_list - fasta-36.3.6.tar.gz released

Subject: FASTA program discussion list

  • From: William Pearson <>
  • To:
  • Subject: fasta-36.3.6.tar.gz released
  • Date: Fri, 5 Jul 2013 16:41:47 -0400


A new release of the FASTA package, fasta-36.3.6.tar.gz, is available from:


This version fixes various bugs in earlier versions, but, more importantly, it introduces a powerful new strategy for incorporating annotation/feature information in alignments.

(1) It expands the old "-V" option to allow scripts to produce annotations about both the query and library sequences.  Thus, ssearch36 -V '\!../scripts/ann_feats_up_www2.pl' will cause ssearch36 to run the 'scripts/ann_feats_up_www2.pl' Perl script to acquire feature (active sites, binding sites), variation, and domain information from the UniProt DAS/GFF server. This information is then included in the alignment, where it can be used to re-score the alignment when a variant residue improves the score and to display the state of active-site residues, thus:

>>sp|P09488.3|GSTM1_HUMAN Glutathione S-transf            (218 aa)
 Site:* : 23Y=23Y : MOD_RES: Phosphotyrosine (By similarity).
 Site:* : 33Y=33Y : MOD_RES: Phosphotyrosine (By similarity).
 Site:* : 34T=34T : MOD_RES: Phosphothreonine (By similarity).
 Region: 1-88:1-88 : score=613; bits=155.9; Id=1.000; Q=422.6 :  Glutathione_S-Trfase_N :1
 Site:# : 116Y=116Y : BINDING: Substrate.
 Variant: 173N=173N : K173N : in allele GSTM1B; dbSNP:rs1065411.
 Region: 90-208:90-208 : score=809; bits=203.8; Id=1.000; Q=566.8 :  Glutathione_S_Trfase/Cl_chnl_C :2
 Variant: 210T=210T : S210T : in dbSNP:rs449856.
 s-w opt: 1500  Z-score: 1975.7  bits: 372.7 E(455146): 1.4e-102
Smith-Waterman score: 1500; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)

               10        20        30        40        50        60
sp|P0  MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNL
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNL
       [       10        20  *     30  **    40        50        60

               70        80        90       100       110       120
sp|P0  PYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEF
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 PYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEF
               70        80       ]9[       100       110     # 120

              130       140       150       160       170       180
sp|P0  EKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPNCLDAFPN
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 EKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPNCLDAFPN
              130       140       150       160       170  V    180

              190       200       210
sp|P0  LKDFISRFEGLEKISAYMKSSRFLPRPVFTKMAVWGNK
       ::::::::::::::::::::::::::::::::::::::
sp|P09 LKDFISRFEGLEKISAYMKSSRFLPRPVFTKMAVWGNK
              190       200       ]1V

The -V 'q\!ann_feats_up_www2.pl' option uses the script to annotate the query sequence (which must have a UniProt accession for feature information to be found).  The ../scripts/ann_feats_up_www2.pl script included with the distribution provides a simple implementation of the annotation-script strategy, but it dramatically slows down alignment display because it retrieves feature information from an external website.  The "scripts" directory also provides scripts that use local MySQL databases to get annotation information, which is displayed much more quickly.

(2) Annotation scripts that provide domain information can be used to produce "sub-alignment" scores, in which the alignment is partitioned and a score is calculated for each partition.  In the GSTM1_HUMAN example above, region 1-88 is a GST N-terminal domain, and 90-208 is a GST C-terminal domain.  Associated with each of those domains is a raw sub-alignment score, the corresponding bit score, the identity (Id, as a fraction), and the Q-score, which is -10 * log10(p-value) for that bit score.
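The Q-score definition in (2) can be checked against the GSTM1_HUMAN numbers. A minimal Python sketch, assuming (this is not stated in the post) that the p-value comes from the standard bit-score conversion E = m * n * 2^(-bits) with p = 1 - exp(-E), where m and n are the two sequence lengths. With m = n = 218 it lands within rounding of the reported Q=422.6 and Q=566.8.

```python
import math


def q_score(bits: float, m: int, n: int) -> float:
    """Q = -10 * log10(p) for a bit score.

    ASSUMED conversion: E = m * n * 2**(-bits) and p = 1 - exp(-E),
    with m and n the query and library lengths.
    """
    log10_e = math.log10(m) + math.log10(n) - bits * math.log10(2)
    if log10_e > 2:                 # E > 100: p rounds to 1.0, so Q is 0
        return 0.0
    if log10_e < -6:                # tiny E: p is essentially E; stay in log space
        log10_p = log10_e
    else:
        log10_p = math.log10(-math.expm1(-(10.0 ** log10_e)))
    return max(0.0, -10.0 * log10_p)


# Region 1-88 of the GSTM1_HUMAN self-alignment (218 aa vs 218 aa), bits=155.9
print(round(q_score(155.9, 218, 218), 1))  # 422.5, close to the reported Q=422.6
```

Working in log10 space avoids underflow: 2^-155.9 is far below the smallest number a double can represent usefully once multiplied out and exponentiated again.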

In this case, since the sequences are 100% identical, the sub-alignment scores are not very interesting.  Sometimes, however, an alignment is seeded by homologous domains but extends considerably beyond the homologous region.  For example, in this alignment between SRC8_HUMAN (cortactin) and LASP1_BOVIN, several domains align, but only the C-terminal SH3 domain alignment contributes a significant score:

>>sp|Q3B7M5.1|LASP1_BOVIN LIM and SH3 domain protein 1;  LASP-1               (260 aa)
 Site:* : 371Az68T : MOD_RES: Phosphothreonine (By similarity).
 Region: 369-398:66-95 : score=20; bits=13.7; Id=0.200; Q=0.0 :  Nebulin_35r-motif
 Site:* : 407Az104T : MOD_RES: Phosphothreonine (By similarity).
 Site:* : 416P<118S : MOD_RES: Phosphoserine (By similarity).
 Region: 400-434:97-131 : score=-8; bits=8.7; Id=0.150; Q=0.0 :  Nebulin_35r-motif
 Region: 435-499:132-200 : score=3; bits=8.1; Id=0.233; Q=0.0 :  NODOM :0
 Region: 499-547:201-258 : score=124; bits=47.7; Id=0.474; Q=92.2 :  SH3
 s-w opt: 142  Z-score: 242.8  bits: 53.6 E(455146): 4.6e-06
Smith-Waterman score: 149; 27.7% identity (55.0% similar) in 202 aa overlap (369-547:66-258)

330 340 350 360 370 380 390 400
sp|Q1 QVSSAYQKTVPVEAVTSKTSNIRANFENLAKEKEQEDRRKAEAERAQRMAKERQEQEEARRKLEEQARAKTQTPPVSPAP
:.. . :. .. . :...: : : . . :. .:
sp|Q3B HKACFHCETCKMTLNMKNYKGYEKKPYCNAHYPKQSFTMVADTPENLRLKQQSELQSQVRYkeefeknkgkgfSVVADTP
30 40 50 60 *70 80 90 ] [100 *

410 420 430 440 450 460 470 480
sp|Q1 Q-----PTEERLPSSPVYEDAASFKAELSYRGPVSGTEPE-PVYSMEAADYREASSQQGLAY--ATEAVYESAEAPGHYP
. :.... . .:. : : :: .: : . . ..::. ..:: . :. ::.. : . :
sp|Q3B ELQRIKKTQDQISNIKYHEE-----FEKSRMGPSGGEGLECERRDPQESSYRRPQEQQQPHHIPASTPVYqq---pqqqp
110 *20 130][ 140 150 160 170

490 500 510 520 530 540
sp|Q1 AEDSTYDEYENDLGITAV-------------ALYDYQAAGDDEISFDPDDIITNIEMIDDGWWRGVCK--GRYGLFPANY
: .: : :.. . ... :.:::.:: .::.::. : :.:...::::: :. . : :..::::
sp|Q3B aaqS-YGGYKEPAAPASIQRSAPGGGGKRYRAVYDYSAADEDEVSFQDGDTIVNVQQIDDGWMYGTVERTGDTGMLPANY
180 190 20][ 210 220 230 240 250

550
sp|Q1 VELRQ
::
sp|Q3B VEAI
260

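The Region: lines follow a regular layout, so the domain that carries the significant score can be picked out of saved output programmatically. A minimal Python sketch: the regular expression is inferred from the lines in this post rather than from a documented format, treating the first coordinate pair as the query follows the (369-547:66-258) overlap line, and the Q >= 30 cut-off is a hypothetical choice.

```python
import re
from dataclasses import dataclass

# Layout inferred from the "Region:" lines in this post; not a documented format.
REGION_RE = re.compile(
    r"Region:\s*(\d+)-(\d+):(\d+)-(\d+)\s*:\s*"
    r"score=(-?\d+);\s*bits=(-?[\d.]+);\s*Id=([\d.]+);\s*Q=(-?[\d.]+)\s*:\s*(\S+)"
)


@dataclass
class Region:
    q_start: int      # first pair: query coordinates (assumed)
    q_end: int
    l_start: int      # second pair: library coordinates (assumed)
    l_end: int
    score: int        # raw sub-alignment score
    bits: float
    identity: float   # fraction identical, e.g. 0.474
    q: float          # Q-score
    name: str         # domain label, e.g. "SH3" or "NODOM"


def parse_regions(text: str) -> list[Region]:
    """Extract every Region: annotation line from saved FASTA output."""
    regions = []
    for m in REGION_RE.finditer(text):
        qs, qe, ls, le, score, bits, ident, q, name = m.groups()
        regions.append(Region(int(qs), int(qe), int(ls), int(le), int(score),
                              float(bits), float(ident), float(q), name))
    return regions


SAMPLE = """\
 Region: 369-398:66-95 : score=20; bits=13.7; Id=0.200; Q=0.0 :  Nebulin_35r-motif
 Region: 400-434:97-131 : score=-8; bits=8.7; Id=0.150; Q=0.0 :  Nebulin_35r-motif
 Region: 435-499:132-200 : score=3; bits=8.1; Id=0.233; Q=0.0 :  NODOM :0
 Region: 499-547:201-258 : score=124; bits=47.7; Id=0.474; Q=92.2 :  SH3
"""

Q_THRESHOLD = 30.0  # hypothetical cut-off; Q=30 corresponds to p = 0.001
regions = parse_regions(SAMPLE)
significant = [r.name for r in regions if r.q >= Q_THRESHOLD]
print(significant)  # ['SH3']
```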
Sub-alignment scores, informed by -V annotation scripts, can also be used to look for alignments of structural domains, and even of exons in DNA alignments.
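The partitioning behind sub-alignment scores can be sketched in a few lines of Python. This is an illustration, not FASTA's code: it assumes toy DNA scores (+5/-4) with a hypothetical linear gap penalty of -8, regions given in 1-based query coordinates (for example, exon boundaries), and identity computed over gap-free columns.

```python
# Hypothetical sketch: partition an alignment by query-coordinate regions
# (e.g., exons) and compute a raw score and identity for each partition.
MATCH, MISMATCH, GAP = 5, -4, -8  # assumed toy DNA scores with a linear gap cost


def column_score(a: str, b: str) -> int:
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sub_alignment_scores(q_aln: str, s_aln: str, q_start: int,
                         regions: list[tuple[str, int, int]]) -> dict[str, tuple[int, float]]:
    """regions: (name, start, end) in 1-based, inclusive query coordinates.

    Gap columns in the query are charged to the region of the preceding
    query residue. Identity = identical columns / gap-free columns.
    """
    totals = {name: [0, 0, 0] for name, _, _ in regions}  # score, identical, gap-free
    q_pos = q_start - 1
    for a, b in zip(q_aln, s_aln):
        if a != "-":
            q_pos += 1
        for name, start, end in regions:
            if start <= q_pos <= end:
                t = totals[name]
                t[0] += column_score(a, b)
                if a != "-" and b != "-":
                    t[2] += 1
                    if a == b:
                        t[1] += 1
                break
    return {name: (score, same / cols if cols else 0.0)
            for name, (score, same, cols) in totals.items()}


# Usage with toy sequences and two hypothetical exons.
result = sub_alignment_scores("ACGTACGTAC", "ACGTTCG-AC", 1,
                              [("exon1", 1, 5), ("exon2", 6, 10)])
print(result)  # {'exon1': (16, 0.8), 'exon2': (12, 1.0)}
```

Because each column is charged to exactly one region, the per-region scores add up to the score of the whole alignment span they cover.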
fasta-36.3.6 and sub-alignment scoring have been in development for the past 6 months, and some of the potential of the approach can be seen at the FASTA web site:  http://fasta.bioch.virginia.edu/fasta_www2/.
Please let me know of problems.
Bill Pearson




