fasta_list - fasta-36.3.6.tar.gz released

Subject: FASTA program discussion list


From: William Pearson <>
To:
Subject: fasta-36.3.6.tar.gz released
Date: Fri, 5 Jul 2013 16:41:47 -0400

A new release of the FASTA package, fasta-36.3.6.tar.gz, is available from:

http://faculty.virginia.edu/wrpearson/fasta/fasta36/fasta-36.3.6.tar.gz

This version fixes various bugs in earlier versions, but, more importantly, it introduces a powerful new strategy for incorporating annotation/feature information in alignments.

(1) It expands the old "-V" option to allow scripts to produce annotations about both the query and library sequences. Thus, ssearch36 -V '\!../scripts/ann_feats_up_www2.pl' will cause ssearch36 to run the 'scripts/ann_feats_up_www2.pl' Perl script to acquire feature (active site, binding site), variation, and domain information from the UniProt DAS/GFF server. This information is then included in the alignment, where it can be used to re-score the alignment if a variant residue improves the score and to display the state of active-site residues, thus:

>>sp|P09488.3|GSTM1_HUMAN Glutathione S-transf (218 aa)
 Site:* : 23Y=23Y : MOD_RES: Phosphotyrosine (By similarity).
 Site:* : 33Y=33Y : MOD_RES: Phosphotyrosine (By similarity).
 Site:* : 34T=34T : MOD_RES: Phosphothreonine (By similarity).
 Region: 1-88:1-88 : score=613; bits=155.9; Id=1.000; Q=422.6 : Glutathione_S-Trfase_N :1
 Site:# : 116Y=116Y : BINDING: Substrate.
 Variant: 173N=173N : K173N : in allele GSTM1B; dbSNP:rs1065411.
 Region: 90-208:90-208 : score=809; bits=203.8; Id=1.000; Q=566.8 : Glutathione_S_Trfase/Cl_chnl_C :2
 Variant: 210T=210T : S210T : in dbSNP:rs449856.
 s-w opt: 1500  Z-score: 1975.7  bits: 372.7 E(455146): 1.4e-102
Smith-Waterman score: 1500; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)

               10        20        30        40        50        60
sp|P0  MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNL
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNL
       [       10        20  *     30  **    40        50        60

               70        80        90       100       110       120
sp|P0  PYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEF
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 PYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEF
               70        80       ]9[       100       110     # 120

              130       140       150       160       170       180
sp|P0  EKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPNCLDAFPN
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 EKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPNCLDAFPN
              130       140       150       160       170  V    180

              190       200       210
sp|P0  LKDFISRFEGLEKISAYMKSSRFLPRPVFTKMAVWGNK
       ::::::::::::::::::::::::::::::::::::::
sp|P09 LKDFISRFEGLEKISAYMKSSRFLPRPVFTKMAVWGNK
              190       200       ]1V
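To make the re-scoring idea from (1) concrete, here is a minimal sketch for a single aligned position. It assumes a variant is written as in the Variant: lines above (reference residue, position, alternate residue, e.g. K173N) and uses three BLOSUM62 values; the function names are hypothetical, and this is not how ssearch36 implements re-scoring.

```python
import re
from typing import List, Tuple

# Three BLOSUM62 scores, enough for this illustration.
BLOSUM62_SUBSET = {("K", "K"): 5, ("N", "N"): 6, ("K", "N"): 0}

# Variant notation as in the "Variant:" lines: reference, position, alternate.
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def pair_score(a: str, b: str) -> int:
    """Symmetric lookup in the score subset (KeyError if the pair is absent)."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def parse_variant(text: str) -> Tuple[str, int, str]:
    """Split e.g. 'K173N' into ('K', 173, 'N')."""
    m = VARIANT_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized variant: {text!r}")
    ref, pos, alt = m.groups()
    return ref, int(pos), alt


def rescore_position(query_res: str, lib_res: str,
                     variants: List[Tuple[str, int, str]],
                     lib_pos: int) -> Tuple[int, str]:
    """Score one aligned column (hypothetical helper, not FASTA's code).

    A library variant residue is used only if it is annotated at lib_pos,
    its reference matches lib_res, and it scores better against query_res.
    Returns (score, library residue used).
    """
    best = (pair_score(query_res, lib_res), lib_res)
    for ref, pos, alt in variants:
        if pos == lib_pos and ref == lib_res:
            s = pair_score(query_res, alt)
            if s > best[0]:
                best = (s, alt)
    return best


print(rescore_position("N", "K", [parse_variant("K173N")], 173))  # (6, 'N')
```

In this hypothetical case the library carries the reference K at position 173 opposite a query N, so the annotated N variant raises the column score from 0 to 6. In the GSTM1 self-alignment above, both residues are already N, so nothing changes.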

The -V 'q\!ann_feats_up_www2.pl' option uses the script to annotate the query sequence (which must have a UniProt accession for feature information to be found). The ../scripts/ann_feats_up_www2.pl script included with the distribution provides a simple implementation of the annotation-script strategy, but it dramatically slows down alignment display because it retrieves feature information from an external website. The "scripts" directory also provides scripts that get annotation information from local MySQL databases, so that alignments are displayed much more quickly.
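Because query annotation depends on finding a UniProt accession, a wrapper that decides whether to call an annotation script might first check the query identifier. A minimal sketch, assuming SwissProt/TrEMBL-style identifiers like the sp|P09488.3|GSTM1_HUMAN header shown above; the pattern and function name are hypothetical and are not taken from ann_feats_up_www2.pl.

```python
import re
from typing import Optional

# Hypothetical pattern for SwissProt/TrEMBL-style identifiers such as
# "sp|P09488.3|GSTM1_HUMAN"; an optional ".version" suffix is dropped.
UNIPROT_ID_RE = re.compile(r"^(?:sp|tr)\|([A-Z0-9]+)(?:\.\d+)?\|")


def uniprot_accession(seq_id: str) -> Optional[str]:
    """Return the bare UniProt accession, or None if seq_id has none."""
    m = UNIPROT_ID_RE.match(seq_id)
    return m.group(1) if m else None


print(uniprot_accession("sp|P09488.3|GSTM1_HUMAN"))  # P09488
print(uniprot_accession("my_query_seq"))              # None
```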

(2) Annotation scripts that provide domain information can be used to produce "sub-alignment" scores, in which the alignment is partitioned and a score is calculated for each partition. In the example above, region 1-88 is a GST N-terminal domain and 90-208 is a GST C-terminal domain. Each domain is associated with a raw sub-alignment score, the corresponding bit score, the fraction identity (Id), and the Q-score, which is -10 * log(p-value) for that bit score.
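The Q-score can be checked against the Region lines above. This sketch assumes the logarithm is base 10 and that the p-value is approximated from the bit score as p ~ m * n * 2^(-bits), with m and n the full query and library lengths and p capped at 1. Both are assumptions of the sketch, not a statement of how FASTA computes p.

```python
import math


def approx_p_from_bits(bits: float, m: int, n: int) -> float:
    """Assumed conversion: p ~ m * n * 2**(-bits), capped at 1.

    m and n are taken to be the full query and library lengths; this is an
    assumption of this sketch, not necessarily FASTA's exact calculation.
    """
    return min(1.0, m * n * 2.0 ** (-bits))


def q_score(bits: float, m: int, n: int) -> float:
    """Q = -10 * log10(p), assuming a base-10 logarithm."""
    return -10.0 * math.log10(approx_p_from_bits(bits, m, n))


# GSTM1 self-alignment: both sequences are 218 aa.
print(round(q_score(155.9, 218, 218), 1))  # 422.5 (Region 1-88 reports Q=422.6)
print(round(q_score(203.8, 218, 218), 1))  # 566.7 (Region 90-208 reports Q=566.8)
```

Under these assumptions the results agree with the reported values to within the rounding of the bit scores. The cap at 1 also gives Q=0.0 for very weak regions, consistent with the Nebulin and NODOM regions in the LASP1 example below.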

In this case, since the sequences are 100% identical, the sub-alignment scores are not very interesting. But alignments are sometimes seeded by homologous domains and then extend considerably beyond the homologous region. For example, in this alignment between SRC8_HUMAN (cortactin) and LASP1_BOVIN, several domains align, but only the C-terminal SH3 domain alignment contributes a significant score:

>>sp|Q3B7M5.1|LASP1_BOVIN LIM and SH3 domain protein 1;  LASP-1               (260 aa)
 Site:* : 371Az68T : MOD_RES: Phosphothreonine (By similarity).
 Region: 369-398:66-95 : score=20; bits=13.7; Id=0.200; Q=0.0 :  Nebulin_35r-motif
 Site:* : 407Az104T : MOD_RES: Phosphothreonine (By similarity).
 Site:* : 416P<118S : MOD_RES: Phosphoserine (By similarity).
 Region: 400-434:97-131 : score=-8; bits=8.7; Id=0.150; Q=0.0 :  Nebulin_35r-motif
 Region: 435-499:132-200 : score=3; bits=8.1; Id=0.233; Q=0.0 :  NODOM :0
 Region: 499-547:201-258 : score=124; bits=47.7; Id=0.474; Q=92.2 :  SH3
 s-w opt: 142  Z-score: 242.8  bits: 53.6 E(455146): 4.6e-06
Smith-Waterman score: 149; 27.7% identity (55.0% similar) in 202 aa overlap (369-547:66-258)

      330       340       350       360       370       380       390       400        
sp|Q1  QVSSAYQKTVPVEAVTSKTSNIRANFENLAKEKEQEDRRKAEAERAQRMAKERQEQEEARRKLEEQARAKTQTPPVSPAP
                                               :.. .  :. .. . :...: : : . .       :. .:
sp|Q3B HKACFHCETCKMTLNMKNYKGYEKKPYCNAHYPKQSFTMVADTPENLRLKQQSELQSQVRYkeefeknkgkgfSVVADTP
          30        40        50        60       *70        80        90    ] [100   * 

           410       420       430       440        450       460         470       480
sp|Q1  Q-----PTEERLPSSPVYEDAASFKAELSYRGPVSGTEPE-PVYSMEAADYREASSQQGLAY--ATEAVYESAEAPGHYP
       .      :.... .   .:.      : :  :: .:   :    . . ..::. ..::   .  :.  ::..   : . :
sp|Q3B ELQRIKKTQDQISNIKYHEE-----FEKSRMGPSGGEGLECERRDPQESSYRRPQEQQQPHHIPASTPVYqq---pqqqp
         110       *20            130][     140       150       160       170          

              490                    500       510       520       530         540     
sp|Q1  AEDSTYDEYENDLGITAV-------------ALYDYQAAGDDEISFDPDDIITNIEMIDDGWWRGVCK--GRYGLFPANY
       : .: :  :..  . ...             :.:::.:: .::.::.  : :.:...:::::  :. .  :  :..::::
sp|Q3B aaqS-YGGYKEPAAPASIQRSAPGGGGKRYRAVYDYSAADEDEVSFQDGDTIVNVQQIDDGWMYGTVERTGDTGMLPANY
       180        190       20][      210       220       230       240       250      

         550
sp|Q1  VELRQ
       ::   
sp|Q3B VEAI 
        260
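The Region lines come from splitting one alignment at domain boundaries and scoring each piece separately. The following minimal sketch shows that partitioning. It assumes regions are given in query coordinates and uses a hypothetical linear match/mismatch/gap scheme (+2/-1/-2) in place of the substitution matrix and gap penalties ssearch36 actually uses, so its scores will not reproduce the score= values above. All names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical linear scoring scheme, for illustration only.
MATCH, MISMATCH, GAP = 2, -1, -2


@dataclass
class RegionStats:
    name: str
    score: int = 0
    identities: int = 0
    aligned: int = 0  # columns with a residue in both sequences

    @property
    def identity(self) -> float:
        return self.identities / self.aligned if self.aligned else 0.0


def partition_scores(q_aln, l_aln, q_start, regions):
    """Score each region of one alignment separately.

    q_aln, l_aln: aligned strings of equal length; '-' marks a gap.
    q_start: query coordinate of the first query residue in q_aln.
    regions: (start, end, name) tuples in inclusive query coordinates.
    """
    if len(q_aln) != len(l_aln):
        raise ValueError("aligned strings must have equal length")
    stats = [RegionStats(name) for _, _, name in regions]
    q_pos = q_start - 1
    for qa, la in zip(q_aln, l_aln):
        if qa != "-":
            q_pos += 1  # a query gap stays with the preceding query residue
        for (start, end, _), st in zip(regions, stats):
            if not start <= q_pos <= end:
                continue
            if qa == "-" or la == "-":
                st.score += GAP
            else:
                st.aligned += 1
                # case-insensitive, so lowercase (masked) residues still match
                if qa.upper() == la.upper():
                    st.identities += 1
                    st.score += MATCH
                else:
                    st.score += MISMATCH
            break  # regions are assumed not to overlap
    return stats


for r in partition_scores("ACDE-FGHIK", "ACDEQFGWIK", 1, [(1, 4, "A"), (5, 9, "B")]):
    print(r.name, r.score, f"{r.identity:.3f}")  # A 6 1.000, then B 7 0.800
```

Comparing residues case-insensitively keeps lowercase stretches, like those in the LASP1_BOVIN line, from being counted as mismatches. Columns that fall outside every region are not counted.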

Sub-alignment scores, informed by -V annotation scripts, can also be used to look for alignments of structural domains, and even of exons in DNA alignments.

fasta-36.3.6 and sub-alignment scoring have been in development for the past six months, and some of the potential of the approach can be seen at the FASTA web site: http://fasta.bioch.virginia.edu/fasta_www2/.

Please let me know of any problems.

Bill Pearson

fasta-36.3.6.tar.gz released, William Pearson, 07/05/2013
