Subject: FASTA program discussion list
- From: William Pearson <>
- To:
- Subject: fasta-36.3.6.tar.gz released
- Date: Fri, 5 Jul 2013 16:41:47 -0400
A new release of the FASTA package, fasta-36.3.6.tar.gz, is available from:

This version fixes various bugs in earlier versions, but, more importantly, it introduces a powerful new strategy for incorporating annotation/feature information in alignments.

(1) It expands the old "-V" option to allow scripts to produce annotations about both the query and library sequences. Thus,

  ssearch36 -V '\!../scripts/ann_feats_up_www2.pl'

will cause ssearch36 to run the 'scripts/ann_feats_up_www2.pl' Perl script to acquire feature (active sites, binding sites), variation, and domain information from the UniProt DAS/GFF server. This information is then included in the alignment, and can be used to re-score the alignment if a variant residue improves the score, and to display the state of active site residues, thus:

>>sp|P09488.3|GSTM1_HUMAN Glutathione S-transf (218 aa)
 Site:* : 23Y=23Y : MOD_RES: Phosphotyrosine (By similarity).
 Site:* : 33Y=33Y : MOD_RES: Phosphotyrosine (By similarity).
 Site:* : 34T=34T : MOD_RES: Phosphothreonine (By similarity).
 Region: 1-88:1-88 : score=613; bits=155.9; Id=1.000; Q=422.6 : Glutathione_S-Trfase_N :1
 Site:# : 116Y=116Y : BINDING: Substrate.
 Variant: 173N=173N : K173N : in allele GSTM1B; dbSNP:rs1065411.
 Region: 90-208:90-208 : score=809; bits=203.8; Id=1.000; Q=566.8 : Glutathione_S_Trfase/Cl_chnl_C :2
 Variant: 210T=210T : S210T : in dbSNP:rs449856.
s-w opt: 1500 Z-score: 1975.7 bits: 372.7 E(455146): 1.4e-102
Smith-Waterman score: 1500; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)

               10        20        30        40        50        60
sp|P0  MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNL
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNL
       [       10        20  *     30  **    40        50        60

               70        80        90       100       110       120
sp|P0  PYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEF
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 PYLIDGAHKITQSNAILCYIARKHNLCGETEEEKIRVDILENQTMDNHMQLGMICYNPEF
               70        80       ]9[       100       110     # 120

              130       140       150       160       170       180
sp|P0  EKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPNCLDAFPN
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P09 EKLKPKYLEELPEKLKLYSEFLGKRPWFAGNKITFVDFLVYDVLDLHRIFEPNCLDAFPN
              130       140       150       160       170  V    180

              190       200       210
sp|P0  LKDFISRFEGLEKISAYMKSSRFLPRPVFTKMAVWGNK
       ::::::::::::::::::::::::::::::::::::::
sp|P09 LKDFISRFEGLEKISAYMKSSRFLPRPVFTKMAVWGNK
              190       200       ]1V

The -V 'q\!ann_feats_up_www2.pl' option uses the script to annotate the Query sequence (it must have a UniProt accession for feature information to be found).

The ../scripts/ann_feats_up_www2.pl script included with the distribution provides a simple implementation of the annotation script strategy, but it dramatically slows down alignment display, because it uses an external website for feature information. The "scripts" directory also provides scripts that use local MySQL databases to get annotation information, which is displayed much more quickly.

(2) Annotation scripts that provide domain information can be used to produce "sub-alignment" scores, in which the alignment is partitioned and scores for each of the partitions are calculated. In the example above, region 1-88 is a GST N-terminal domain, and 90-208 is a GST C-terminal domain.
Associated with each of those domains is a raw sub-alignment score, the associated bit score, percent identity, and the Q-score, which is -10 * log(p-value) for the associated bit score. In this case, since the sequences are 100% identical, the sub-alignment scores are not very interesting.

Sometimes, however, alignments are seeded by homologous domains but extend considerably beyond the homologous region. For example, in this alignment between SRC8_HUMAN (cortactin) and LASP1_BOVIN, several domains align, but only the C-terminal SH3 domain alignment contributes significant score:

>>sp|Q3B7M5.1|LASP1_BOVIN LIM and SH3 domain protein 1; LASP-1 (260 aa)
 Site:* : 371Az68T : MOD_RES: Phosphothreonine (By similarity).
 Region: 369-398:66-95 : score=20; bits=13.7; Id=0.200; Q=0.0 : Nebulin_35r-motif
 Site:* : 407Az104T : MOD_RES: Phosphothreonine (By similarity).
 Site:* : 416P<118S : MOD_RES: Phosphoserine (By similarity).
 Region: 400-434:97-131 : score=-8; bits=8.7; Id=0.150; Q=0.0 : Nebulin_35r-motif
 Region: 435-499:132-200 : score=3; bits=8.1; Id=0.233; Q=0.0 : NODOM :0
 Region: 499-547:201-258 : score=124; bits=47.7; Id=0.474; Q=92.2 : SH3
s-w opt: 142 Z-score: 242.8 bits: 53.6 E(455146): 4.6e-06
Smith-Waterman score: 149; 27.7% identity (55.0% similar) in 202 aa overlap (369-547:66-258)

[alignment]

Sub-alignment scores, informed by -V annotation scripts, can also be used to look for alignment of structural domains and even exons in DNA alignments.

fasta-36.3.6 and sub-alignment scoring have been in development for the past 6 months, and some of the potential of the approach can be found at the FASTA web site: http://fasta.bioch.virginia.edu/fasta_www2/.

Please let me know of problems.

Bill Pearson
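
The "Region:" lines in the output above have a regular layout, so the sub-alignment scores can be pulled out with a short script, for example to list only the domains that contribute significant score. The following is a minimal sketch in Python, assuming the layout is exactly as in the two examples in this message; that layout is inferred from the examples, not taken from a format specification. The names Region, parse_regions, and the cutoff MIN_Q are hypothetical and are not part of the FASTA package.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """One 'Region:' sub-alignment line (hypothetical container)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: int
    bits: float
    identity: float
    q_score: float
    name: str


# Pattern inferred from the example output in this message (an assumption).
REGION_RE = re.compile(
    r"Region:\s*(\d+)-(\d+):(\d+)-(\d+)\s*:\s*"
    r"score=(-?\d+);\s*bits=([\d.]+);\s*Id=([\d.]+);\s*Q=([\d.]+)\s*:\s*(.+)"
)


def parse_regions(text):
    """Return a Region for every 'Region:' line found in text."""
    regions = []
    for line in text.splitlines():
        m = REGION_RE.search(line)
        if m is None:
            continue  # Site:, Variant:, and alignment lines are skipped
        groups = m.groups()
        q_start, q_end, s_start, s_end, score = (int(g) for g in groups[:5])
        bits, identity, q_score = (float(g) for g in groups[5:8])
        regions.append(Region(q_start, q_end, s_start, s_end, score,
                              bits, identity, q_score, groups[8].strip()))
    return regions


# Region lines copied from the SRC8_HUMAN / LASP1_BOVIN example above.
example = """\
Region: 369-398:66-95 : score=20; bits=13.7; Id=0.200; Q=0.0 : Nebulin_35r-motif
Region: 400-434:97-131 : score=-8; bits=8.7; Id=0.150; Q=0.0 : Nebulin_35r-motif
Region: 435-499:132-200 : score=3; bits=8.1; Id=0.233; Q=0.0 : NODOM :0
Region: 499-547:201-258 : score=124; bits=47.7; Id=0.474; Q=92.2 : SH3
"""

MIN_Q = 30.0  # hypothetical cutoff, chosen only for illustration

significant = [r for r in parse_regions(example) if r.q_score >= MIN_Q]
for r in significant:
    print(r.name, r.q_start, r.q_end, r.q_score)
```

With the cutoff used here, only the SH3 region is kept, which matches the observation above that the SH3 domain alone contributes significant score. A different cutoff may suit other searches.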