Skip to Content.
Sympa Menu

fasta_list - Re: ssearch36 different results with same command

Subject: FASTA program discussion list

List archive

Re: ssearch36 different results with same command


Chronological Thread 
  • From: William Pearson <>
  • To: "olivier.rue" <>
  • Cc:
  • Subject: Re: ssearch36 different results with same command
  • Date: Fri, 30 Aug 2013 11:47:55 -0400

As far as I can tell, and from testing with the sequences and matrix you sent, the only difference in the results is the E()-value and bit score.  The raw score and alignment locations for the best alignment in  both cases is the same.  The second run shows two results, because the slight different in statistical parameter estimation allowed a raw score of 18 (as opposed to the optimal score of 19) to be significant enough to be displayed as secondary alignment.

In the FASTA programs, the statistical parameters lambda and K are estimated by shuffling the library/target sequence the number of times indicated (500) when only two sequences are compared.  This is why the E()-values are slightly different.

Looking at the values that were estimated in the two runs, the lambda values are quite similar (0.6170, 0.5685) but the K values differ almost by a factor of 3.  This suggests to me that the set of 500 shuffled scores is not providing a good estimate of lambda and K, and I suspect that this is happening because your gap penalties are much too low for the scoring matrix you are using, so the alignments are not local, and the statistical distribution that is being fit is wrong.

If you use more sensible gap penalties, -f -3 -g -1, the K values are much more stable.

If you are using low gap penalties to ensure a global match to the query, you might consider glsearch36, which would do this more directly, and seems to provide relatively stable statistics with both -1/-1 and -3/-1 gap penalties.

Bill Pearson

Begin forwarded message:

From: "olivier.rue" <>

Subject: ssearch36 different results with same command

Date: August 30, 2013 4:16:25 AM EDT

To:



Hello Mr Pearson,

I use ssearch36 and I am quite surprised to get different outputs when I run the same command line multiple times.
Here are the two outputs :

# /home/olivier/Installations/fasta-36.3.6/bin/ssearch36 -E 7 -m 1 -3 -f -1 -g -1 -w 200 -s hybridation_matrix.mat potential_mir_scaffold_10328_5401_5419_0.fa before_scaffold_10328_5401_5419_0.fa
SSEARCH performs a Smith-Waterman search
 version 36.3.6 June, 2013(preload9)
Please cite:
 T. F. Smith and M. S. Waterman, (1981) J. Mol. Biol. 147:195-197;
 W.R. Pearson (1991) Genomics 11:635-650

Parameters not available for: hybridation_matrix.mat: -1/-1
Query: potential_mir_scaffold_10328_5401_5419_0.fa
  1>>>scaffold_10328_5401_5419_0 - 19 nt (forward-only)
Library: before_scaffold_10328_5401_5419_0.fa
      200 residues in     1 sequences

Statistics: (shuffled [500]) MLE statistics: Lambda= 0.6170;  K= 17.62
 statistics sampled from 1 (1) to 500 sequences
Algorithm: Smith-Waterman (PGopt) (7.2 Nov 2010)
Parameters: hybridation_matrix.mat matrix (2:-1), open/ext: -1/-1
 Scan time:  0.020

The best scores are:                                                                                                                                                                    &nb sp;             s-w bits E(1)
before                                                                                                                                                                    & nbsp;                ( 200) [f]   19 12.8    0.42

>>before                                                                                                                                                                    ;                           (200 nt)
 s-w opt:  19  Z-score: 50.3  bits: 12.8 E(1): 0.42
Smith-Waterman score: 19;  0.0% identity (91.7% similar) in 12 nt overlap (7-18:137-148)

                                                                                                             10                                                           
scaff                                                                                                CGGTGGGAAACGGAACTAC                                                  
                                                                                                           xxxxxXxxxxxx                                                   
before CACCTAATACCTTTTTTGCACAATTGGTTAGAGCCTGTAAGTAAGCATTTCACTGTAAGGTCTACTACACATGTTGTTTTCGGCGCACGTGACATATAAACTTTGATTTGATTTGATAATCTCATGAAGCTGGTTGAGAGAATGCCAATAGTTTGCAAAGCTGT
         40        50        60        70        80        90       100       110       120       130       140       150       160       170       180       190       200



19 residues in 1 query   sequences
200 residues in 1 library sequences
 Tcomplib [36.3.6 June, 2013(preload9)] (2 proc in memory [0G])
 start: Fri Aug 30 08:56:20 2013 done: Fri Aug 30 08:56:20 2013
 Total Scan time:  0.020 Total Display time:  0.000

Function used was SSEARCH [36.3.6 June, 2013(preload9)]

and

# /home/olivier/Installations/fasta-36.3.6/bin/ssearch36 -E 7 -m 1 -3 -f -1 -g -1 -w 200 -s hybridation_matrix.mat potential_mir_scaffold_10328_5401_5419_0.fa before_scaffold_10328_5401_5419_0.fa
SSEARCH performs a Smith-Waterman search
 version 36.3.6 June, 2013(preload9)
Please cite:
 T. F. Smith and M. S. Waterman, (1981) J. Mol. Biol. 147:195-197;
 W.R. Pearson (1991) Genomics 11:635-650

Parameters not available for: hybridation_matrix.mat: -1/-1
Query: potential_mir_scaffold_10328_5401_5419_0.fa
  1>>>scaffold_10328_5401_5419_0 - 19 nt (forward-only)
Library: before_scaffold_10328_5401_5419_0.fa
      200 residues in     1 sequences

Statistics: (shuffled [500]) MLE statistics: Lambda= 0.5685;  K= 6.637
 statistics sampled from 1 (1) to 500 sequences
Algorithm: Smith-Waterman (PGopt) (7.2 Nov 2010)
Parameters: hybridation_matrix.mat matrix (2:-1), open/ext: -1/-1
 Scan time:  0.020

The best scores are:                                                                                                                                                                    &nb sp;             s-w bits E(1)
before                                                                                                                                                                    & nbsp;                ( 200) [f]   19 12.9     0.4

>>before                                                                                                                                                                    ;                           (200 nt)
 s-w opt:  19  Z-score: 50.7  bits: 12.9 E(1):  0.4
Smith-Waterman score: 19;  0.0% identity (91.7% similar) in 12 nt overlap (7-18:137-148)

                                                                                                             10                                                           
scaff                                                                                                CGGTGGGAAACGGAACTAC                                                  
                                                                                                           xxxxxXxxxxxx                                                   
before CACCTAATACCTTTTTTGCACAATTGGTTAGAGCCTGTAAGTAAGCATTTCACTGTAAGGTCTACTACACATGTTGTTTTCGGCGCACGTGACATATAAACTTTGATTTGATTTGATAATCTCATGAAGCTGGTTGAGAGAATGCCAATAGTTTGCAAAGCTGT
         40        50        60        70        80        90       100       110       120       130       140       150       160       170       180       190       200

>--
 s-w opt:  18  Z-score: 46.3  bits: 12.0 E(1):  0.6
Smith-Waterman score: 18;  0.0% identity (75.0% similar) in 20 nt overlap (4-18:45-64)

                                                        10                                                                                                                                               @
scaff                                           CGGTGGGAAA--CG-G--AACTAC                                                                                                                                      
                                                   xxxxxxx  xx x  xxxxx                                                                                                                                       
before TTTACTGTTGTTTTATTTCTTTACTTACCTATGGTTCACCTAATACCTTTTTTGCACAATTGGTTAGAGCCTGTAAGTAAGCATTTCACTGTAAGGTCTACTACACATGTTGTTTTCGGCGCACGTGACATATAAACTTTGATTTGATTTGATAATCTCATGAAGCTGGTTGAGAGAATGCCAATAGTTTGCAAAGCTGT
               10        20        30        40        50        60        70        80        90       100       110       120       130       140       150       160       170       180       190       200



19 residues in 1 query   sequences
200 residues in 1 library sequences
 Tcomplib [36.3.6 June, 2013(preload9)] (2 proc in memory [0G])
 start: Fri Aug 30 08:57:09 2013 done: Fri Aug 30 08:57:09 2013
 Total Scan time:  0.020 Total Display time:  0.000

Function used was SSEARCH [36.3.6 June, 2013(preload9)]


I attach the files if you want to run the commands by yourself.
I tried with the last updated version, but results are the same.
Do you have any explanation ?

Thanks for advance,

-- 
========================================================
 Olivier RUÉ
 INRA - MIAT - Plateforme Bioinfo-Genotoul
 Chemin de Borde-Rouge - Auzeville - CS 52627
 31326 Castanet-Tolosan cedex - FRANCE
 Tel: +33 (0)5.61.28.54.93
 http://www.bioinfo.genotoul.fr/ 


>before
TTTACTGTTGTTTTATTTCTTTACTTACCTATGGTTCACCTAATACCTTTTTTGCACAATTGGTTAGAGCCTGTAAGTAAGCATTTCACTGTAAGGTCTACTACACATGTTGTTTTCGGCGCACGTGACATATAAACTTTGATTTGATTTGATAATCTCATGAAGCTGGTTGAGAGAATGCCAATAGTTTGCAAAGCTGT
# RNA matrix favorising hybridation
    A  C  G  T  U  R  Y  M  W  S  K  D  H  V  B  N  X
 A -1 -1 -1  2  2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 C -1 -1  2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 G -1  2 -1  0  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 T  2 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 U  2 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 R -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 Y -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 M -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 W -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 S -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 K -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 D -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 H -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 V -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 B -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 N -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
>scaffold_10328_5401_5419_0
CGGTGGGAAACGGAACTAC



Archive powered by MHonArc 2.6.16.

Top of Page