Subject: FASTA program discussion list
- From: William Pearson
- Subject: fasta-36.3.5 available
- Date: Tue, 31 May 2011 14:52:18 -0400
Version 36.3.5 of the FASTA package is available from:
http://faculty.virginia.edu/wrpearson/fasta/fasta36/fasta-36.3.5.tar.gz
fasta-36.3.5 includes a number of significant improvements over previous
versions of the package:
(1) Multiple output files -- it is now possible to write out different output
formats to different files using the -m "F9c m9c.out" -m "F10 m10.out"
option. This is particularly useful for lalign36, which can now display both
alignments and the .lav output used to show the alignment graphic (-m "F11
align.lav"). If the "F" is not used, then the -m alignment format applies to
the standard output.
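
A minimal sketch of how such a command line could be assembled, written in Python. The helper name build_m_options and the input files seq1.fa and seq2.fa are hypothetical placeholders; only the -m "F<format> <file>" syntax is taken from the description above.

```python
import shlex

def build_m_options(outputs):
    """Return -m arguments that route each output format to its own file."""
    args = []
    for fmt, path in outputs:
        # "F<format> <file>" is passed to -m as a single quoted argument.
        args += ["-m", f"F{fmt} {path}"]
    return args

# seq1.fa and seq2.fa are placeholder input files (assumption).
cmd = ["lalign36",
       *build_m_options([("9c", "m9c.out"), ("11", "align.lav")]),
       "seq1.fa", "seq2.fa"]

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Leaving a format out of the list passed to build_m_options simply omits its -m option; a plain -m value without the "F" prefix would apply to standard output, as noted above.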
(2) Searches against expanded libraries. Using the "-e expand.sh" script
option, it is possible to expand the output of a search to include sequences
that were not part of the original search. For example, in metagenomics
searches, it is much more efficient to search a representative set of
bacterial proteomes, and then show alignments to the additional sequences
that the high-scoring hits identify in a comprehensive database. When the "-e expand.sh"
option is used, the FASTA programs write a set of high scoring sequence
accessions and E()-values out to a temporary file, e.g. expand_in.tmp, and
then run the "expand.sh" script against that file ("expand.sh
expand_in.tmp > expand_out.tmp"). "expand.sh" is expected to examine the
accessions in expand_in.tmp and, using those sequences, produce additional
sequences (in FASTA format by default) that will be aligned and, if their
scores are significant, added to the set of alignments that are displayed in
the output. The "-e expand.sh" strategy is very flexible; all FASTA formats
(and indirect files) are supported.
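
The expansion step described above can be sketched as follows, in Python rather than shell. This is an illustration under stated assumptions, not the script shipped with FASTA: the layout of expand_in.tmp is assumed to be one accession followed by its E()-value per line, and the LINKS table, the accessions, and the sequences are all made up for the example.

```python
# Hypothetical link table: representative accession -> related sequences
# from a larger database, stored as (accession, sequence) pairs.
LINKS = {
    "REP_0001": [("EXT_0101", "MKTAYIAKQR"), ("EXT_0102", "MKTAYLAKQR")],
}

def expand(lines, links):
    """Yield FASTA records for the sequences linked to each listed accession."""
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        accession = fields[0]  # assumed: the accession is the first field
        for linked_acc, seq in links.get(accession, []):
            if linked_acc not in seen:  # emit each linked sequence only once
                seen.add(linked_acc)
                yield f">{linked_acc}\n{seq}\n"

# Stand-in for the contents of expand_in.tmp (assumed layout).
sample_input = ["REP_0001 1.2e-30", "REP_9999 0.004"]
print("".join(expand(sample_input, LINKS)), end="")
```

A real script would read the file named on its command line and write the records to standard output, mirroring "expand.sh expand_in.tmp > expand_out.tmp".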
(3) Efficient memory-based searches. Recent versions of FASTA36 have
included two alternative "main()" files, comp_lib5.c and comp_lib7.c
(selected in Makefile36m.common). comp_lib5.c read the database a sequence
at a time, and re-read the database for each query sequence in a multi-query
search. comp_lib7.c read the entire database into memory, which is far more
efficient for multi-query searches, but could not limit the amount of memory
used to store the library. fasta-36.3.5 merges those two strategies into
comp_lib8.c. On 32-bit systems 2GB can be used to store the database in
memory; on 64-bit systems, 8GB is available. This value can be increased
arbitrarily using the LIB_MEMK=32G environment variable or the -XM32G command
line option (memory size is set in megabytes unless 'G' is specified).
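
To make the size convention concrete, here is a small Python sketch of how a value such as "32G" could be read. It only illustrates the rule stated above (megabytes unless 'G' is given); the function name is hypothetical, 1 GB is assumed to be 1024 MB, and the parsing inside FASTA itself may differ.

```python
def lib_mem_megabytes(spec):
    """Convert a size such as "32G" or "4096" to megabytes."""
    spec = spec.strip()
    if spec.endswith("G"):
        # Assumption: 1 GB is treated as 1024 MB.
        return int(spec[:-1]) * 1024
    # No suffix: the value is already in megabytes.
    return int(spec)

print(lib_mem_megabytes("32G"))
print(lib_mem_megabytes("4096"))
```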
(4) A new output option '-b >#' is provided, which guarantees that at least #
high scores are displayed; more may be displayed if they meet the
significance threshold. -b >1 can be used to always have the best score
reported.
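
The selection rule can be sketched in Python as below. This is an illustration of the behaviour described above, not FASTA's implementation; the function name, the e_cutoff parameter, and the sample hits are hypothetical.

```python
def hits_to_display(hits, e_cutoff, min_shown):
    """Return hits with E() <= e_cutoff, padded to at least min_shown best hits."""
    # Rank hits from best (lowest E()-value) to worst.
    ranked = sorted(hits, key=lambda hit: hit[1])
    # Significant hits form a prefix of the ranked list.
    n_significant = sum(1 for _, e_value in ranked if e_value <= e_cutoff)
    return ranked[:max(n_significant, min_shown)]

hits = [("seqB", 12.5), ("seqA", 3.0), ("seqC", 45.0)]

# One hit passes the threshold, so one hit is shown.
print(hits_to_display(hits, e_cutoff=10.0, min_shown=1))
# No hit passes the threshold, but the best score is still reported.
print(hits_to_display(hits, e_cutoff=1.0, min_shown=1))
```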
(5) The -R search.res option now appends the parameter information necessary
to recalculate the E()-value for every score.
(6) Some rarely used command line options have become "-X" extended options.
In addition, bugs with the "-V" option and underflows in statistical
estimates have been fixed.
As always, please let me know about problems.
Bill Pearson