$ deltablast -query rhodopsin_proteins.fasta -subject four_human_proteins.fasta -evalue 1e-08 -outfmt "6 qseqid sseqid score" -rpsdb /data/blastdb/cdd_delta
BLAST engine error: /data/blastdb/cdd_delta contains no frequency ratios needed for composition-based statistics.
Please disable composition-based statistics when searching against /data/blastdb/ncbi/cdd/cdd_delta.
In short, the fix is to download and unpack a newer cdd_delta.tar.gz, which now includes an additional file, cdd_delta.freq, containing the frequency ratio information that the newer deltablast tool requires.
The same applies to the rpsblast tool, although here you just get a warning rather than an error:
$ rpsblast -query four_human_proteins.fasta -db /data/blastdb/cdd_delta -evalue 1e-08 -outfmt "6 qseqid sseqid score"
Warning: /data/blastdb/cdd_delta contain(s) no freq ratios needed for composition-based statistics.
RPSBLAST will be run without composition-based statistics.
sp|Q9BS26|ERP44_HUMAN gnl|CDD|222416 401
...
sp|P06213|INSR_HUMAN gnl|CDD|238021 137
sp|P08100|OPSD_HUMAN gnl|CDD|215646 411
For the full story, I am using two small sample files, rhodopsin_proteins.fasta and four_human_proteins.fasta, as test cases. With BLAST+ 2.2.26 through 2.2.29, this example worked:
$ ~/ncbi_blast_2.2.29+/deltablast -query rhodopsin_proteins.fasta -subject four_human_proteins.fasta -evalue 1e-08 -outfmt "6 qseqid sseqid score" -rpsdb /data/blastdb/cdd_delta
gi|57163783|ref|NP_001009242.1| sp|P08100|OPSD_HUMAN 826
gi|3024260|sp|P56514.1|OPSD_BUFBU sp|P08100|OPSD_HUMAN 767
gi|283855846|gb|ADB45242.1| sp|P08100|OPSD_HUMAN 718
gi|283855823|gb|ADB45229.1| sp|P08100|OPSD_HUMAN 721
gi|223523|prf||0811197A sp|P08100|OPSD_HUMAN 842
gi|12583665|dbj|BAB21486.1| sp|P08100|OPSD_HUMAN 795
The error message from BLAST+ 2.2.30 was a bit cryptic, but suggested the domain database format had changed. I was using quite an old copy of the cdd_delta database from November 2013, so I downloaded the current version of cdd_delta.tar.gz (dated 24 Oct 2014, verified MD5 checksum 0a5513e147aa320264a1414f8194cfbc as per cdd_delta.tar.gz.md5).
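The checksum verification and unpacking step mentioned above can be sketched roughly as follows. This is only an illustration, not the exact commands I ran: the function name verify_and_unpack is hypothetical, it assumes GNU md5sum and tar are installed, that cdd_delta.tar.gz and cdd_delta.tar.gz.md5 have already been downloaded into the same directory, and that the .md5 file is in the usual "checksum  filename" layout that md5sum -c expects.

```shell
# Hypothetical helper: verify an archive against its .md5 file, then unpack it.
# Assumes GNU md5sum/tar and that "<archive>.md5" sits next to the archive.
verify_and_unpack() {
    archive="$1"   # e.g. /data/downloads/cdd_delta.tar.gz (absolute path)
    dest="$2"      # e.g. /data/blastdb

    # md5sum -c reads "checksum  filename" lines and resolves the filename
    # relative to the current directory, so run it from the archive's folder.
    (cd "$(dirname "$archive")" && md5sum -c "$(basename "$archive").md5") || {
        echo "MD5 mismatch for $archive, not unpacking" >&2
        return 1
    }

    # Only unpack once the checksum has been confirmed.
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

Called as verify_and_unpack /path/to/cdd_delta.tar.gz /data/blastdb, this refuses to unpack a corrupted or truncated download.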
Now deltablast from BLAST+ 2.2.30 works, although the scores (and other details of the alignments) are slightly different.
$ ~/ncbi_blast_2.2.30+/2.2.30+/deltablast -query rhodopsin_proteins.fasta -subject four_human_proteins.fasta -evalue 1e-08 -outfmt "6 qseqid sseqid score" -rpsdb cdd_delta
gi|57163783|ref|NP_001009242.1| sp|P08100|OPSD_HUMAN 822
gi|3024260|sp|P56514.1|OPSD_BUFBU sp|P08100|OPSD_HUMAN 759
gi|283855846|gb|ADB45242.1| sp|P08100|OPSD_HUMAN 714
gi|283855823|gb|ADB45229.1| sp|P08100|OPSD_HUMAN 718
gi|223523|prf||0811197A sp|P08100|OPSD_HUMAN 839
gi|12583665|dbj|BAB21486.1| sp|P08100|OPSD_HUMAN 790
So what changed? The new database contains an extra file, cdd_delta.freq. For anyone else stumped by the error message "BLASTDB contains no frequency ratios needed for composition-based statistics", check whether a file named BLASTDB.freq is present.
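That check is easy to script. Here is a minimal sketch, where has_freq_file is a hypothetical helper name of my own and the argument is the database prefix you would pass to -rpsdb or -db (for example /data/blastdb/cdd_delta):

```shell
# Hypothetical helper: report whether a BLAST domain database prefix
# has the accompanying .freq file sitting next to it.
has_freq_file() {
    db_prefix="$1"   # database prefix, without any file extension
    if [ -f "${db_prefix}.freq" ]; then
        echo "found ${db_prefix}.freq"
        return 0
    else
        echo "missing ${db_prefix}.freq" >&2
        return 1
    fi
}
```

Running has_freq_file /data/blastdb/cdd_delta before a search gives a clearer hint than the BLAST error does. If the file is missing, the database copy most likely predates the change and needs to be downloaded again.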