2015-02-02

BLAST+ rejecting query files with zero sequences

This is another brief NCBI BLAST+ bug report blog post, about a regression in BLAST+ 2.2.29 which will be breaking existing pipelines around the world. The problem is a new "feature" which treats an empty query file as an error.

For this example, first make an empty query file:

$ touch empty_file.fasta

Here's a simple example command showing that older versions of BLAST+ handled this corner case nicely, finishing with a zero return code (meaning success - shown here using echo and the shell's special parameter $?, which holds the exit status of the last command). I tried this with BLAST+ 2.2.18 through to 2.2.28 inclusive:

$ blastp -query empty_file.fasta -db nr -outfmt 6; echo "[Return code $?]"
[Return code 0]

But not any more. Both BLAST+ 2.2.29 and the current release, 2.2.30, have broken this:

$ blastp -query empty_file.fasta -db nr -outfmt 6; echo "[Return code $?]"
Command line argument error: Query is Empty!
[Return code 1]

Following Unix conventions for an error, the message is printed to stderr and a non-zero return code (one) is used. I just don't agree that this is an error.

I accept that an empty input query file is unusual, but it does happen legitimately - particularly in automated pipelines. For instance, I have written Galaxy workflows which do things like start from a protein set, filter based on the presence of a signal peptide, then run BLAST against some known false-positives, which are then removed. This pipeline might very reasonably return zero sequences - and I want BLAST to accept this and carry on.

This bug was actually reported to me by Jim Johnson (see his issue report here), suggesting we add a workaround in the Galaxy BLAST+ wrappers. The group at the University of Minnesota Supercomputing Institute has a pipeline which chunks large sequence sets by length before running BLAST. Occasionally one of the size bins is empty, at which point BLAST+ breaks their workflow.
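A wrapper-level workaround of this kind could look roughly like the sketch below. This is only an illustration, not the actual Galaxy wrapper code: the function name blast_or_skip and its argument order are made up for this post. It assumes a query with no FASTA header lines (no lines starting with ">") counts as empty, and that an empty output file is an acceptable result, which suits tabular output but probably not XML.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+ or Galaxy): skip the BLAST call
# when the query file contains no FASTA records, and report success.
# Usage: blast_or_skip QUERY OUTPUT COMMAND [ARGS...]
blast_or_skip() {
    bos_query=$1
    bos_output=$2
    shift 2

    # A missing or unreadable query is still a genuine error.
    if [ ! -r "$bos_query" ]; then
        echo "Error: cannot read query file $bos_query" >&2
        return 2
    fi

    # Assumption: no '>' header lines means zero sequences.
    if ! grep -q '^>' "$bos_query"; then
        echo "Warning: no sequences in $bos_query, skipping BLAST" >&2
        : > "$bos_output"   # create (or truncate to) an empty output file
        return 0
    fi

    # Otherwise run the given command; its exit status becomes ours.
    "$@" -query "$bos_query" -out "$bos_output"
}
```

Used with the empty file from earlier, this would print the warning to stderr, leave an empty hits.tsv behind, and give a zero return code without ever calling blastp:

$ blast_or_skip empty_file.fasta hits.tsv blastp -db nr -outfmt 6; echo "[Return code $?]"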

My suggestion is for the NCBI to either remove this check, or simply downgrade it to a warning on stderr - with the critical requirement that it should revert to a zero return code. e.g.

$ blastp -query empty_file.fasta -db nr -outfmt 6; echo "[Return code $?]"
Warning: Command line argument error?: Query is Empty!
[Return code 0]

This gives some useful feedback for the user (especially if running BLAST+ by hand at the command line), without breaking legitimate use cases.

Since NCBI BLAST+ doesn't have a public bug tracker, I am blogging this here, and have reported the problem by email as well.
