This is my first blog post in years - primarily down to a second child who is now a toddler. And what better topic to return to than a mainstay of past content, NCBI BLAST? This time with a motivating example from my recent work, metabarcoding. This is the term used for sequencing a diagnostic region of DNA using specific primers for a group of organisms of interest, and then matching that amplicon to a database of known species. Human interpretation of a BLAST search can generally make a good guess at the organism - weighing hits and annotated taxonomy (e.g. ignoring the odd suspicious uncultured "fungal" match).
This post is about how BLAST on the NCBI website can sometimes miss 100% identical (albeit not full-length) matches, instead returning lots of very good but longer matches. Basically, the online defaults don't suit this use case.