Blasted Bioinformatics!? - comments
Peter Cock
Feed updated 2024-01-31T12:30:28.282+00:00

Anonymous (2021-11-20T07:04:10.476+00:00):
The -parse_deflines still works to fix the query_# issue

Unknown (2020-08-27T02:14:34.044+01:00):
Thanks Tomer, post what you hear.

Tomer Altman (2020-07-15T21:17:59.133+01:00):
I just submitted a bug report to NCBI regarding the "Query_#" issue, which still exists...

rt (2020-04-11T14:44:25.654+01:00):
Hey guys, any idea why blastx loads all files (db and query) into RAM, runs the specified number of threads for a few seconds, flushes the RAM, reloads all the files, runs the threads again for a few seconds, flushes again... repeating in a cycle? This wastes a lot of time unnecessarily... How can I solve this problem?

lorforlinux (2020-01-11T20:33:11.081+00:00):
Enjoyed the post :) I agree that Shah et al. should have sent a nice bug report to the developers.

Eugene (2019-06-15T20:03:23.187+01:00):
I had the same need when I worked on a datetime range extractor for compressed bz2 log files in my project https://github.com/eugenyuk/extract_time_blk_bz2. Using the mentioned https://bitbucket.org/james_taylor/seek-bzip2 project, I managed to implement random access extraction from bz2 log files.

Ysland (2019-05-16T21:43:54.544+01:00):
Hi,

Thank you for this post!

So, if max_target_seqs doesn't give you the best hit, how can I get the best hit?

Thank you!

Peter Cock (2019-01-08T11:38:57.613+00:00):
The NCBI BLAST team's formal reply letter was published in OUP Bioinformatics at the end of December, and I talk about this and another issue they uncovered and fixed in BLAST+ 2.8.1 while investigating the Shah et al. (2018) test case in my latest blog post, "An overly aggressive optimization in BLASTN and MegaBLAST" https://blastedbio.blogspot.com/2019/01/blast-overly-aggressive-optimization.html

Peter Cock (2019-01-08T10:51:21.996+00:00):
I hope today's post answers this very sensible question about the composition of the internal candidate list - titled "An overly aggressive optimization in BLASTN and MegaBLAST", this follows the BLAST authors' formal reply and the release of BLAST+ 2.8.1 https://blastedbio.blogspot.com/2019/01/blast-overly-aggressive-optimization.html

Anonymous (2018-12-06T12:48:58.718+00:00):
All of the discussion about the SIZE of the prelim_hitlist is fine, but this is only a part of the problem. The composition of the prelim_hitlist is also a problem. Regardless of its size, if the prelim_hitlist actually contained the "best" targets, there would be no problem. Hence, the algorithm for constructing the prelim_hitlist is crucial for understanding the actual scope of the problems associated with max_target_seqs. Hopefully someone will investigate this.

Peter Cock (2018-12-04T10:15:15.976+00:00):
I can only really point you at the Appendix entry "Outline of the BLAST process", or if you really want to explore this yourself, the source code. However, saying "first ones" is misleading as the order in the database only comes into play as a tie break (see discussion in blog post part three).

Anonymous (2018-12-04T03:10:08.888+00:00):
Thank you for this excellent elucidation of the determination of prelim_hitlist_size in the various BLAST programs. Using your analogy about a half marathon, what criteria do the BLAST programs use to determine which sequences are in the lead at the halfway point (I am particularly concerned with BLASTn)? That is, how do sequences make it onto the prelim_hitlist? Is it really as simple as the first ones that meet the criteria?

Peter Cock (2018-11-15T16:01:25.370+00:00):
Hi Mick. Part Two ought to answer that, but in essence when the initial cull is too strict, the two best bacteria get drowned out by initially more promising eukaryota hits. It would be possible to go into this example in more depth, e.g. inserting debugging into the BLAST+ codebase to dump out the intermediate candidate list, but there are other things I'd like to explore more.

BioMickWatson (2018-11-15T14:57:19.549+00:00):
So why are the two Bacterial hits eliminated? Do we know?

Peter Cock (2018-11-15T14:45:52.500+00:00):
Thank you for your feedback. Sujai Kumar and I have discussed writing a formal reply letter, and have been in touch with the NCBI BLAST team about this. There are still a few more points I would like to explore as blog posts, so in the meantime, please subscribe to the RSS/Atom feed (or follow me on Twitter).

Mikhail Schelkunov (2018-11-15T09:51:23.484+00:00):
Do you plan to write an article for Oxford Bioinformatics, proving that the problem with BLAST is not nearly as big as Shah and co-authors stated? I think this article will be very valuable. Many bioinformaticians almost panic because they don't realize that the article of Shah is wrong https://oxfordjournals.altmetric.com/details/48829434/twitter

yannickwurm (2018-11-10T16:18:41.331+00:00):
Dear Peter, Just wanted to briefly give my thanks for these detailed comments, thoughts and analyses about the BLAST issues.
Cheers!
Yannick

Peter Cock (2018-11-04T20:00:06.402+00:00):
Deep apologies about misspelling your name - I've been actively fighting autocorrect, but have clearly internalised the wrong spelling. I'll fix that, and hope to look at your test case soon.

Anonymous (2018-11-04T19:39:43.635+00:00):
Hi Peter,

We have provided some test examples here - https://github.com/shahnidhi/BLAST_maxtargetseq_analysis. Please check it out. Also, my last name is 'Shah', which is misspelled here in multiple places. I'd appreciate it if you changed it. :)
- Nidhi

Peter (2018-11-03T22:13:05.240+00:00):
Already mentioned this on Twitter, but it's fairly trivial to test the claim that "...BLAST returns the first N hits that exceed the specified E-value threshold..."

To test this, one can effectively remove the E-value threshold by setting it to some extreme value such that no hits are discarded. In that case, if this claim were true, every search should return the first N entries in the database.

Peter Cock (2018-11-03T16:36:24.558+00:00):
Fixed, thank you!

Mikhail Schelkunov (2018-11-03T13:48:19.541+00:00):
The link "BLAST max alignment limits repartee - part two" leads to part one

Peter Cock (2018-11-02T16:56:09.461+00:00):
And part two is up at https://blastedbio.blogspot.com/2018/11/blast-max-alignment-limits-repartee-two.html and asks if database order is important (as claimed in Shah et al. 2018), and how exactly the internal alignment number limit works.

Peter Cock (2018-11-02T10:11:21.665+00:00):
The next post is up at https://blastedbio.blogspot.com/2015/12/blast-max-target-sequences-bug.html alongside a self-contained small test case at https://github.com/peterjc/blast_max_target_seqs

John Walshaw (2018-11-01T16:33:58.403+00:00):
Sorry, to avoid creating more confusion - I should scrub my comment about the age of the current Blast release...seems I forgot what year it is! It's just over a year old.