I recently stumbled on a problem in NCBI Entrez with the GenBank (with parts) return type. Some GenBank files don't actually contain a sequence at the end - instead they have a CONTIG section telling you how to construct the sequence from other referenced pieces. That's often inconvenient, so the NCBI have the handy option of downloading it with all these parts pre-computed, which normally is great.
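As a rough illustration of the two flavours of record, here is a minimal Python sketch. It assumes the standard E-utilities efetch endpoint with `rettype=gbwithparts`; the helper names `efetch_url` and `needs_parts` are made up for this example, and the accession you pass in is up to you. It only builds the URL and inspects text you have already downloaded, so nothing here touches the network.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, with_parts=True):
    """Build an efetch URL for a nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",
        "id": accession,
        # "gbwithparts" asks for the sequence to be filled in from the parts
        "rettype": "gbwithparts" if with_parts else "gb",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def needs_parts(genbank_text):
    """True if a record has a CONTIG line but no ORIGIN sequence block."""
    # Top-level GenBank keywords start in column one
    keywords = {
        line.split(None, 1)[0]
        for line in genbank_text.splitlines()
        if line and not line[0].isspace()
    }
    return "CONTIG" in keywords and "ORIGIN" not in keywords
```

If `needs_parts` returns true for a plain GenBank download, fetching the same identifier again with `with_parts=True` should give you the assembled sequence.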
Bioinformatics lessons learned the hard way, bugs, gripes, and maybe topical paper reviews too...
2012-03-16
2012-03-09
BAM versus CRAM v0.7
CRAM 0.7 was released earlier this month, and includes support for storing arbitrary read tags - a key requirement for it to be evaluated in existing pipelines as a BAM alternative. However, it doesn't preserve read names - which is a compression trick you can also do with plain BAM.
2012-02-14
Reference based SAM/BAM compression
In some respects the SAM/BAM specification is quite loose, in that there is more than one way to represent a given piece of information. We can take advantage of this to reduce the size on disk of mapped reads which match the reference sequence, while still maintaining conformance within the spec. I've written a SAM/BAM reference based compression script in Python - put this in your pipeline and smoke it!
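To give a flavour of the kind of trick involved (a minimal sketch, not the script itself): the SAM specification lets `=` in the SEQ field stand for a base identical to the reference, so long runs of matches become highly compressible. The function name `mask_matches` and its arguments are assumptions for this example, and it works on plain strings rather than a real SAM/BAM parser.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mask_matches(seq, cigar, ref, pos):
    """Replace read bases equal to the reference with '='.

    seq   - read sequence as in the SAM SEQ column
    cigar - CIGAR string for the alignment
    ref   - reference sequence the read was mapped to
    pos   - 1-based leftmost mapping position (SAM POS)
    """
    out = []
    read_i = 0
    ref_i = pos - 1
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            # Aligned columns: compare base by base against the reference
            for _ in range(length):
                base = seq[read_i]
                same = base.upper() == ref[ref_i].upper()
                out.append("=" if same else base)
                read_i += 1
                ref_i += 1
        elif op in "IS":
            # Insertions and soft clips have no reference base, keep as-is
            out.append(seq[read_i:read_i + length])
            read_i += length
        elif op in "DN":
            # Deletions and skipped regions consume reference only
            ref_i += length
        # H and P consume neither read nor reference
    return "".join(out)
```

The masked string has the same length as the original SEQ, so the CIGAR and quality columns stay valid, and the original bases can be restored from the reference.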
2012-01-23
Ion Torrent Suite on GitHub
Good news - the Ion Torrent Suite is now freely available open source software on GitHub under the GPL v2 licence, as promised late last year. There is now something more substantial behind talk of Ion Torrent "democratising sequencing", and a clear advantage over the closed source tools of rival companies. I commend them!
2012-01-16
Ion Torrent does the Samba
I'm a bit behind the curve here (see Lex's blog post from July 2011), but I was amused to find out Ion Torrent call their current nucleotide flow order
TACGTACGTCTGAGCATCGATCGATGTACAGC
the "Samba". Apparently the idea is to avoid reads going out of phase, which could happen with the traditional repeated flow TACG (still used by Roche 454), by giving the molecules which missed a base a chance to catch up, and for Ion Torrent this works better.
I wonder if the next revision of the flow order will also get a dance based name? I'd suggest conga, since it is about synchronising lots of people.
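If the term "flow order" is unfamiliar, this small Python sketch shows how a template is read out under a given cycle of nucleotide flows. It is an idealised model with no missed incorporations, so it does not simulate phasing itself; the function name `flowgram` is just for this example.

```python
SAMBA = "TACGTACGTCTGAGCATCGATCGATGTACAGC"
CLASSIC = "TACG"


def flowgram(template, flow_order):
    """Return the homopolymer length incorporated at each flow (ideal case)."""
    if not set(template) <= set(flow_order):
        raise ValueError("template has bases missing from the flow order")
    flows = []
    i = 0  # position in the template
    n = 0  # number of flows so far
    while i < len(template):
        # The flow order is repeated cyclically
        nuc = flow_order[n % len(flow_order)]
        count = 0
        while i < len(template) and template[i] == nuc:
            count += 1
            i += 1
        flows.append(count)
        n += 1
    return flows


if __name__ == "__main__":
    read = "TTAGGCATCGA"
    # Compare how many flows each order needs for the same template
    print(len(flowgram(read, CLASSIC)), len(flowgram(read, SAMBA)))
```

For example, `flowgram("TTAG", CLASSIC)` gives `[2, 1, 0, 1]`: two T bases in the first flow, one A, no C, then one G.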