Back in 2009, I wrote some Python scripts to use the NCBI Entrez Utilities to search for and download all known complete virus genomes in GenBank format, which I then processed to make FASTA files and BLAST databases. Recently I updated them and ran into some problems... false positives like entire bacterial genomes! This turns out to be due to a few bacteria with integrated phage being annotated as chimeras - genomes combined from multiple organisms.
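As a rough illustration of the two steps described above (searching Entrez, then guarding against non-viral false positives), here is a minimal standard-library sketch. It is not the original script: the search term, the `build_search_url` and `is_viral` helpers, and the example lineages are all assumptions made up for this example, and it only builds the ESearch URL rather than fetching anything. The lineage check also assumes a chimeric record reports a bacterial lineage first, which may not hold for every record.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, retmax=100):
    """Return an ESearch URL for the nucleotide database (nothing is fetched)."""
    params = {"db": "nucleotide", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def is_viral(lineage):
    """True if a GenBank taxonomy lineage starts at the 'Viruses' root.

    `lineage` is the semicolon-separated taxonomy text from a GenBank record.
    """
    ranks = [rank.strip() for rank in lineage.rstrip(".").split(";")]
    return ranks[0] == "Viruses"


# Hypothetical query; the term used by the original scripts is not given.
url = build_search_url("Viruses[Organism] AND complete genome[Title]")

# Made-up lineages for illustration: one phage, one bacterium.
lineages = {
    "phage_record": "Viruses; dsDNA viruses, no RNA stage; Caudovirales.",
    "bacterial_record": "Bacteria; Proteobacteria; Gammaproteobacteria.",
}
kept = [name for name, lineage in lineages.items() if is_viral(lineage)]
```

A check like this would run after download, so that anything matching the search but not rooted under Viruses is set aside for manual inspection rather than silently added to the BLAST database.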
Bioinformatics lessons learned the hard way, bugs, gripes, and maybe topical paper reviews too...
2013-11-14
2013-10-20
UTF8 encoded Japanese in LaTeX
Slightly off topic, but anyway... here are my notes on getting Japanese text working in LaTeX under Mac OS X using TeX Live. Now that it works it is quite easy, but first I explored a lot of dead ends and distractions (in the end I could ignore LaTeX Omega, XeLaTeX, etc.). I'm just using pdflatex with the LaTeX Chinese, Japanese, Korean (CJK) package. Here's an example from the PDF output:
[Image: example of Japanese text from the PDF output]
2013-09-20
Interview with a PeerJ author (me)
This summer I submitted a paper to the innovative new open access journal PeerJ, where it was published this week (Cock et al. 2013). I decided to write up the experience in the style of PeerJ's "Interview with an Author" blog posts. I've copied the questions they normally ask and written my own replies. Other than some rough edges in their current submission system, it was all good.
Update: This has been reposted on the official PeerJ blog, with responses which I have inserted below.
Using Travis-CI for testing Galaxy Tools
Travis CI is one of the best things to happen to GitHub in some time - it adds automated testing capabilities to your source code repository as changes are committed, and even on pull requests to help ensure new work doesn't break existing functionality.
We've been using this for Biopython for over a year, but this month I've started using Travis CI for testing my add-ons for the Galaxy Project as well. My Galaxy tools (see also Cock et al. 2013) were already being tested every night once uploaded to the Galaxy Tool Shed, and I always stage releases via the Galaxy Test Tool Shed before posting them on the main Galaxy Tool Shed. However, this fixed nightly schedule isn't very flexible for debugging failures.
Galaxy BLAST tools: [image]
Galaxy sequence analysis tools: [image]
I currently have Travis CI working for my two Galaxy tool repositories on GitHub. Both configurations follow the same basic approach, which I have tried to explain in this post, and run the tests as soon as I update GitHub.
2013-08-09
Pixelated Posters at Potatoes in Practice
Yesterday I attended the annual "Potatoes in Practice" meeting for the first time, mainly to see the finished display which I helped produce. Here it is, showing the twelve chromosomes of potato, drawn as stylized uniform green 'X' shapes, with different colour LEDs marking traits of interest for potato breeding.
[Images: Potato chromosomes 1 to 6, and 7 to 12, with LEDs marking traits of interest]
My contribution was the background images, which are actually drawn using the bases of the potato genome instead of pixels. For this I wrote a little Python script to render photos using A, C, G and T from a FASTA sequence file, using Biopython to load the sequences, the Python Imaging Library (PIL) to load the photos, NumPy to manipulate the array, and ReportLab to render a PDF.
[Images: Potato chromosomes 9 and 10 | Close-up showing the A, C, G, T pixels]
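The core idea, one base letter per pixel, can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the script used for the posters: it skips Biopython, PIL, NumPy and ReportLab entirely, takes the photo as a plain list of grey levels, and the `render_bases` and `preview` helpers, the threshold, and the tiny example data are all made up here.

```python
from itertools import cycle


def render_bases(pixels, sequence):
    """Pair each pixel with the next base of the sequence, in reading order.

    pixels   -- rows of grey levels (0 = black, 255 = white)
    sequence -- string of bases; repeated from the start if it is too short
    Returns a list of (row, column, base, grey) tuples, one per pixel.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    bases = cycle(sequence.upper())
    placed = []
    for y, row in enumerate(pixels):
        for x, grey in enumerate(row):
            placed.append((y, x, next(bases), grey))
    return placed


def preview(pixels, sequence, threshold=128):
    """Text preview: show the base where the pixel is dark, a space otherwise."""
    width = len(pixels[0])
    cells = render_bases(pixels, sequence)
    lines = []
    for start in range(0, len(cells), width):
        row_cells = cells[start:start + width]
        lines.append("".join(b if g < threshold else " " for _, _, b, g in row_cells))
    return "\n".join(lines)


# Made-up 3x4 "photo" and a made-up sequence, for illustration only.
photo = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 0, 255, 255],
]
text = preview(photo, "ACGTTGCA")
```

In a real rendering each tuple would instead become a coloured letter drawn at the matching position on the page, with the grey level (or full colour) taken from the photo.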