2025-11-24

Thoughts on generative AI contributions as an Open Source project maintainer

To be clear up front, these are my personal opinions right now, and while they will influence any future policy I have a hand in writing, they are not the position of the Biopython Project or the Open Bioinformatics Foundation where I hold a leadership role.

So, it took longer than I expected, but we've had our first Biopython pull requests or proposals openly using generative AI. On the bright side, they are not trivial drive-by slop in some perversely incentivized scheme to get a T-shirt or pad a CV. But I still have qualms, lots of them!

2025-10-01

The case of the missing Editorial Blog Post (and journal team)

Here’s a wee puzzle: A mature Open Data focused journal (“Journal A”), owned and launched by a company or Institute (“Institute B”), developed into the flagship of an Academic Publisher (“Publisher C”), runs their own properly archived and citable blog with DOIs etc (“Blog D”). If a briefly published editorial Blog Post (“Editorial E”) disappears from their Blog, could it be an accident, or something else?

An accidental deletion of a blog post by a publisher might be taken as incompetence, but if the post isn’t returned it starts to look deliberate to me. Normally I would have expected a formal retraction, but this feels like self-censorship in an effort to control public perception? Neither is a good look for a custodian of the Scientific Record.

Can you tell what I’m talking about yet? What if I hint the missing “Editorial E” blog post was by the out-going Editor In Chief of “Journal A”, celebrating many years of innovation, and noting the decision of “Institute B” to terminate the jobs of a large part of the team at “Publisher C”, and that this post on “Blog D” is/was the only announcement of those changes? Answer below…

2025-05-20

What have you done to your keyboard?! Hands Down Promethium on a MacBook

There were a few stickers on the lid, but why does my laptop look like this now? In short, I'm learning to touch-type a non-QWERTY layout.

Photo of my Japanese MacBook keyboard with 34 stickers added for my variant of the inverted Hands Down Promethium layout
My Japanese MacBook keyboard has 34 stickers for a custom layout

2024-06-07

FASTQ uploads to ENA FTP site with rclone

I've recently been working on what I considered to be a large scale FASTQ upload to the European Nucleotide Archive (ENA), from where it will be mirrored to the NCBI Short or Sequence Read Archive (SRA). Although the total size was "only" 37GB, this was about 3500 pairs of Illumina MiSeq FASTQ files - more than enough to make me worry about the job being interrupted and needing to resume without repeating uploads.
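
As a rough sketch of the resumable upload idea (not necessarily the exact command used in the post), the Python below assembles an rclone copy command. The remote name "ena:", the local folder "fastq/" and the "*.fastq.gz" pattern are hypothetical placeholders, and comparing by size only is an assumption about what suits an FTP remote. Since rclone copy skips files it judges to be already present on the destination, re-running the same command after an interruption should only send what is missing.

```python
import shlex


def build_rclone_copy(local_dir, remote="ena:", dry_run=True):
    """Return an rclone command (as an argument list) to upload FASTQ files.

    The remote name and file pattern are hypothetical; adjust to your setup.
    """
    cmd = [
        "rclone", "copy",
        "--include", "*.fastq.gz",  # only upload compressed FASTQ files
        "--size-only",              # assumption: decide what to skip by file size
    ]
    if dry_run:
        cmd.append("--dry-run")     # report what would be copied, transfer nothing
    cmd += [local_dir, remote]
    return cmd


# Print a shell-quoted command to review before running it for real,
# e.g. with subprocess.run(build_rclone_copy("fastq/", dry_run=False), check=True)
print(shlex.join(build_rclone_copy("fastq/")))
```

Starting with the dry run is a cheap way to confirm that only the outstanding files would be transferred.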

2024-02-02

BLAST max-target-seq meets metabarcoding

This is my first blog post in years - primarily down to a second child who is now a toddler. And what better topic to return to than a mainstay of past content, NCBI BLAST? This time with a motivating example from my recent work, metabarcoding. This is the term used for sequencing a diagnostic region of DNA using specific primers for a group of organisms of interest, and then matching that amplicon to a database of known species. Human interpretation of a BLAST search can generally make a good guess at the organism - weighing hits and annotated taxonomy (e.g. ignoring the odd suspicious uncultured "fungal" match).

This post is about how sometimes BLAST on the NCBI website can miss 100% identical (albeit not full length) matches, returning instead lots of very good but longer matches. Basically the online defaults don't suit this use-case.
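
One way to picture the problem is the toy sketch below. It is not a description of how BLAST works internally, and every hit name and number in it is invented for illustration. It assumes hits are ranked by bit score and only a fixed number are kept, in which case a shorter 100% identical match can fall off the end of the list behind longer, slightly imperfect ones.

```python
from collections import namedtuple

# Hypothetical hit record; all values below are made up for illustration.
Hit = namedtuple("Hit", "subject identity align_len bit_score")


def top_hits(hits, limit):
    """Keep only the `limit` best hits, ranked by bit score (highest first)."""
    return sorted(hits, key=lambda h: h.bit_score, reverse=True)[:limit]


hits = [
    Hit("long_A", 99.0, 200, 360.0),
    Hit("long_B", 98.5, 200, 355.0),
    Hit("long_C", 98.0, 200, 350.0),
    Hit("short_exact", 100.0, 150, 278.0),  # perfect but shorter, so lower score
]

kept = top_hits(hits, limit=3)
print([h.subject for h in kept])
print("Exact match reported:", any(h.identity == 100.0 for h in kept))
```

Raising the limit in this toy (here to 4) brings the exact match back into the results.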